Hi folks,

Here's my Pig script:
    a = load 'pig/input' as (x:int, y:chararray);
    b = load 'pig/input1' as (x:int, y:chararray);
    c = group a by x;
    d = foreach c generate group as x, COUNT($1) as cnt;
    d = join d by x, b by x;
    store d into 'pig/output';

I use Tez as the execution engine and noticed that Pig converts this script into a single DAG with 4 vertices, as shown below. I think 3 vertices should be sufficient: the group-by and the join use the same key, so vertex (scope_39) is unnecessary and the data does not need to be repartitioned a second time. The only impact of going from 4 vertices to 3 may be on the parallelism of vertex (scope_41). I'm not sure how large the performance difference between the two plans would be, but I think this could be a potential optimization.

[image: Inline image 1]

--
Best Regards

Jeff Zhang
