> On Feb. 17, 2014, 7:54 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java, line 1464
> > <https://reviews.apache.org/r/18181/diff/2/?file=490429#file490429line1464>
> >
> > Remove this comment since it's no longer applicable?

Left that on purpose. We want to try unsorted shuffle to reduce the number of stages when the data is small. For example, if there are 7K input splits and parallel is set to 100, with a 1-1 edge there will be 7K tasks in the load vertex, 7K tasks in the partition vertex and 100 in the join vertex. We want to see whether 7K in the load vertex, 3.5K in the partition vertex and 100 in the join vertex performs better. In theory it might, as the final join task only needs to merge 3.5K map outputs instead of 7K. If that does not work out, we will stick with 1-1.


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18181/#review34628
-----------------------------------------------------------


On Feb. 17, 2014, 7:34 a.m., Rohini Palaniswamy wrote:
>
> (Updated Feb. 17, 2014, 7:34 a.m.)
>
>
> Review request for pig, Cheolsoo Park and Daniel Dai.
>
>
> Bugs: PIG-3766
>     https://issues.apache.org/jira/browse/PIG-3766
>
>
> Repository: pig
>
>
> Description
> -------
>
> Changes done:
> 1) Removed the POLocalRearrange in SampleVertex and replaced it with a POValueOutTez for both orderby and skewedjoin. POValueOutTez takes multiple outputs, so the POSplit in the skewed join sample vertex was removed as well.
> 2) Replaced the POPackage+POLocalRearrange in the partition vertex of the left table (vertex 3) with a POIdentityInOutTez, moving the project in that POLocalRearrange into the POLocalRearrange in vertex 1. Also made the edge between vertex 1 and vertex 3 1-1.
>
>
> Diffs
> -----
>
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java 1568862
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 1568862
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 1568862
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1568862
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld 1568862
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC17.gld PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld 1568862
>   http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java 1568862
>
> Diff: https://reviews.apache.org/r/18181/diff/
>
>
> Testing
> -------
>
> TestSkewedJoin and -t SkewedJoin in nightly.conf (except SkewedJoin_6 PIG-3727) pass
>
>
> Thanks,
>
> Rohini Palaniswamy
>
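
For reference, the task counts discussed in the reply at the top of this message can be worked through as below. This is only a back-of-the-envelope sketch: the class and method names are hypothetical and are not part of Pig or Tez, and the numbers are the ones quoted in the reply (7K input splits, parallel 100, and 3.5K partition tasks for the unsorted-shuffle variant).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for comparing the two plans from the reply; not Pig or Tez code.
final class SkewedJoinTaskCounts {

    // Task count per vertex, in load -> partition -> join order.
    static Map<String, Integer> plan(int loadTasks, int partitionTasks, int joinTasks) {
        Map<String, Integer> tasks = new LinkedHashMap<>();
        tasks.put("load", loadTasks);
        tasks.put("partition", partitionTasks);
        tasks.put("join", joinTasks);
        return tasks;
    }

    // Total number of tasks launched across all three vertices.
    static int totalTasks(Map<String, Integer> plan) {
        int total = 0;
        for (int count : plan.values()) {
            total += count;
        }
        return total;
    }

    // Number of partition-vertex outputs each join task has to merge.
    static int mergeFanIn(Map<String, Integer> plan) {
        return plan.get("partition");
    }

    public static void main(String[] args) {
        // 1-1 edge: the partition vertex mirrors the load vertex task for task.
        Map<String, Integer> oneToOne = plan(7000, 7000, 100);
        // Unsorted shuffle: assumes the partition vertex runs at half the load parallelism.
        Map<String, Integer> unsorted = plan(7000, 3500, 100);

        System.out.println("1-1 edge:         total=" + totalTasks(oneToOne)
                + " fanIn=" + mergeFanIn(oneToOne));
        System.out.println("unsorted shuffle: total=" + totalTasks(unsorted)
                + " fanIn=" + mergeFanIn(unsorted));
    }
}
```

With these figures the 1-1 plan launches 14,100 tasks and each join task merges 7,000 outputs, while the unsorted-shuffle plan launches 10,600 tasks with 3,500 outputs per join task. Whether the smaller merge outweighs the extra shuffle between the load and partition vertices is the open question the reply describes.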
