-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16860/#review31947
-----------------------------------------------------------

Ship it!


Looks right to me. Feel free to commit.

- Daniel Dai


On Jan. 14, 2014, 7:55 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16860/
> -----------------------------------------------------------
>
> (Updated Jan. 14, 2014, 7:55 p.m.)
>
>
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3644
>     https://issues.apache.org/jira/browse/PIG-3644
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Skewed join in Tez is implemented in 5 vertices:
> Vertex 1) Sample/load skewed table => broadcast sampling input to vertex 2
> and shuffle entire input to vertex 3.
> Vertex 2) Sampling aggregation vertex => build distribution map and broadcast
> it to vertices 3 and 4.
> Vertex 3) POLocalRearrangeTez for skewed table => partition skewed table
> using SkewedPartitioner and shuffle it to vertex 5.
> Vertex 4) POPartitionRearrangeTez for streaming table => shuffle streaming
> table to vertex 5.
> Vertex 5) Join inputs from vertices 3 and 4.
>
> New classes for Tez:
> - POPoissonSample) Sampling operator for skewed join.
> - POPartitionRearrangeTez) Sub-class of POPartitionRearrange for Tez.
> - SkewedPartitionerTez) Sub-class of SkewedPartitioner for Tez.
>
> Note that there are a couple of places I can refactor. For example,
> - POPoissonSample and PoissonSampleLoader
> - POPartitionRearrangeTez and POLocalRearrangeTez
>
> I will do that in follow-up JIRAs.
>
>
> Diffs
> -----
>
>   src/org/apache/pig/PigConfiguration.java ccf3635
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/SkewedPartitioner.java 4790abe
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPoissonSample.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POReservoirSample.java bcb339c
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 585509d
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java e9d8e64
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java e22c319
>   src/org/apache/pig/backend/hadoop/executionengine/tez/SkewedPartitionerTez.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java d35e87d
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 83e5d2c
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 93e522f
>   src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java 7bcc79e
>   src/org/apache/pig/impl/builtin/PartitionSkewedKeys.java 7ce0e82
>   src/org/apache/pig/impl/builtin/PoissonSampleLoader.java 5ce5b9e
>   test/e2e/pig/tests/tez.conf ac254e5
>
> Diff: https://reviews.apache.org/r/16860/diff/
>
>
> Testing
> -------
>
> - Added e2e test cases for inner and outer skewed joins.
> - unit tests pass.
> - e2e tests pass.
>
>
> Thanks,
>
> Cheolsoo Park
>
>
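
For readers following the description, the five-vertex layout can be written down as a small edge list. This is only an illustrative sketch of the topology as described in the review request; it does not use the Tez or Pig APIs, and the names SkewedJoinDagSketch, Edge, EdgeType, edges, and inputsOf are hypothetical, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the skewed-join DAG from the description above.
// Vertex numbers follow the description; nothing here is Pig or Tez code.
final class SkewedJoinDagSketch {
    enum EdgeType { BROADCAST, SHUFFLE }

    static final class Edge {
        final int from;
        final int to;
        final EdgeType type;

        Edge(int from, int to, EdgeType type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    // One entry per data movement named in the description.
    static List<Edge> edges() {
        return Arrays.asList(
            new Edge(1, 2, EdgeType.BROADCAST), // sampling input to the aggregation vertex
            new Edge(1, 3, EdgeType.SHUFFLE),   // entire skewed-table input
            new Edge(2, 3, EdgeType.BROADCAST), // distribution map to the skewed-table side
            new Edge(2, 4, EdgeType.BROADCAST), // distribution map to the streaming-table side
            new Edge(3, 5, EdgeType.SHUFFLE),   // partitioned skewed table to the join
            new Edge(4, 5, EdgeType.SHUFFLE));  // streaming table to the join
    }

    // Source vertices feeding the given vertex, in edge-list order.
    static List<Integer> inputsOf(int vertex) {
        List<Integer> sources = new ArrayList<>();
        for (Edge e : edges()) {
            if (e.to == vertex) {
                sources.add(e.from);
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        for (Edge e : edges()) {
            System.out.println("vertex " + e.from + " -> vertex " + e.to + " (" + e.type + ")");
        }
    }
}
```

Calling inputsOf(5) lists the two vertices that feed the join, which matches "Join inputs from vertex 3 and 4" in the description.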
