-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17375/
-----------------------------------------------------------

Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini 
Palaniswamy.


Bugs: PIG-3719
    https://issues.apache.org/jira/browse/PIG-3719


Repository: pig-git


Description
-------

This patch fixes two sets of skewed join e2e tests:
1) tez.conf Join_[7_8].
2) nightly.conf SkewedJoin_[1_10].

There are two main changes:
1) In POShuffleTezLoad, we copy PigNullableWritable using 
PigNullableWritable.newInstance(). But this method doesn't copy the key of 
NullablePartitionWritable (the subclass of PigNullableWritable used for skewed 
join), which caused an NPE later.
2) In POPartitionRearrangeTez, the init() method builds reduceMap <key, 
pair<int, int>>. The key of this map was always wrapped in a tuple even when 
there was only a single join key, resulting in wrong join output. Now the key 
is wrapped only if there is more than one join key.
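
The two failure modes above can be sketched in isolation. This is a minimal 
illustration, not the patch itself: SkewedJoinFixSketch, PartitionWritable, 
copyWithoutKey, copyWithKey, mapKey and lookupHits are hypothetical stand-ins, 
a plain List plays the role of a tuple, and the reducer range {0, 3} is made 
up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not Pig classes.
public class SkewedJoinFixSketch {

    static class PartitionWritable {
        Object value; // state a generic copy already handles
        Object key;   // extra state specific to the skewed-join subclass
    }

    // Change 1, before: the copy ignores the subclass field, so key stays null
    // and a later dereference of it would throw a NullPointerException.
    static PartitionWritable copyWithoutKey(PartitionWritable src) {
        PartitionWritable c = new PartitionWritable();
        c.value = src.value;
        return c;
    }

    // Change 1, after: the copy carries the key across as well.
    static PartitionWritable copyWithKey(PartitionWritable src) {
        PartitionWritable c = copyWithoutKey(src);
        c.key = src.key;
        return c;
    }

    // Change 2: wrap the map key only when there is more than one join key.
    // alwaysWrap = true reproduces the old behaviour for comparison.
    static Object mapKey(List<Object> joinKeys, boolean alwaysWrap) {
        if (alwaysWrap || joinKeys.size() > 1) {
            return new ArrayList<>(joinKeys); // a List stands in for a tuple
        }
        return joinKeys.get(0);
    }

    // Builds a one-entry map and checks whether the runtime key finds it.
    static boolean lookupHits(Object runtimeKey, List<Object> joinKeys,
                              boolean alwaysWrap) {
        Map<Object, int[]> reduceMap = new HashMap<>();
        reduceMap.put(mapKey(joinKeys, alwaysWrap), new int[] {0, 3});
        return reduceMap.containsKey(runtimeKey);
    }

    public static void main(String[] args) {
        PartitionWritable src = new PartitionWritable();
        src.value = "row";
        src.key = "k";
        System.out.println(copyWithoutKey(src).key); // null
        System.out.println(copyWithKey(src).key);    // k

        List<Object> single = Collections.<Object>singletonList("k");
        System.out.println(lookupHits("k", single, true));  // false
        System.out.println(lookupHits("k", single, false)); // true
    }
}
```

With a single join key, the always-wrapped map key never equals the bare key 
used at lookup time, so the entry is never found. That is the kind of mismatch 
described in 2).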


Diffs
-----

  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/SkewedPartitioner.java
 57af63f 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartitionRearrange.java
 4ef55b8 
  
src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 
13d9ec9 
  
src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java
 c07e69e 
  src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java 
761ae90 
  
src/org/apache/pig/backend/hadoop/executionengine/tez/SkewedPartitionerTez.java 
654f350 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 
197486f 
  src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 
637246c 
  test/e2e/pig/tests/nightly.conf e1a55e6 

Diff: https://reviews.apache.org/r/17375/diff/


Testing
-------

* ant clean test-tez passes except TestCustomPartitioner. I confirmed that 
TestCustomPartitioner also fails on the current tez branch, so the failure is 
not related to this patch.
* All nightly.conf SkewedJoin tests pass except #6, the self join case.
* All tez.conf e2e tests pass.

Note that for nightly.conf SkewedJoin #9 and #10, I changed the parallel from 8 
to 4. For right/full outer join, it's important that every task in the 
partition vertex is given at least one row of skewed keys. However, in e2e 
tests, we arbitrarily redistribute keys to 8 reducers even though the keys are 
not skewed. As a result, some tasks are given no rows of these fake "skewed" 
keys, and that makes the tests fail. So I am reducing the parallel to avoid 
this situation. For real skewed data, this shouldn't be an issue.
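
As a rough illustration of why a lower parallel helps, the sketch below spreads 
the rows of one key over the tasks and counts how many tasks receive nothing. 
It assumes a simple round-robin spread and a made-up count of 5 rows; 
SkewedKeySpreadSketch and emptyTasks are hypothetical names, not part of the 
patch.

```java
// Hypothetical sketch; assumes rows of one key are spread round-robin.
public class SkewedKeySpreadSketch {

    // Returns how many of the `parallel` tasks receive no row of the key.
    static int emptyTasks(int rowsForKey, int parallel) {
        int[] rowsPerTask = new int[parallel];
        for (int row = 0; row < rowsForKey; row++) {
            rowsPerTask[row % parallel]++;
        }
        int empty = 0;
        for (int count : rowsPerTask) {
            if (count == 0) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        System.out.println(emptyTasks(5, 8)); // 3 tasks get no row
        System.out.println(emptyTasks(5, 4)); // 0 tasks get no row
    }
}
```

When a key has fewer rows than there are tasks, some tasks are necessarily left 
without a row of it. A genuinely skewed key has far more rows than tasks, so 
the problem does not arise.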

In addition, I'd like to fix the self join case (#6) in a separate jira because 
it requires quite a few changes in TezCompiler due to POSplit.


Thanks,

Cheolsoo Park
