-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15219/
-----------------------------------------------------------
Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini
Palaniswamy.
Bugs: PIG-3536
https://issues.apache.org/jira/browse/PIG-3536
Repository: pig-git
Description
-------
Implement DISTINCT for Pig-on-Tez with a (very straightforward) change to
TezCompiler.java.
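The patch itself is in TezCompiler.java. The block below is not that code. It is a minimal in-memory sketch, assuming the Tez plan follows the usual shuffle-based DISTINCT pattern: the whole record is the shuffle key, partitioning is by its hash so equal records meet in the same reducer, and each reducer emits every key once. Records are modeled as Strings instead of Pig tuples, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a shuffle-based DISTINCT; not TezCompiler code.
class ShuffleDistinctSketch {

    // "Map" side: route each record to a reducer by hashing the whole record,
    // so equal records always land in the same partition.
    static List<List<String>> partition(List<String> records, int numReducers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String r : records) {
            int p = Math.floorMod(r.hashCode(), numReducers);
            partitions.get(p).add(r);
        }
        return partitions;
    }

    // "Reduce" side: group the partition by key (the shuffle's job in a real
    // engine) and emit each key once, however many duplicates it had.
    static List<String> reduceDistinct(List<String> partition) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (String r : partition) {
            grouped.merge(r, 1, Integer::sum);
        }
        return new ArrayList<>(grouped.keySet());
    }

    // End-to-end: partition, then dedup each partition independently.
    static List<String> distinct(List<String> records, int numReducers) {
        List<String> out = new ArrayList<>();
        for (List<String> part : partition(records, numReducers)) {
            out.addAll(reduceDistinct(part));
        }
        return out;
    }
}
```

Because equal records always hash to the same partition, no cross-reducer dedup is needed; the output order, however, depends on the partitioning.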
For now, this does NOT use two optimizations that the MRCompiler performs. We
will create a separate JIRA for these optimizations:
1. A distinct combiner
2. A combiner optimizer that replaces certain uses of DISTINCT with an
algebraic UDF
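To show what the first deferred optimization buys, here is a hypothetical sketch of a distinct combiner, under the assumption that it simply removes duplicates within each map task's output before the shuffle. It is not the MRCompiler's implementation. The reducer must still dedup globally, because two map tasks can each emit the same record once. Names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a "distinct combiner"; not Pig's actual code.
class DistinctCombinerSketch {

    // Combiner: dedups one map task's output only (keeps first-seen order).
    static List<String> combine(List<String> mapOutput) {
        return new ArrayList<>(new LinkedHashSet<>(mapOutput));
    }

    // Number of records that would cross the shuffle, with or without combining.
    static int shuffledRecords(List<List<String>> mapTasks, boolean useCombiner) {
        int n = 0;
        for (List<String> task : mapTasks) {
            n += useCombiner ? combine(task).size() : task.size();
        }
        return n;
    }

    // Final reduce: a global dedup is still required across map tasks.
    static Set<String> distinct(List<List<String>> mapTasks, boolean useCombiner) {
        Set<String> result = new TreeSet<>();
        for (List<String> task : mapTasks) {
            result.addAll(useCombiner ? combine(task) : task);
        }
        return result;
    }
}
```

With map outputs `[a, a, b]` and `[b, b, c]`, the combiner cuts shuffled records from 6 to 4 while the final result stays `{a, b, c}`; the savings grow with the duplicate rate inside each task.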
[Little code note: I renamed getPlainForEach to getForEachPlain, so that it
sorts alphabetically alongside getForEachHelper1, getForEachHelper2, etc.
Sorry if that's a little too OCD.]
Diffs
-----
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java d62b2a1
test/e2e/pig/tests/tez.conf 24af8d3
test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld PRE-CREATION
test/org/apache/pig/tez/TestTezCompiler.java 1209d08
Diff: https://reviews.apache.org/r/15219/diff/
Testing
-------
This patch includes:
- A unit test in TestTezCompiler.java
- An e2e test
DANIEL: Can you check whether my e2e test looks appropriate? I wasn't sure which
test data set to choose, so I just picked studenttab20m.
Thanks,
Alex Bain