-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16309/
-----------------------------------------------------------

(Updated Dec. 23, 2013, 5:34 p.m.)


Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini 
Palaniswamy.


Changes
-------

Updated with fix to race condition and e2e tests


Bugs: PIG-3629
    https://issues.apache.org/jira/browse/PIG-3629


Repository: pig-git


Description (updated)
-------

Implement STREAM operator in Tez - 
https://issues.apache.org/jira/browse/PIG-3629

In this patch, I do not add resources to pig-misc.jar, I just add them 
individually. See my discussion post: 
https://groups.google.com/forum/#!topic/pig-on-tez/8S80GMKhMaU

Basic Changes:
-Run the PhyPlanSetter and EndOfAllInputSetter to set the parent plan and the 
end-of-all input flags necessary for STREAM, just like in MR Pig.
-Add a map to hold plan-specific extra local resources in TezOperPlan.java. 
These resources can either come from the user's directory (e.g. 
SHIP('/home/abain/foo')) or from HDFS (e.g. CACHE('/user/abain/bar') in HDFS).
-Add the new class TezPOStreamVisitor that assembles all the plan-specific 
local resources that get added in TezOperPlan.java.

Resource Manager Changes:
-TezResourcManager resources were previously a map of java.net.URL -> Path in 
HDFS. Previously, the URL's were all local files, e.g. 
file://home/abain/pig-withouthHadoop.jar. However, the CACHE statement requires 
that resources already present in HDFS be able to be added as local resources. 
Unfortunately java.net.URL does not support hdfs:// URL's, so I changed this 
data structure to be a YARN URL instead. I also added methods to the 
ResourceManager to distinguish whether you are adding a local resource or a 
resource already present in HDFS.
-CACHE also supports URL's with fragments at the end, which become a "shortcut" 
to the name, e.g. CACHE(/input/big-data-name.gz#data.gz). I changed the 
resource manager to look for a fragments and use that as the resource name (if 
the fragment exist). This results in the symlink to the resource being created 
with the fragment name, which is what we want.

Race condition:
-I found a race condition that resulted from reusing the Result object in 
POSimpleTezLoad. There are several possible solutions. After discussing in the 
newsgroup, we decided to change POSimpleTezLoad for now.
-I also made a small cleanup to PhysicalOperator.java.


Diffs (updated)
-----

  
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java
 37566ab 
  
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
 8487a0f 
  src/org/apache/pig/backend/hadoop/executionengine/tez/POSimpleTezLoad.java 
d57aded 
  src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 
0ee7256 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 
191563d 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 
df9fea6 
  
src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java
 135b933 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperPlan.java 
0cc8e17 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOStreamVisitor.java 
PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java 
673fd70 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezResourceManager.java 
0fd7575 
  src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java 
862e637 
  test/e2e/pig/tests/tez.conf 0e4ba4e 
  test/org/apache/pig/test/data/GoldenFiles/TEZC12.gld PRE-CREATION 
  test/org/apache/pig/tez/TestTezCompiler.java 8d5e5f2 

Diff: https://reviews.apache.org/r/16309/diff/


Testing (updated)
-------

Added a unit test to TestTezCompiler.java
Added a new unit test e2e test to tez.conf with session reuse enabled
Ported three other e2e tests from streaming.conf to tez.conf to increase 
coverage

ant test-tez passes
ant test-e2e-tez passes
Manually tested with a large subset of tests from streaming.conf (the ones 
using features currently supported by Pig-on-Tez), they pass


Thanks,

Alex Bain

Reply via email to