----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16309/#review30644 -----------------------------------------------------------
Looks good. Can you add a few e2e tests to it? - Daniel Dai On Dec. 18, 2013, 3:53 a.m., Alex Bain wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16309/ > ----------------------------------------------------------- > > (Updated Dec. 18, 2013, 3:53 a.m.) > > > Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini > Palaniswamy. > > > Bugs: PIG-3629 > https://issues.apache.org/jira/browse/PIG-3629 > > > Repository: pig-git > > > Description > ------- > > Implement STREAM operator in Tez - > https://issues.apache.org/jira/browse/PIG-3629 > > In this patch, I do not add resources to pig-misc.jar, I just add them > individually. See my discussion post: > https://groups.google.com/forum/#!topic/pig-on-tez/8S80GMKhMaU > > Basic Changes: > -Run the PhyPlanSetter and EndOfAllInputSetter to set the parent plan and the > end-of-all input flags necessary for STREAM, just like in MR Pig. > -Add a map to hold plan-specific extra local resources in TezOperPlan.java. > These resources can either come from the user's directory (e.g. > SHIP('/home/abain/foo')) or from HDFS (e.g. CACHE('/user/abain/bar') in HDFS). > -Add the new class TezPOStreamVisitor that assembles all the plan-specific > local resources that get added in TezOperPlan.java. > > Resource Manager Changes: > -TezResourcManager resources were previously a map of java.net.URL -> Path in > HDFS. Previously, the URL's were all local files, e.g. > file://home/abain/pig-withouthHadoop.jar. However, the CACHE statement > requires that resources already present in HDFS be able to be added as local > resources. Unfortunately java.net.URL does not support hdfs:// URL's, so I > changed this data structure to be a YARN URL instead. I also added methods to > the ResourceManager to distinguish whether you are adding a local resource or > a resource already present in HDFS. > -CACHE also supports URL's with fragments at the end, which become a > "shortcut" to the name, e.g. CACHE(/input/big-data-name.gz#data.gz). I > changed the resource manager to look for a fragments and use that as the > resource name (if the fragment exist). This results in the symlink to the > resource being created with the fragment name, which is what we want. > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java > 37566ab > src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java > 7a1736a > src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java > 2584501 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java > 96ccdde > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java > 135b933 > src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperPlan.java > 0cc8e17 > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOStreamVisitor.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java > 673fd70 > > src/org/apache/pig/backend/hadoop/executionengine/tez/TezResourceManager.java > 0fd7575 > > Diff: https://reviews.apache.org/r/16309/diff/ > > > Testing > ------- > > Added a unit test to TestTezCompiler.java > Added an e2e test to tez.conf > > ant test-tez passes > ant test-e2e-tez has three failures - I am investigating to see if they are > releated, or perhaps just transient > > Question: There is already a separate suite of STREAM tests in > streaming.conf. Maybe I should remove my e2e test and we should add > streaming.conf as a dependency to the test-e2e-tez target? I haven't tried to > run streaming.conf yet. > > > Thanks, > > Alex Bain > >
