-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16309/#review30644
-----------------------------------------------------------


Looks good. Can you add a few e2e tests to it?

- Daniel Dai


On Dec. 18, 2013, 3:53 a.m., Alex Bain wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16309/
> -----------------------------------------------------------
> 
> (Updated Dec. 18, 2013, 3:53 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini 
> Palaniswamy.
> 
> 
> Bugs: PIG-3629
>     https://issues.apache.org/jira/browse/PIG-3629
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Implement STREAM operator in Tez - 
> https://issues.apache.org/jira/browse/PIG-3629
> 
> In this patch, I do not add resources to pig-misc.jar, I just add them 
> individually. See my discussion post: 
> https://groups.google.com/forum/#!topic/pig-on-tez/8S80GMKhMaU
> 
> Basic Changes:
> -Run the PhyPlanSetter and EndOfAllInputSetter to set the parent plan and the 
> end-of-all input flags necessary for STREAM, just like in MR Pig.
> -Add a map to hold plan-specific extra local resources in TezOperPlan.java. 
> These resources can either come from the user's directory (e.g. 
> SHIP('/home/abain/foo')) or from HDFS (e.g. CACHE('/user/abain/bar') in HDFS).
> -Add the new class TezPOStreamVisitor that assembles all the plan-specific 
> local resources that get added in TezOperPlan.java.
> 
> Resource Manager Changes:
> -TezResourcManager resources were previously a map of java.net.URL -> Path in 
> HDFS. Previously, the URL's were all local files, e.g. 
> file://home/abain/pig-withouthHadoop.jar. However, the CACHE statement 
> requires that resources already present in HDFS be able to be added as local 
> resources. Unfortunately java.net.URL does not support hdfs:// URL's, so I 
> changed this data structure to be a YARN URL instead. I also added methods to 
> the ResourceManager to distinguish whether you are adding a local resource or 
> a resource already present in HDFS.
> -CACHE also supports URL's with fragments at the end, which become a 
> "shortcut" to the name, e.g. CACHE(/input/big-data-name.gz#data.gz). I 
> changed the resource manager to look for a fragments and use that as the 
> resource name (if the fragment exist). This results in the symlink to the 
> resource being created with the fragment name, which is what we want.
> 
> 
> Diffs
> -----
> 
>   
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java
>  37566ab 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 
> 7a1736a 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 
> 2584501 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 
> 96ccdde 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java
>  135b933 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperPlan.java 
> 0cc8e17 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOStreamVisitor.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPlanContainer.java 
> 673fd70 
>   
> src/org/apache/pig/backend/hadoop/executionengine/tez/TezResourceManager.java 
> 0fd7575 
> 
> Diff: https://reviews.apache.org/r/16309/diff/
> 
> 
> Testing
> -------
> 
> Added a unit test to TestTezCompiler.java
> Added an e2e test to tez.conf
> 
> ant test-tez passes
> ant test-e2e-tez has three failures - I am investigating to see if they are 
> releated, or perhaps just transient
> 
> Question: There is already a separate suite of STREAM tests in 
> streaming.conf. Maybe I should remove my e2e test and we should add 
> streaming.conf as a dependency to the test-e2e-tez target? I haven't tried to 
> run streaming.conf yet.
> 
> 
> Thanks,
> 
> Alex Bain
> 
>

Reply via email to