-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/547/#review383
-----------------------------------------------------------



trunk/src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java
<https://reviews.apache.org/r/547/#comment725>

    Referencing the static PigMapReduce.sJobContext may cause a race condition 
in local Pig jobs, similar to the one described in PIG-1831. Should a similar 
fix be applied here, keeping the context in PigMapReduce in thread-local storage?
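
    To illustrate the idea, here is a minimal, self-contained sketch of keeping 
a per-task context in thread-local storage. It does not use the real Pig or 
Hadoop classes: FakeJobContext, setContext, getContext and runTask are 
hypothetical names made up for this example, and the actual fix for PIG-1831 
may look different.

```java
// Sketch only: FakeJobContext stands in for the real job context, and none of
// these names come from Pig or Hadoop.
final class ThreadLocalContextSketch {

    // Hypothetical stand-in for the per-task context object.
    static final class FakeJobContext {
        final String splitName;

        FakeJobContext(String splitName) {
            this.splitName = splitName;
        }
    }

    // A plain static field would be shared by every task thread in local mode.
    // A ThreadLocal gives each thread its own independent slot instead.
    private static final ThreadLocal<FakeJobContext> CONTEXT = new ThreadLocal<>();

    static void setContext(FakeJobContext context) {
        CONTEXT.set(context);
    }

    static FakeJobContext getContext() {
        return CONTEXT.get();
    }

    // Simulates one task: publish the context, then read it back later.
    static String runTask(String splitName) {
        setContext(new FakeJobContext(splitName));
        try {
            Thread.yield(); // give other threads a chance to interleave
            return getContext().splitName;
        } finally {
            CONTEXT.remove(); // avoid leaking the context on reused threads
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = new String[2];
        Thread first = new Thread(() -> seen[0] = runTask("split-A"));
        Thread second = new Thread(() -> seen[1] = runTask("split-B"));
        first.start();
        second.start();
        first.join();
        second.join();
        System.out.println(seen[0] + " " + seen[1]);
    }
}
```

    With a shared static field, the second task could overwrite the context 
between the first task's write and read. With the thread-local slot, each 
task only ever reads back the value it set itself.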


- Adam


On 2011-05-19 16:27:22, Adam Warrington wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/547/
> -----------------------------------------------------------
> 
> (Updated 2011-05-19 16:27:22)
> 
> 
> Review request for pig.
> 
> 
> Summary
> -------
> 
> This is a patch for PIG-1702, which describes an issue where the task output 
> logs for Pig streaming jobs contain null input-split information. The 
> ability to query the input-split information through the JobConf went away 
> with the new MR API. We must now obtain a reference to the underlying 
> FileSplit and query it for that information.
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java 1088692 
> 
> Diff: https://reviews.apache.org/r/547/diff
> 
> 
> Testing
> -------
> 
> To test this, I wrote a very simple python script to pass data through using 
> PIG. After checking the task logs of the completed task, the stderr logs now 
> contain valid input split information. Below are the scripts and test data 
> used.
> 
> ### PIG commands run ###
> DEFINE testpy `test.py` SHIP ('test.py');
> raw_records = LOAD '/test.txt2'; 
> T1 = STREAM raw_records THROUGH testpy;
> dump T1;
> 
> ### test.py ###
> #!/usr/bin/python
> import sys
> 
> cnt = 0
> for line in sys.stdin:
>     print(line.strip() + " " + str(cnt))
>     cnt += 1
> 
> ### contents of /test.txt on hdfs ###
> one line
> two line
> three line
> four line
> 
> 
> Thanks,
> 
> Adam
> 
>
