-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/547/
-----------------------------------------------------------

(Updated 2011-05-19 16:27:22.583249)


Review request for pig.


Changes
-------

Sigh...I edited this a while back, but didn't publish what I wrote.


Summary
-------

This is a patch for PIG-1702, which describes an issue where the task output 
logs for PIG streaming jobs contain null input-split information. The ability 
to query the input-split information through the JobConf went away with the new 
MR API. We must now obtain a reference to the underlying FileSplit and query 
that reference for the information.


Diffs
-----

  
  trunk/src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java 1088692 

Diff: https://reviews.apache.org/r/547/diff


Testing (updated)
-------

To test this, I wrote a very simple Python script and streamed data through it 
using PIG. The stderr logs of the completed task now contain valid input-split 
information. Below are the scripts and test data used.

### PIG commands run ###
DEFINE testpy `test.py` SHIP ('test.py');
raw_records = LOAD '/test.txt2'; 
T1 = STREAM raw_records THROUGH testpy;
dump T1;

### test.py ###
#!/usr/bin/python
import sys

# Echo each input line with a running counter appended.
cnt = 0
for line in sys.stdin:
    print(line.strip() + " " + str(cnt))
    cnt += 1

### contents of /test.txt on hdfs ###
one line
two line
three line
four line
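
For a quick local check of the script's logic without a cluster, the sketch 
below applies the same per-line transformation to the four sample lines. It is 
an illustration only: the helper name number_lines is hypothetical and not part 
of the patch, and it assumes Python 3. It does not exercise the input-split 
logging itself, which still has to be verified in the task's stderr log.

```python
import io


def number_lines(stream):
    """Mimic test.py: strip each line and append a running counter."""
    # number_lines is a hypothetical helper for illustration, not part of the patch.
    return [line.strip() + " " + str(cnt) for cnt, line in enumerate(stream)]


# The four sample lines from the test data above.
sample = io.StringIO("one line\ntwo line\nthree line\nfour line\n")
result = number_lines(sample)
for row in result:
    print(row)
```

Each sample line should come back with its zero-based position appended, 
starting with "one line 0".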


Thanks,

Adam
