-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/547/
-----------------------------------------------------------
(Updated 2011-05-19 16:27:22.583249)


Review request for pig.


Changes
-------

Sigh... I edited this a while back, but didn't publish what I wrote.


Summary
-------

This is a patch for PIG-1702, which describes an issue where the task output logs for Pig streaming jobs contain null input-split information.

The ability to query the input-split information through the JobConf went away with the new MR API. We must now obtain a reference to the underlying FileSplit and query that reference for the information.


Diffs
-----

  trunk/src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java 1088692

Diff: https://reviews.apache.org/r/547/diff


Testing (updated)
-------

To test this, I wrote a very simple Python script that passes data through unchanged apart from an appended line counter, and ran it as a Pig streaming command. After the job finished, I checked the logs of the completed task: the stderr logs now contain valid input-split information. Below are the scripts and test data used.

### PIG commands run ###
DEFINE testpy `test.py` SHIP ('test.py');
raw_records = LOAD '/test.txt2';
T1 = STREAM raw_records THROUGH testpy;
dump T1;

### test.py ###
#!/usr/bin/python

import sys

cnt = 0
for line in sys.stdin:
    print line.strip() + " " + str(cnt)
    cnt += 1

### contents of /test.txt on hdfs ###
one line
two line
three line
four line


Thanks,

Adam
