Two quotes for this problem:

"Streaming map tasks should have a "map_input_file" environment variable
like the following: map_input_file=hdfs://HOST/path/to/file"

"the value for map.input.file gives you the exact information you need."
(didn't try)

Rasit

2009/3/26 Jason Fennell <jdfenn...@gmail.com>:
> Is there a way to identify the input file a mapper was running on when
> it failed?  When a large job fails because of bad input lines, I have
> to resort to rerunning the entire job to isolate a single bad line
> (since the log doesn't contain information on the file that the mapper
> was running on).
>
> Basically, I would like to be able to do one of the following:
> 1. Find the file that a mapper was running on when it failed
> 2. Find the block that a mapper was running on when it failed (and be
>    able to find file names from block ids)
>
> I haven't been able to find any documentation on facilities to
> accomplish either (1) or (2), so I'm hoping someone on this list will
> have a suggestion.
>
> I am using the Hadoop streaming API on hadoop 0.18.2.
>
> -Jason

--
M. Raşit ÖZDAŞ