Is there a way to identify the input file a mapper was running on when it failed? When a large job fails because of bad input lines, I have to resort to rerunning the entire job to isolate a single bad line, since the log doesn't record which file that mapper was processing.
Basically, I would like to be able to do one of the following:

1. Find the file that a mapper was running on when it failed
2. Find the block that a mapper was running on when it failed (and be able to map block IDs back to file names)

I haven't been able to find any documentation on facilities to accomplish either (1) or (2), so I'm hoping someone on this list will have a suggestion. I am using the Hadoop streaming API on Hadoop 0.18.2.

-Jason