[ https://issues.apache.org/jira/browse/HADOOP-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Zeyliger updated HADOOP-4041:
------------------------------------

    Attachment: HADOOP-4041-v2.patch

Attaching a patch.

I updated Tom's previous patch a little bit to get IsolationRunner to work for map tasks. TestIsolationRunner passes. I'm still running the other tests. I've also been testing this manually:

{noformat}
$ bin/hadoop jar build/hadoop-0.21.0-dev-examples.jar fail -D keep.failed.task.files=true -failMappers
[lots of noise]
$ bin/hadoop org.apache.hadoop.mapred.IsolationRunner /tmp/hadoop-philip-trunk/mapred/local/taskTracker/jobcache/job_200905251539_0001/attempt_200905251539_0001_m_000000_0/job.xml
09/05/25 15:41:26 INFO mapred.MapTask: io.sort.mb = 100
09/05/25 15:41:26 INFO mapred.MapTask: data buffer = 79691776/99614720
09/05/25 15:41:26 INFO mapred.MapTask: record buffer = 262144/327680
Exception in thread "main" java.lang.RuntimeException: Intentional map failure
	at org.apache.hadoop.examples.FailJob$FailMapper.map(FailJob.java:53)
	at org.apache.hadoop.examples.FailJob$FailMapper.map(FailJob.java:48)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:528)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:310)
	at org.apache.hadoop.mapred.IsolationRunner.run(IsolationRunner.java:190)
	at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:202)
{noformat}

(The failure when re-run is what I'd expect, since the map always fails. This is much better than, say, a ClassNotFound exception of some sort, which would indicate IsolationRunner not working.)

I had to rejigger TaskRunner a bit to be able to share code for generation of the classpath. I suspect that there's still some funny business not happening for users of the DistributedCache. I haven't dug in deeply there.

I'd like to propose that we open a separate JIRA for IsolationRunner for reduce tasks. Reducers have to contact mappers to get the intermediate data, and, frankly, that's quite messy.
I believe it requires interacting with the job tracker, and that seems like a lot of dependencies for a tool that in theory runs in isolation. So I'd like to get this fixed for mappers first and then tackle reducers separately.

-- Philip

> IsolationRunner does not work as documented
> -------------------------------------------
>
>                 Key: HADOOP-4041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4041
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: documentation, mapred
>    Affects Versions: 0.18.0
>            Reporter: Yuri Pradkin
>         Attachments: HADOOP-4041-v2.patch, hadoop-4041.patch
>
>
> IsolationRunner does not work as documented in the tutorial.
> The tutorial says "To use the IsolationRunner, first set keep.failed.tasks.files to true (also see keep.tasks.files.pattern)."
> Should be:
>   keep.failed.task.files (not tasks)
> After the above was set (quoted from my message on hadoop-core):
> > After the task
> > hung, I failed it via the web interface. Then I went to the node that was
> > running this task
> >
> >   $ cd ...local/taskTracker/jobcache/job_200808071645_0001/work
> > (this path is already different from the tutorial's)
> >
> >   $ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
> > Exception in thread "main" java.lang.NullPointerException
> >         at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:164)
> >
> > Looking at IsolationRunner code, I see this:
> >
> > 164     File workDirName = new File(lDirAlloc.getLocalPathToRead(
> > 165                                   TaskTracker.getJobCacheSubdir()
> > 166                                   + Path.SEPARATOR + taskId.getJobID()
> > 167                                   + Path.SEPARATOR + taskId
> > 168                                   + Path.SEPARATOR + "work",
> > 169                                   conf).toString());
> >
> > I.e. it assumes there is supposed to be a taskID subdirectory under the job dir, but:
> >   $ pwd
> >   ...mapred/local/taskTracker/jobcache/job_200808071645_0001
> >   $ ls
> >   jars  job.xml  work
> >
> > -- it's not there.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
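
The directory-layout mismatch in the quoted report can be sketched in plain Java. This is a minimal illustration, not the IsolationRunner code: the class TaskWorkDirCheck, its two helper methods, and the attempt ID are hypothetical, and it uses only java.nio.file rather than Hadoop's local directory allocator. It builds the <jobcache>/<job id>/<task id>/work path that the quoted lines 164-169 assume and checks it against the layout the reporter saw (jars, job.xml, and work directly under the job directory).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop: mirrors the path the quoted
// IsolationRunner lines assemble, using only the JDK.
final class TaskWorkDirCheck {

    // Builds <jobCacheDir>/<jobId>/<attemptId>/work, the per-attempt
    // work directory the quoted code expects to find.
    static Path expectedWorkDir(Path jobCacheDir, String jobId, String attemptId) {
        return jobCacheDir.resolve(jobId).resolve(attemptId).resolve("work");
    }

    // True only if the per-attempt work directory exists on disk.
    static boolean hasAttemptWorkDir(Path jobCacheDir, String jobId, String attemptId) {
        return Files.isDirectory(expectedWorkDir(jobCacheDir, jobId, attemptId));
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from the report in a temporary directory:
        // jars, job.xml and work sit directly under the job directory.
        Path jobCache = Files.createTempDirectory("jobcache");
        String jobId = "job_200808071645_0001";
        // Assumption: the report does not give the attempt ID; this one is made up.
        String attemptId = "attempt_200808071645_0001_m_000000_0";

        Path jobDir = Files.createDirectories(jobCache.resolve(jobId));
        Files.createDirectories(jobDir.resolve("jars"));
        Files.createDirectories(jobDir.resolve("work"));
        Files.createFile(jobDir.resolve("job.xml"));

        System.out.println("expected: " + expectedWorkDir(jobCache, jobId, attemptId));
        System.out.println("exists:   " + hasAttemptWorkDir(jobCache, jobId, attemptId));
    }
}
```

With the reported layout the per-attempt work directory is absent, which is consistent with the lookup at line 164 failing. That the NullPointerException comes from a null result of that lookup is an assumption; the report itself only shows the stack trace and the directory listing.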