[jira] Updated: (HADOOP-4041) IsolationRunner does not work as documented

Philip Zeyliger (JIRA) Mon, 25 May 2009 15:52:09 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Philip Zeyliger updated HADOOP-4041:
------------------------------------

    Attachment: HADOOP-4041-v2.patch

Attaching a patch.

I updated Tom's previous patch a little bit to get IsolationRunner to work for 
map tasks.  TestIsolationRunner passes.  I'm still running the other tests.

I've also been testing this manually:
{noformat}
$ bin/hadoop jar build/hadoop-0.21.0-dev-examples.jar fail -D keep.failed.task.files=true -failMappers
[lots of noise]
$ bin/hadoop org.apache.hadoop.mapred.IsolationRunner /tmp/hadoop-philip-trunk/mapred/local/taskTracker/jobcache/job_200905251539_0001/attempt_200905251539_0001_m_000000_0/job.xml
09/05/25 15:41:26 INFO mapred.MapTask: io.sort.mb = 100
09/05/25 15:41:26 INFO mapred.MapTask: data buffer = 79691776/99614720
09/05/25 15:41:26 INFO mapred.MapTask: record buffer = 262144/327680
Exception in thread "main" java.lang.RuntimeException: Intentional map failure
        at org.apache.hadoop.examples.FailJob$FailMapper.map(FailJob.java:53)
        at org.apache.hadoop.examples.FailJob$FailMapper.map(FailJob.java:48)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:528)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:310)
        at org.apache.hadoop.mapred.IsolationRunner.run(IsolationRunner.java:190)
        at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:202)
{noformat}
(The failure on re-run is what I'd expect, since the map always fails.  This 
is much better than, say, a ClassNotFoundException of some sort, which would 
indicate that IsolationRunner isn't working.)
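For reference, the job.xml path handed to IsolationRunner above follows the per-attempt layout under the TaskTracker's local dir: taskTracker/jobcache/<job ID>/<attempt ID>/job.xml, where the job ID is the attempt ID without its task-type, task-number and attempt-number fields. A minimal Java sketch of that derivation, assuming the local dir is /tmp/hadoop-philip-trunk/mapred/local as in the run above. The class and method names are hypothetical (not Hadoop APIs), and the sketch hand-parses the ID string rather than using Hadoop's own ID classes.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of Hadoop): derives the per-attempt job.xml
 * path that was passed to IsolationRunner in the manual test.
 * Assumes attempt IDs look like
 * attempt_<jtIdentifier>_<jobNumber>_<m|r>_<taskNumber>_<attemptNumber>.
 */
final class AttemptJobXml {

    /** Turns "attempt_200905251539_0001_m_000000_0" into "job_200905251539_0001". */
    static String jobIdOf(String attemptId) {
        String[] parts = attemptId.split("_");
        if (parts.length != 6 || !parts[0].equals("attempt")) {
            throw new IllegalArgumentException("not an attempt ID: " + attemptId);
        }
        // Keep only the JobTracker identifier and the job number.
        return "job_" + parts[1] + "_" + parts[2];
    }

    /** Builds <localDir>/taskTracker/jobcache/<jobId>/<attemptId>/job.xml. */
    static Path jobXmlFor(Path mapredLocalDir, String attemptId) {
        return mapredLocalDir.resolve("taskTracker")
                .resolve("jobcache")
                .resolve(jobIdOf(attemptId))
                .resolve(attemptId)
                .resolve("job.xml");
    }

    public static void main(String[] args) {
        // Assumed local dir, taken from the manual run above.
        Path local = Paths.get("/tmp/hadoop-philip-trunk/mapred/local");
        System.out.println(jobXmlFor(local, "attempt_200905251539_0001_m_000000_0"));
    }
}
```

On a Unix-like system this prints the same path that was passed to IsolationRunner in the run above.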

I had to rejigger TaskRunner a bit so that the classpath-generation code can be 
shared.  I suspect that some setup still isn't happening for users of the 
DistributedCache, but I haven't dug into that deeply.

I'd like to propose that we open a separate JIRA for IsolationRunner for reduce 
tasks.  Reducers have to contact mappers to get the intermediate data, and, 
frankly, that's quite messy.  I believe it requires interacting with the job 
tracker, and that seems like a lot of dependencies for a tool that in theory 
runs in isolation.  So I'd like to get this fixed for mappers first and then 
tackle reducers separately.

-- Philip

> IsolationRunner does not work as documented
> -------------------------------------------
>
>                 Key: HADOOP-4041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4041
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: documentation, mapred
>    Affects Versions: 0.18.0
>            Reporter: Yuri Pradkin
>         Attachments: HADOOP-4041-v2.patch, hadoop-4041.patch
>
>
> IsolationRunner does not work as documented in the tutorial.
> The tutorial says "To use the IsolationRunner, first set 
> keep.failed.tasks.files to true (also see keep.tasks.files.pattern)."
> Should be:
>   keep.failed.task.files (not tasks)
> After the above was set (quoted from my message on hadoop-core):
> > After the task hung, I failed it via the web interface.  Then I went to
> > the node that was running this task:
> >
> >   $ cd ...local/taskTracker/jobcache/job_200808071645_0001/work
> > (this path is already different from the tutorial's)
> >
> >   $ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
> > Exception in thread "main" java.lang.NullPointerException
> >         at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:164)
> >
> > Looking at IsolationRunner code, I see this:
> >
> >     164     File workDirName = new File(lDirAlloc.getLocalPathToRead(
> >     165                                   TaskTracker.getJobCacheSubdir()
> >     166                                   + Path.SEPARATOR + taskId.getJobID()
> >     167                                   + Path.SEPARATOR + taskId
> >     168                                   + Path.SEPARATOR + "work",
> >     169                                   conf).toString());
> >
> > I.e., it assumes there is a taskID subdirectory under the job
> > dir, but:
> >  $ pwd
> >  ...mapred/local/taskTracker/jobcache/job_200808071645_0001
> >  $ ls
> >  jars  job.xml  work
> >
> > -- it's not there.
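The mismatch described in the report can be reproduced without a cluster. The Java sketch below assumes the quoted `ls` output is the complete job directory, rebuilds that layout in a temporary directory, and checks for the <jobId>/<taskId>/work path that lines 164-169 construct. JobCacheLayoutCheck and the attempt ID are made up for illustration (the report doesn't name one), and the sketch only mirrors the path arithmetic, not LocalDirAllocator's search across the configured local dirs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical illustration: compares the work dir that the quoted
 * IsolationRunner code (lines 164-169) looks for with the layout the
 * reporter actually found under jobcache/<jobId>.
 */
final class JobCacheLayoutCheck {

    /** Mirrors the quoted path: <jobcache>/<jobId>/<taskId>/work. */
    static Path expectedWorkDir(Path jobCache, String jobId, String taskId) {
        return jobCache.resolve(jobId).resolve(taskId).resolve("work");
    }

    /** Recreates the observed layout: jars/, job.xml, work/ directly under the job dir. */
    static Path createObservedLayout(Path jobCache, String jobId) throws IOException {
        Path jobDir = Files.createDirectories(jobCache.resolve(jobId));
        Files.createDirectories(jobDir.resolve("jars"));
        Files.createDirectories(jobDir.resolve("work"));
        Files.createFile(jobDir.resolve("job.xml")); // empty stand-in for the job-level job.xml
        return jobDir;
    }

    public static void main(String[] args) throws IOException {
        Path jobCache = Files.createTempDirectory("jobcache");
        String jobId = "job_200808071645_0001";
        // Hypothetical attempt ID; the report does not name one.
        String taskId = "attempt_200808071645_0001_m_000000_0";
        Path jobDir = createObservedLayout(jobCache, jobId);
        System.out.println("per-job work dir exists:  " + Files.isDirectory(jobDir.resolve("work")));
        System.out.println("expected work dir exists: "
                + Files.exists(expectedWorkDir(jobCache, jobId, taskId)));
    }
}
```

Running it prints true for the per-job work directory and false for the per-attempt one that the quoted code expects.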

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
