[ 
https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747891#action_12747891
 ] 

Vinod K V commented on MAPREDUCE-913:
-------------------------------------


The root cause is a job whose DistributedCache files are modified on HDFS 
while the job is still running and tasks are still being assigned. 
(NOTE: The line numbers below DO NOT correspond to trunk, but the trace 
should give an idea.)

{code}
2009-08-25 19:53:48,831 FATAL org.apache.hadoop.filecache.DistributedCache: File: hdfs://<HDFS_HOST>:<port>/user/a/b/c/distributed_data/distributed_file#distributed_file has changed on HDFS since job started
2009-08-25 19:53:48,832 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200908191538_10587_r_000000_1Child Error
java.io.IOException: File: hdfs://<HDFS_HOST>:<port>/user/a/b/c/distributed_data/distributed_file#distributed_file has changed on HDFS since job started
        at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:485)
        at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356)
        at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:173)
{code}
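
To illustrate the kind of check that produces the message above, here is a 
minimal sketch. It assumes the check compares the file's current modification 
time with a timestamp recorded when the job was submitted. The class 
CacheFreshness and the method checkFresh are hypothetical names, and the 
sketch uses local files via java.nio rather than HDFS; it is not the 
DistributedCache code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a cache freshness check; not Hadoop's code. */
final class CacheFreshness {

    /**
     * Throws if the file's current modification time differs from the
     * timestamp that was recorded at job submission (an assumption about
     * how freshness is decided).
     */
    static void checkFresh(Path file, long recordedMillis) throws IOException {
        long currentMillis = Files.getLastModifiedTime(file).toMillis();
        if (currentMillis != recordedMillis) {
            throw new IOException("File: " + file
                    + " has changed since job started");
        }
    }
}
```

Under this assumption, any rewrite of the file after submission makes every 
later localization attempt for the job fail with the IOException.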

Shortly after this exception is logged, the TaskRunner thread for this task 
crashes with the following in the TaskTracker's out file:
{code}
Exception in thread "Thread-89595" java.lang.NullPointerException
        at org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:412)
        at org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:396)
        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.taskFinished(TaskTracker.java:2166)
        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.reportTaskFinished(TaskTracker.java:2091)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:496)
{code}

The following also appears in the TaskTracker's log file:
{code}
2009-08-25 19:53:51,838 ERROR org.apache.hadoop.mapred.TaskLog: 
getTaskLogFileDetail threw an exception java.io.FileNotFoundException: 
/hadoop/logs//mapred/userlogs
attempt_200908191538_10587_r_000000_1/log.index (No such file or directory)
{code}

Once this happens for a job, the affected slot on this TaskTracker is no 
longer usable, because the crash prevents the slot from being released along 
this code path. All further tasks assigned to this slot hang in the 
UNINITIALIZED state.
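
One general way to keep a slot from leaking when cleanup throws is to release 
it in a finally block. The sketch below only illustrates that pattern. 
SlotPool, runTask and freeSlots are hypothetical names, the Semaphore stands 
in for the slot accounting, and none of this is the TaskTracker code or a 
proposed patch.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical slot pool; names are illustrative, not TaskTracker's API. */
final class SlotPool {
    private final Semaphore slots;

    SlotPool(int count) {
        slots = new Semaphore(count);
    }

    int freeSlots() {
        return slots.availablePermits();
    }

    /**
     * Runs the task's work and then its cleanup. The slot is released in the
     * outer finally block, so an unchecked exception (such as an NPE) thrown
     * by either step cannot leave the slot held.
     */
    void runTask(Runnable work, Runnable cleanup) throws InterruptedException {
        slots.acquire();
        try {
            try {
                work.run();
            } finally {
                cleanup.run(); // may itself throw
            }
        } finally {
            slots.release(); // always reached once the slot was acquired
        }
    }
}
```

With this structure the exception still propagates to the caller, but the 
slot count is restored before it does.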

> TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks 
> and hung TaskTracker
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-913
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-913
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Vinod K V
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
