[
https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747891#action_12747891
]
Vinod K V commented on MAPREDUCE-913:
-------------------------------------
The root cause is a job whose DistributedCache files are modified on HDFS while
the job is still running and tasks are still being assigned. The failure first
shows up as follows. (NOTE: The line numbers DO NOT correspond to trunk, but the
trace should give an idea.)
{code}
2009-08-25 19:53:48,831 FATAL org.apache.hadoop.filecache.DistributedCache: File: hdfs://<HDFS_HOST>:<port>/user/a/b/c/distributed_data/distributed_file#distributed_file has changed on HDFS since job started
2009-08-25 19:53:48,832 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200908191538_10587_r_000000_1Child Error
java.io.IOException: File: hdfs://<HDFS_HOST>:<port>/user/a/b/c/distributed_data/distributed_file#distributed_file has changed on HDFS since job started
    at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:485)
    at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356)
    at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:173)
{code}
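To illustrate the kind of check that produces the message in the trace above, here is a minimal sketch. It assumes the check compares a modification time recorded when the job started against the one observed at localization time. The class and method names ({{FreshnessCheckSketch}}, {{ensureUnchanged}}) are hypothetical, and this is not the actual DistributedCache code.

```java
import java.io.IOException;

// Hypothetical illustration only; NOT the Hadoop DistributedCache implementation.
final class FreshnessCheckSketch {

    // Assumption: freshness is decided by comparing the modification time
    // recorded at job start with the modification time seen now.
    static void ensureUnchanged(String uri, long recordedMtime, long currentMtime)
            throws IOException {
        if (recordedMtime != currentMtime) {
            throw new IOException(
                "File: " + uri + " has changed on HDFS since job started");
        }
    }

    public static void main(String[] args) {
        try {
            // The two timestamps differ, so the check fails the task setup.
            ensureUnchanged("hdfs://example-host:8020/data/file", 1000L, 2000L);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that the failure is raised during task setup, before the task itself has run.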
Shortly after the failure above, the TaskRunner thread for this task crashes
with the following in the TaskTracker's .out file:
{code}
Exception in thread "Thread-89595" java.lang.NullPointerException
    at org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:412)
    at org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:396)
    at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.taskFinished(TaskTracker.java:2166)
    at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.reportTaskFinished(TaskTracker.java:2091)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:496)
{code}
The following also appears in the TaskTracker's log file:
{code}
2009-08-25 19:53:51,838 ERROR org.apache.hadoop.mapred.TaskLog:
getTaskLogFileDetail threw an exception java.io.FileNotFoundException:
/hadoop/logs//mapred/userlogs
attempt_200908191538_10587_r_000000_1/log.index (No such file or directory)
{code}
Once this happens for a job, the affected slot on this TaskTracker is no longer
usable because, judging by the code paths, the slot is never successfully
released. All further tasks assigned to this slot hang in the UNINITIALIZED
state.
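The general failure mode described above (an unchecked exception in the completion path skipping the slot release) can be sketched as follows. This is an illustration under stated assumptions, not the TaskTracker code and not necessarily the fix for this issue: {{SlotReleaseSketch}}, {{runUnsafe}} and {{runSafe}} are hypothetical names, and a {{java.util.concurrent.Semaphore}} stands in for the TaskTracker's slot accounting.

```java
import java.util.concurrent.Semaphore;

// Hypothetical illustration only; NOT the TaskTracker implementation.
final class SlotReleaseSketch {
    // Assumption: a semaphore models the pool of task slots.
    private final Semaphore slots;

    SlotReleaseSketch(int slotCount) {
        this.slots = new Semaphore(slotCount);
    }

    int freeSlots() {
        return slots.availablePermits();
    }

    // Unsafe variant: if cleanup throws (e.g. an NPE), release() is never
    // reached and the slot stays occupied.
    void runUnsafe(Runnable cleanup) throws InterruptedException {
        slots.acquire();
        cleanup.run();
        slots.release();
    }

    // Safer variant: the slot is released even when cleanup throws.
    void runSafe(Runnable cleanup) throws InterruptedException {
        slots.acquire();
        try {
            cleanup.run();
        } finally {
            slots.release();
        }
    }
}
```

With a single slot, calling {{runUnsafe}} with a cleanup step that throws leaves {{freeSlots()}} at 0, which mirrors the held-up slot described here; {{runSafe}} returns the slot in the same situation.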
> TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks
> and hung TaskTracker
> ------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-913
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-913
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Reporter: Vinod K V
>