[ https://issues.apache.org/jira/browse/MAPREDUCE-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929427#action_12929427 ]
Todd Lipcon commented on MAPREDUCE-2177:
----------------------------------------
Chris, I think Ted's point is not that the wait should return after a timeout,
but that it should call reporter.progress() and then go back to waiting. This seems
valid - if the mapper thread is blocked because the buffer is full, either the
buffer spill thread should be calling progress() as it spills the buffer to
disk, or the blocked thread should periodically unblock to call progress(),
don't you think?
I think so long as the spiller is actually making some progress getting bytes
to disk, it shouldn't cause a task failure - this kind of "alive but very slow"
scenario is supposed to be handled by speculation rather than suicide :)
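
To illustrate the first option (the spill side reporting progress as bytes reach disk), here is a minimal sketch. It is not the MapTask code: the Progress interface is a hypothetical stand-in for the reporter, and the chunked write loop is assumed purely for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Sketch: a spill loop that reports progress after every chunk it writes. */
class SpillProgressSketch {
    // Hypothetical stand-in for the reporter's progress() call.
    interface Progress { void progress(); }

    /**
     * Writes buf to out in chunks of at most chunkSize bytes and calls
     * progress() after each chunk, so a slow but advancing spill keeps
     * reporting liveness. Returns the number of chunks written.
     */
    static int spill(byte[] buf, int chunkSize, OutputStream out, Progress reporter)
            throws IOException {
        int chunks = 0;
        for (int off = 0; off < buf.length; off += chunkSize) {
            int len = Math.min(chunkSize, buf.length - off);
            out.write(buf, off, len);
            reporter.progress(); // one liveness report per chunk written
            chunks++;
        }
        return chunks;
    }
}
```

With this shape, progress is only reported while bytes are actually moving, so a spill that is genuinely stuck would still time out.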
> The wait for spill completion should call Condition.awaitNanos(long
> nanosTimeout)
> ---------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2177
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2177
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 0.20.2
> Reporter: Ted Yu
>
> We sometimes saw map task timeouts in cdh3b2. Here is the log from one of the
> map tasks:
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: bufstart = 119534169; bufend = 59763857; bufvoid = 298844160
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: kvstart = 438913; kvend = 585320; length = 983040
> 2010-11-04 10:34:41,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
> 2010-11-04 10:35:45,352 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
> 2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59763857; bufend = 298837899; bufvoid = 298844160
> 2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: kvstart = 585320; kvend = 731585; length = 983040
> 2010-11-04 10:45:41,289 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4
> Note how long the last spill took.
> In MapTask.java, the following code waits for the spill to finish:
>   while (kvstart != kvend) {
>     reporter.progress();
>     spillDone.await();
>   }
> In trunk, the code is similar.
> Condition.await() has no timeout. If the SpillThread takes a long time before
> calling spillDone.signal(), the waiting thread reports no progress in the
> meantime and the task times out.
> Condition.awaitNanos(long nanosTimeout) should be called instead.
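
The bounded wait proposed in the description could look roughly like the sketch below. This is not a patch against MapTask: the class, the Progress interface, the spillInProgress flag (standing in for kvstart != kvend), and the interval parameter are all assumptions made for illustration. Only ReentrantLock, Condition.awaitNanos and TimeUnit are real java.util.concurrent APIs.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch: wait for a spill, waking at a fixed interval to report progress. */
class SpillWaitSketch {
    // Hypothetical stand-in for the reporter's progress() call.
    interface Progress { void progress(); }

    private final ReentrantLock spillLock = new ReentrantLock();
    private final Condition spillDone = spillLock.newCondition();
    private boolean spillInProgress = true; // stands in for kvstart != kvend

    /** Blocks until the spill completes, calling progress() once per wakeup. */
    void awaitSpill(Progress reporter, long intervalMs) throws InterruptedException {
        spillLock.lock();
        try {
            while (spillInProgress) {
                reporter.progress();
                // Returns on signal, timeout, or spurious wakeup;
                // the loop condition re-checks the state in every case.
                spillDone.awaitNanos(TimeUnit.MILLISECONDS.toNanos(intervalMs));
            }
        } finally {
            spillLock.unlock();
        }
    }

    /** Called by the spill thread once the spill has finished. */
    void finishSpill() {
        spillLock.lock();
        try {
            spillInProgress = false;
            spillDone.signalAll();
        } finally {
            spillLock.unlock();
        }
    }

    /** Simulates a slow spill and returns how many times progress() was called. */
    static int demo(long spillMs, long intervalMs) throws InterruptedException {
        SpillWaitSketch s = new SpillWaitSketch();
        AtomicInteger calls = new AtomicInteger();
        Thread spiller = new Thread(() -> {
            try {
                Thread.sleep(spillMs); // pretend the spill is slow
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            s.finishSpill();
        });
        spiller.start();
        s.awaitSpill(calls::incrementAndGet, intervalMs);
        spiller.join();
        return calls.get();
    }
}
```

Note that this variant keeps reporting progress even if the spill thread is completely stuck, which is the trade-off against having the spill thread itself report progress.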