[
https://issues.apache.org/jira/browse/MAPREDUCE-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929648#action_12929648
]
Ted Yu commented on MAPREDUCE-2177:
-----------------------------------
The occurrence in our cluster may be related to the fact that we run an HBase
region server alongside the task tracker.
Reporting progress from a thread that isn't blocked by a long disk write or
combiner call is one option. We can also put a limit on the total amount of
time the spillDone.awaitNanos() calls take in the following loop:
  while (kvstart != kvend) {
    reporter.progress();
    spillDone.awaitNanos(nanosTimeout);
  }
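
For the first option mentioned above (reporting progress from a thread that is
not blocked), a minimal sketch might look like the following. ProgressHeartbeat,
ProgressReporter, and the period values are hypothetical names chosen for
illustration; they are not part of MapTask, and the real reporter is modelled
here only by a single progress() method.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: all names here are hypothetical, not taken from MapTask. */
final class ProgressHeartbeat implements AutoCloseable {
    /** Hypothetical stand-in for the task's reporter; only progress() is modelled. */
    interface ProgressReporter {
        void progress();
    }

    // A single daemon thread that is never blocked by spill I/O or the combiner.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true);
            return t;
        });

    ProgressHeartbeat(ProgressReporter reporter, long periodMillis) {
        // Report immediately, then once every periodMillis.
        scheduler.scheduleAtFixedRate(reporter::progress, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    /** Simulates a blocked task thread; returns how many heartbeats were sent meanwhile. */
    static int demo(long blockedMillis, long periodMillis) {
        AtomicInteger calls = new AtomicInteger();
        try (ProgressHeartbeat hb = new ProgressHeartbeat(calls::incrementAndGet, periodMillis)) {
            Thread.sleep(blockedMillis); // stands in for a long spill or combiner call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("heartbeats while blocked: " + demo(200, 20));
    }
}
```

One trade-off to keep in mind: a heartbeat that fires unconditionally would
also keep a genuinely hung task alive, so it may need to be tied to some
evidence that the spill is still advancing.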
> The wait for spill completion should call Condition.awaitNanos(long
> nanosTimeout)
> ---------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2177
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2177
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 0.20.2
> Reporter: Ted Yu
>
> We sometimes saw map task timeouts in cdh3b2. Here is the log from one of the
> map tasks:
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: bufstart = 119534169; bufend = 59763857; bufvoid = 298844160
> 2010-11-04 10:34:23,820 INFO org.apache.hadoop.mapred.MapTask: kvstart = 438913; kvend = 585320; length = 983040
> 2010-11-04 10:34:41,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
> 2010-11-04 10:35:45,352 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
> 2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59763857; bufend = 298837899; bufvoid = 298844160
> 2010-11-04 10:35:45,547 INFO org.apache.hadoop.mapred.MapTask: kvstart = 585320; kvend = 731585; length = 983040
> 2010-11-04 10:45:41,289 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4
> Note how long the last spill took: spill 4 ran for almost 10 minutes
> (10:35:45 to 10:45:41), versus about 18 seconds for spill 3.
> In MapTask.java, the following code waits for the spill to finish:
>   while (kvstart != kvend) {
>     reporter.progress();
>     spillDone.await();
>   }
> The code in trunk is similar.
> Condition.await() has no timeout. If the SpillThread takes a long time before
> calling spillDone.signal(), the waiting thread stays blocked without reporting
> progress, and the task times out.
> Condition.awaitNanos(long nanosTimeout) should be called instead.
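
A timed wait along the lines proposed in the description, combined with the
overall limit suggested in the comment at the top of this message, could be
sketched roughly as follows. This is an illustration under assumptions, not the
MapTask code: TimedSpillWait, spillInProgress (standing in for
kvstart != kvend), progressCalls (standing in for reporter.progress()), and the
slice and limit values are all hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch only: all names here are hypothetical, not taken from MapTask. */
final class TimedSpillWait {
    private final ReentrantLock spillLock = new ReentrantLock();
    private final Condition spillDone = spillLock.newCondition();
    private boolean spillInProgress = true; // stands in for kvstart != kvend
    int progressCalls = 0;                  // stands in for reporter.progress()

    /** Called by the (simulated) spill thread when the spill completes. */
    void finishSpill() {
        spillLock.lock();
        try {
            spillInProgress = false;
            spillDone.signalAll();
        } finally {
            spillLock.unlock();
        }
    }

    /**
     * Waits in slices of at most sliceNanos, reporting progress before each slice,
     * and gives up once maxTotalNanos has elapsed. Returns true if the spill
     * finished, false on overall timeout or interruption.
     */
    boolean awaitSpill(long sliceNanos, long maxTotalNanos) {
        final long deadline = System.nanoTime() + maxTotalNanos;
        spillLock.lock();
        try {
            while (spillInProgress) {
                progressCalls++;
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false; // overall limit reached
                }
                // Returns on signal, on timeout, or spuriously; the loop rechecks the state.
                spillDone.awaitNanos(Math.min(sliceNanos, remaining));
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // a real task would likely propagate this
            return false;
        } finally {
            spillLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TimedSpillWait w = new TimedSpillWait();
        Thread spill = new Thread(() -> {
            try {
                Thread.sleep(100); // simulated slow spill
            } catch (InterruptedException e) {
                return;
            }
            w.finishSpill();
        }, "spill-sim");
        spill.start();
        boolean done = w.awaitSpill(TimeUnit.MILLISECONDS.toNanos(20), TimeUnit.SECONDS.toNanos(5));
        spill.join();
        System.out.println("finished=" + done + ", progress calls=" + w.progressCalls);
    }
}
```

The slice length would presumably need to stay well below the task timeout so
that progress is reported several times per timeout window; what the caller
does when awaitSpill returns false is left open here.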