[ https://issues.apache.org/jira/browse/MAPREDUCE-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103499#comment-13103499 ]
Ravi Gummadi commented on MAPREDUCE-2928:
-----------------------------------------
This patch removed the boolean variable "diskFailed" and introduced
"lastNumDirs".
(1) In the original MR-2413 patch, offerService() deliberately did not compare
the current count of good local dirs against the count from the previous check
to determine disk health. A failed disk can become good and a good disk can
fail between two consecutive checks in offerService(). For example, if a
non-writable disk is made writable (perhaps manually) and, within the same
minute (i.e. before the next disk health check in offerService()), another good
disk fails, the count is unchanged, so the changed code in this patch will not
re-init the TT. The TT is then left with a wrong list of good local dirs in
memory and will keep using the bad disk. Possible race condition?
{noformat}
if (numDirs < lastNumDirs) {
  return State.STALE;
}
{noformat}
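To make the scenario in (1) concrete, here is a minimal, self-contained sketch. It is not TaskTracker code: the class name, method names, and directory paths are hypothetical, and the set-based comparison is only one possible alternative, not something the patch or MR-2413 is known to do. It shows a count-only check missing the case where one disk recovers while another fails.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; none of these names exist in Hadoop. */
final class DiskCheckSketch {

  /** Count-based check, mirroring the snippet above: stale only if the count dropped. */
  static boolean staleByCount(int lastNumDirs, Set<String> currentGoodDirs) {
    return currentGoodDirs.size() < lastNumDirs;
  }

  /** Assumed alternative: stale if any previously good dir is no longer good. */
  static boolean staleBySet(Set<String> lastGoodDirs, Set<String> currentGoodDirs) {
    return !currentGoodDirs.containsAll(lastGoodDirs);
  }

  public static void main(String[] args) {
    // Previous check: /d1 and /d2 were good, /d3 was bad.
    Set<String> last = new HashSet<>(Arrays.asList("/d1", "/d2"));
    // Current check: /d3 became writable again, /d2 failed. The count is still 2.
    Set<String> now = new HashSet<>(Arrays.asList("/d1", "/d3"));

    System.out.println("count-based stale: " + staleByCount(last.size(), now));
    System.out.println("set-based stale:   " + staleBySet(last, now));
  }
}
```

In this sketch the count-based check reports no change, while comparing the actual sets notices that /d2 dropped out of the good list.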
(2) Also, the new variable lastNumDirs needs to be initialized before
offerService() runs for the first time, to avoid a re-init of the TT the first
time control reaches the disk health check code in offerService(). Right?
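A minimal sketch of what (2) asks for, assuming the baseline count is captured once at initialization so that the first check compares against a real value. The class and method names are hypothetical and do not correspond to the patch; only the `numDirs < lastNumDirs` comparison is taken from the snippet above.

```java
/** Hypothetical illustration only; not the TaskTracker implementation. */
final class DiskHealthTracker {

  // Baseline from the previous check; seeded at construction, never left unset.
  private int lastNumDirs;

  DiskHealthTracker(int numGoodDirsAtStartup) {
    this.lastNumDirs = numGoodDirsAtStartup;
  }

  /** Returns true if fewer dirs are good than at the previous check, then updates the baseline. */
  boolean isStale(int numDirs) {
    boolean stale = numDirs < lastNumDirs;
    lastNumDirs = numDirs;
    return stale;
  }

  public static void main(String[] args) {
    DiskHealthTracker tracker = new DiskHealthTracker(3);
    System.out.println(tracker.isStale(3)); // first check, nothing changed
    System.out.println(tracker.isStale(2)); // one disk failed since the last check
  }
}
```

Seeding the field in the constructor means the first pass through the check has a real baseline to compare against, rather than whatever default the field happens to hold.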
> MR-2413 improvements
> --------------------
>
> Key: MAPREDUCE-2928
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2928
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: tasktracker
> Reporter: Eli Collins
> Assignee: Eli Collins
> Fix For: 0.20.205.0
>
> Attachments: mapreduce-2928-1.patch
>
>
> Tracks improvements to MR-2413. See [this
> comment|https://issues.apache.org/jira/browse/MAPREDUCE-2413?focusedCommentId=13095073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13095073].