[ https://issues.apache.org/jira/browse/MAPREDUCE-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103499#comment-13103499 ]

Ravi Gummadi commented on MAPREDUCE-2928:
-----------------------------------------

This patch removed the boolean variable "diskFailed" and introduced 
"lastNumDirs".

(1) In the original patch of MR-2413, I deliberately did not compare the number
of good local dirs in offerService() against the previous (last) count of good
local dirs to determine disk health. A failed disk can become good and a good
disk can become bad between two consecutive checks in offerService(). For
example, suppose a non-writable disk is made writable (maybe manually) and,
within the same minute (i.e. before the next disk health check in
offerService()), another good disk fails. The count of good dirs stays the
same, so with the code changed in this patch the TT will not be re-initialized.
The TT then keeps a wrong list of good local dirs in memory and may end up using
bad disks. Possible race condition?

{noformat}
if (numDirs < lastNumDirs) {
  return State.STALE;
}
{noformat}
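A minimal, hypothetical Java sketch of the scenario in (1). The State enum, the
method names and the directory paths are made up for illustration and are not
TaskTracker code. It contrasts a count-only check, like the lastNumDirs check
quoted above, with one possible alternative (assumed here, not taken from either
patch) that compares the actual set of good dirs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not TaskTracker code.
public class DiskCheckRace {

  enum State { OK, STALE }

  // Count-only check: only notices when the number of good dirs drops.
  static State countCheck(int lastNumDirs, List<String> goodDirs) {
    int numDirs = goodDirs.size();
    if (numDirs < lastNumDirs) {
      return State.STALE;
    }
    return State.OK;
  }

  // Set-based check (assumed alternative): notices any change in
  // *which* dirs are good, even when the count is unchanged.
  static State setCheck(Set<String> lastGoodDirs, List<String> goodDirs) {
    if (!lastGoodDirs.equals(new HashSet<>(goodDirs))) {
      return State.STALE;
    }
    return State.OK;
  }

  public static void main(String[] args) {
    // At the previous check: /d1 and /d2 are good, /d3 has failed.
    Set<String> before = new HashSet<>(Arrays.asList("/d1", "/d2"));
    // Before the next check: /d3 is made writable again and /d2 fails.
    List<String> after = Arrays.asList("/d1", "/d3");

    System.out.println("count-based: " + countCheck(before.size(), after));
    System.out.println("set-based:   " + setCheck(before, after));
  }
}
```

Running main() prints OK for the count-based check and STALE for the set-based
one: there are 2 good dirs both before and after, so only comparing the dirs
themselves notices that the TT's in-memory list now includes the failed /d2.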

(2) Also, the new variable lastNumDirs needs to be initialized before
offerService() runs for the first time, to avoid re-initializing the TT the
first time control reaches the disk health check code in offerService().
Right?

> MR-2413 improvements
> --------------------
>
>                 Key: MAPREDUCE-2928
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2928
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: tasktracker
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>             Fix For: 0.20.205.0
>
>         Attachments: mapreduce-2928-1.patch
>
>
> Tracks improvements to MR-2413. See [this 
> comment|https://issues.apache.org/jira/browse/MAPREDUCE-2413?focusedCommentId=13095073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13095073].

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
