[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150681#comment-13150681
 ] 

Hitesh Shah commented on MAPREDUCE-3121:
----------------------------------------

bq. Anyway, I will add a diskHandler.areDisksHealthy() check in the next 
version of the patch.
  My question was more about whether a task should start when even one 
healthy disk is available, i.e. when the node fails the specified % failure 
threshold but some healthy disks remain. There are two parts to the health of 
a node: the state of the disks when the node is already scheduled to set 
up/launch a container, and the health status that the NM reports to the RM. 
The latter simply checks the threshold. When launching the container after 
the resources are localized (in ContainersLaunch), the patch checks the 
threshold, whereas the resource localization service, as mentioned above, 
seems to need only one healthy disk to be available. I am fine with either 
approach as long as it is applied consistently in both places. If no one has 
a preference, we could follow the approach taken in 0.20.xxx.
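
To make the two policies being compared concrete, here is a minimal sketch in 
Java. The class DiskHealthPolicy, its method names, and the 0.25 minimum 
healthy fraction are hypothetical and chosen only for illustration; none of 
them are taken from the patch, and the threshold is assumed to be expressed as 
a minimum fraction of healthy disks.

```java
// Hypothetical sketch: names and values are illustrative, not from the patch.
public final class DiskHealthPolicy {
    private DiskHealthPolicy() {}

    /**
     * Threshold policy: the node counts as healthy only if the fraction of
     * healthy disks is at least minHealthyFraction (assumed semantics).
     */
    public static boolean meetsThreshold(int healthyDisks, int totalDisks,
                                         double minHealthyFraction) {
        if (totalDisks <= 0) {
            return false; // no disks configured, nothing can be healthy
        }
        return (double) healthyDisks / totalDisks >= minHealthyFraction;
    }

    /** Lenient policy: a single healthy disk is enough to proceed. */
    public static boolean hasAnyHealthyDisk(int healthyDisks) {
        return healthyDisks > 0;
    }

    public static void main(String[] args) {
        int healthy = 1;
        int total = 10;
        double minFraction = 0.25; // assumed value for illustration

        // 1 of 10 disks healthy: the two policies disagree.
        System.out.println("threshold policy: "
                + meetsThreshold(healthy, total, minFraction)); // false
        System.out.println("any-healthy policy: "
                + hasAnyHealthyDisk(healthy));                  // true
    }
}
```

With 1 of 10 disks healthy, the threshold check rejects the launch while the 
any-healthy check allows it. That is the case where localization and 
container launch would disagree if they used different policies.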

                
> NodeManager should handle disk-failures
> ---------------------------------------
>
>                 Key: MAPREDUCE-3121
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3121
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3121.patch, 3121.v1.1.patch, 3121.v1.patch
>
>
> This is akin to MAPREDUCE-2413 but for YARN's NodeManager. We want to 
> minimize the impact of transient/permanent disk failures on containers. With 
> a larger number of disks per node, the ability to continue to run containers 
> on other disks is crucial.


        
