[jira] [Commented] (MAPREDUCE-3121) NodeManager should handle disk-failures

Ravi Gummadi (Commented) (JIRA) Mon, 14 Nov 2011 19:51:19 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150221#comment-13150221
 ]


Ravi Gummadi commented on MAPREDUCE-3121:
-----------------------------------------

>> Should there be a check for whether there are any good dirs left in 
>> ResourceLocalizationService before starting to localize the resources?

If there are no good local dirs available, then the earlier line of code
{code}nmPrivateCTokensPath = diskHandler.getLocalPathForWrite(){/code} will 
throw an IOException. So I think checking again is unnecessary, unless there 
is a race condition (a disk failure is detected just before the call to 
startLocalizer(), which is very unlikely). Even with the current 
patch, if there are no good local dirs, startLocalizer() will 
fail or throw an exception anyway.

Anyway, I will add a diskHandler.areDisksHealthy() check in the next version of 
the patch.
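
To make the argument above concrete, here is a minimal, self-contained sketch of the control flow being discussed. It is not the code from the patch: the DiskHandler class below is a hypothetical stand-in, the String argument to getLocalPathForWrite() and the localize() helper are invented for illustration, and areDisksHealthy() is assumed to mean "at least one good local dir remains", which may differ from the real semantics.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class DiskCheckSketch {

    /** Hypothetical stand-in for the patch's disk handler; not the real Hadoop class. */
    static class DiskHandler {
        private final List<String> goodLocalDirs;

        DiskHandler(List<String> goodLocalDirs) {
            this.goodLocalDirs = new ArrayList<>(goodLocalDirs);
        }

        /** Assumption: "healthy" means at least one good local dir is left. */
        boolean areDisksHealthy() {
            return !goodLocalDirs.isEmpty();
        }

        /** Throws IOException when no good local dir is available. */
        String getLocalPathForWrite(String relativePath) throws IOException {
            if (goodLocalDirs.isEmpty()) {
                throw new IOException("No good local dirs available for " + relativePath);
            }
            return goodLocalDirs.get(0) + "/" + relativePath;
        }
    }

    /** Hypothetical helper mirroring the order of calls described in the comment. */
    static String localize(DiskHandler diskHandler, String tokensFile) throws IOException {
        // This call already fails if every local dir is bad.
        String nmPrivateCTokensPath = diskHandler.getLocalPathForWrite(tokensFile);

        // The extra check only matters if a disk fails between the call above
        // and the start of the localizer.
        if (!diskHandler.areDisksHealthy()) {
            throw new IOException("Disks became unhealthy before startLocalizer()");
        }

        // Placeholder for the real startLocalizer() call.
        return "startLocalizer(" + nmPrivateCTokensPath + ")";
    }
}
```

In this sketch, a handler with an empty list of good dirs fails inside getLocalPathForWrite(), so the explicit check is reached only in the narrow race window described above.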
                
> NodeManager should handle disk-failures
> ---------------------------------------
>
>                 Key: MAPREDUCE-3121
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3121
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3121.patch, 3121.v1.1.patch, 3121.v1.patch
>
>
> This is akin to MAPREDUCE-2413 but for YARN's NodeManager. We want to 
> minimize the impact of transient/permanent disk failures on containers. With 
> larger number of disks per node, the ability to continue to run containers on 
> other disks is crucial.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
