[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113181#comment-13113181
 ] 

Matt Foley commented on MAPREDUCE-3077:
---------------------------------------

Kihwal Lee responded:
bq. We need to consider the opposite case as well. The file system in a failing 
disk can be remounted rw and may work for a short amount of time but then fail 
again. I am sure you've seen this too since you said "sometimes". We need to 
make sure the potential performance degradation caused by bad disk health check 
is limited. The thread can hang for a long time (e.g. 300 seconds) until the 
kernel turns it back to ro.
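
One way to cap the cost described above is to run each disk probe on a helper thread and give up after a deadline, so the caller is never blocked for the full kernel timeout. The following is only a sketch under that assumption, not code from the TaskTracker; the names BoundedDiskCheck, checkWithTimeout and writeProbe are hypothetical.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Hadoop.
class BoundedDiskCheck {
    // Daemon threads, so a probe stuck in the kernel cannot block JVM shutdown.
    private static final ExecutorService PROBES = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "disk-probe");
        t.setDaemon(true);
        return t;
    });

    /** Runs the probe on a helper thread; reports false on failure or timeout. */
    static boolean checkWithTimeout(Callable<Boolean> probe, long timeoutMs) {
        Future<Boolean> result = PROBES.submit(probe);
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: a thread blocked in disk I/O may ignore the interrupt.
            result.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // The probe itself threw, e.g. an IOException from a read-only mount.
            return false;
        }
    }

    /** Write probe: create and then delete a temporary file in the directory. */
    static Callable<Boolean> writeProbe(File dir) {
        return () -> {
            Path p = Files.createTempFile(dir.toPath(), "probe", ".tmp");
            Files.delete(p);
            return true;
        };
    }
}
```

A call such as BoundedDiskCheck.checkWithTimeout(BoundedDiskCheck.writeProbe(dir), 10000) would then treat a directory as bad if the probe has not finished within 10 seconds. The hung helper thread itself stays stuck until the kernel gives up, so a real implementation would probably also need to avoid starting a new probe for a directory while an earlier one is still outstanding.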

> re-enable faulty TaskTracker storage without restarting TT, when appropriate
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3077
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3077
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.205.0
>            Reporter: Matt Foley
>
> In MAPREDUCE-2928, Ravi Gummadi proposed:
> bq. we can add LocalStorage.checkBadLocalDirs() call to TT.initialize() that 
> can do disk-health-check of bad local dirs and add dirs to the good local 
> dirs list if they become good.
> and Eli Collins added:
> bq. Sounds good. Since transient disk failures may cause a file system to 
> become read-only (causing permanent failures) sometimes re-mounting is 
> sufficient to recover in which case it makes sense to re-enable faulty disks 
> w/o TT restart.
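
The re-check proposed above can be illustrated roughly as follows. This is a sketch of the idea only, not the actual LocalStorage.checkBadLocalDirs() code; the class LocalDirsSketch, its fields and the recheckBadDirs method are hypothetical, and the health check is passed in by the caller.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the TaskTracker's bookkeeping of local dirs.
class LocalDirsSketch {
    final List<File> good = new ArrayList<>();
    final List<File> bad = new ArrayList<>();

    /**
     * Re-tests every dir currently marked bad and moves the ones that pass
     * back to the good list. Returns the number of dirs recovered.
     */
    int recheckBadDirs(Predicate<File> isHealthy) {
        int recovered = 0;
        // Use an explicit iterator so entries can be removed while scanning.
        for (Iterator<File> it = bad.iterator(); it.hasNext(); ) {
            File dir = it.next();
            if (isHealthy.test(dir)) {
                it.remove();
                good.add(dir);
                recovered++;
            }
        }
        return recovered;
    }
}
```

The predicate could be as simple as dir -> dir.isDirectory() && dir.canWrite(), or a stricter write probe; which check is appropriate is exactly what this issue is meant to settle.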

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
