[ https://issues.apache.org/jira/browse/MAPREDUCE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113181#comment-13113181 ]
Matt Foley commented on MAPREDUCE-3077:
---------------------------------------
Kihwal Lee responded:
bq. We need to consider the opposite case as well. The file system on a failing
disk can be remounted rw and may work for a short amount of time but then fail
again. I am sure you've seen this too, since you said "sometimes". We need to
make sure the potential performance degradation caused by health checks on bad
disks is limited. The checking thread can hang for a long time (e.g. 300
seconds) until the kernel turns the file system back to ro.
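
One way to limit the cost of a health check that hangs is to run the probe on a separate thread and stop waiting for it after a deadline. The following is only a minimal sketch under that assumption, not TaskTracker code: the class BoundedDiskCheck, its methods, and the timeout values are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of Hadoop. */
class BoundedDiskCheck {
  // Daemon threads, so a probe stuck in the kernel cannot block JVM shutdown.
  private static final ExecutorService PROBES =
      Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "disk-probe");
        t.setDaemon(true);
        return t;
      });

  /** Write probe: create and delete a small file in dir. */
  static boolean writeProbe(File dir) throws IOException {
    if (!dir.isDirectory()) {
      return false;
    }
    File probe = File.createTempFile("healthcheck", ".tmp", dir);
    return probe.delete();
  }

  /** Runs the probe but waits at most timeoutMs; a timeout or error means unhealthy. */
  static boolean isHealthy(Callable<Boolean> probe, long timeoutMs) {
    Future<Boolean> result = PROBES.submit(probe);
    try {
      return Boolean.TRUE.equals(result.get(timeoutMs, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      // Best effort only: a thread blocked in an I/O system call may ignore
      // the interrupt and stay stuck, but the caller is no longer waiting.
      result.cancel(true);
      return false;
    } catch (ExecutionException e) {
      return false; // the probe threw, e.g. IOException on a read-only mount
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

A caller would use something like `BoundedDiskCheck.isHealthy(() -> BoundedDiskCheck.writeProbe(dir), 5000)`. The timeout bounds how long the caller waits, not how long the probe thread lives, so a real implementation would also want to cap the number of outstanding probes per directory.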
> re-enable faulty TaskTracker storage without restarting TT, when appropriate
> ----------------------------------------------------------------------------
>
> Key: MAPREDUCE-3077
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3077
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tasktracker
> Affects Versions: 0.20.205.0
> Reporter: Matt Foley
>
> In MAPREDUCE-2928, Ravi Gummadi proposed:
> bq. we can add LocalStorage.checkBadLocalDirs() call to TT.initialize() that
> can do disk-health-check of bad local dirs and add dirs to the good local
> dirs list if they become good.
> and Eli Collins added:
> bq. Sounds good. Since transient disk failures may cause a file system to
> become read-only (causing permanent failures), sometimes re-mounting is
> sufficient to recover, in which case it makes sense to re-enable faulty disks
> w/o TT restart.
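
The re-check proposed in the description, combined with the concern about disks that recover briefly and then fail again, could be sketched roughly as follows. This is an assumption-laden illustration and not the actual LocalStorage code: the class LocalDirTracker, the markBad method, and the requiredPasses threshold are hypothetical. Only the name checkBadLocalDirs() comes from the proposal above, and its real behavior is not shown in this issue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for the TaskTracker's local-dir bookkeeping. */
class LocalDirTracker {
  private final List<String> goodDirs = new ArrayList<>();
  // Bad dir -> number of consecutive health checks it has passed so far.
  private final Map<String, Integer> badDirs = new LinkedHashMap<>();
  private final Predicate<String> healthCheck;
  private final int requiredPasses; // assumed knob to damp flapping disks

  LocalDirTracker(List<String> dirs, Predicate<String> healthCheck,
                  int requiredPasses) {
    this.goodDirs.addAll(dirs);
    this.healthCheck = healthCheck;
    this.requiredPasses = requiredPasses;
  }

  /** Moves a dir from the good list to the bad list. */
  synchronized void markBad(String dir) {
    if (goodDirs.remove(dir)) {
      badDirs.put(dir, 0);
    }
  }

  /** Re-checks bad dirs; re-enables one only after enough consecutive passes. */
  synchronized void checkBadLocalDirs() {
    Iterator<Map.Entry<String, Integer>> it = badDirs.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Integer> entry = it.next();
      String dir = entry.getKey();
      if (!healthCheck.test(dir)) {
        entry.setValue(0); // failed again: start counting from scratch
        continue;
      }
      int passes = entry.getValue() + 1;
      if (passes >= requiredPasses) {
        it.remove();
        goodDirs.add(dir);
      } else {
        entry.setValue(passes);
      }
    }
  }

  synchronized List<String> getGoodDirs() {
    return new ArrayList<>(goodDirs);
  }
}
```

Requiring several consecutive passes is one possible design choice for the "remounted rw, then fails again" case: a disk that works only briefly never accumulates enough passes to rejoin the good list.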