[
https://issues.apache.org/jira/browse/MAPREDUCE-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113126#comment-13113126
]
Kihwal Lee commented on MAPREDUCE-2928:
---------------------------------------
bq. Sounds good. Since transient disk failures may cause a file system to
become read-only (causing permanent failures) sometimes re-mounting is
sufficient to recover in which case it makes sense to re-enable faulty disks
w/o TT restart.
We need to consider the opposite case as well. The file system on a failing
disk can be remounted rw and may work for a short time, but then fail
again. I am sure you've seen this too, since you said "sometimes". We need to
make sure that the potential performance degradation caused by health-checking
a bad disk is limited. The checking thread can hang for a long time (e.g. 300
seconds) until the kernel turns the file system back to ro.
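
To make the concern concrete, here is a minimal sketch of one way to bound
the time the caller spends on a health check. It is only an illustration, not
the TaskTracker code or the attached patches. The names BoundedDiskCheck,
probe and isHealthy are hypothetical, and so is the choice of timeout. Each
check gets its own daemon thread, so a probe stuck in kernel I/O can neither
block later checks nor prevent JVM exit. The caller gives up after the
timeout and treats the disk as bad. Note that interrupting a thread blocked
in disk I/O may have no effect, so the stuck thread can linger until the
kernel returns.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedDiskCheck {

    /** Hypothetical probe: create and delete a small file in the directory. */
    static boolean probe(File dir) {
        try {
            File f = File.createTempFile("healthcheck", ".tmp", dir);
            return f.delete();
        } catch (IOException e) {
            return false; // e.g. the file system went read-only
        }
    }

    /**
     * Runs the check on a separate daemon thread and waits at most
     * timeoutMs. A timeout or any failure is reported as "unhealthy".
     */
    static boolean isHealthy(Callable<Boolean> check, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "disk-health-check");
            t.setDaemon(true); // a hung probe must not keep the JVM alive
            return t;
        });
        Future<Boolean> result = executor.submit(check);
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // best effort; blocked I/O may ignore it
            return false;
        } catch (ExecutionException e) {
            return false; // the probe itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would use something like
isHealthy(() -> BoundedDiskCheck.probe(dir), timeoutMs) and keep the disk
disabled on a false result. This limits how long the caller waits. It does
not address a disk that passes the check and then fails again shortly after
being re-enabled.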
Hey, I will also be out of town for three weeks and come back on 10/17. GMTA. :)
> MR-2413 improvements
> --------------------
>
> Key: MAPREDUCE-2928
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2928
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: tasktracker
> Reporter: Eli Collins
> Assignee: Eli Collins
> Fix For: 0.20.205.0
>
> Attachments: mapreduce-2928-1.patch, mapreduce-2928-2.patch,
> mapreduce-2928-3.patch
>
>
> Tracks improvements to MR-2413. See [this
> comment|https://issues.apache.org/jira/browse/MAPREDUCE-2413?focusedCommentId=13095073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13095073].