[ https://issues.apache.org/jira/browse/HADOOP-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1252:
--------------------------------

    Attachment: 1252.new.patch

Attached is the patch with Doug's comment incorporated.

> Disk problems should be handled better by the MR framework
> ----------------------------------------------------------
>
>                 Key: HADOOP-1252
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1252
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>             Fix For: 0.13.0
>
>         Attachments: 1252.new.patch, 1252.patch, 1252.patch
>
>
> The MR framework should recover from disk failures without causing jobs
> to hang. Note that this issue covers only a short-term solution: for
> example, reviewing the code and improving the exception handling (to
> better detect faulty disks and missing files). The long-term approach
> might be to have an FS layer that takes care of failed disks and makes
> the failures transparent to the tasks; that will be a separate issue.
> Some of the problems reported so far are HADOOP-1087 and a comment by
> Koji on HADOOP-1200 (this list may not be complete). Please add to this
> issue as many details as possible on disk failures leading to hung jobs
> (relevant exception traces, ways to reproduce, etc.).