[ https://issues.apache.org/jira/browse/MAPREDUCE-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj K updated MAPREDUCE-6252:
---------------------------------
Target Version/s: 2.8.0
Hadoop Flags: Reviewed
+1, patch looks good to me.
> JobHistoryServer should not fail when encountering a missing directory
> ----------------------------------------------------------------------
>
> Key: MAPREDUCE-6252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6252
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
> Affects Versions: 2.6.0
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: MAPREDUCE-6252.0.patch, MAPREDUCE-6252.1.patch
>
>
> The JobHistoryServer maintains a cache that maps job serial number parts to
> dfs paths, which it uses when seeking a job it no longer has in its memory
> cache. There may be multiple directories for a given serial number,
> differentiated by timestamp. At present the JobHistoryServer will fail any
> time it attempts to find a job in a directory which no longer exists based on
> that cache, even though the job may well exist in a different directory for
> the same serial number. Typically this is not an issue, but the history
> cleanup process removes the directory from dfs before removing it from the
> cache, which leaves a window of time where a directory that is present in the
> cache may be missing from dfs, resulting in failure. For some dfs
> implementations it appears that the top-level directory may become
> unavailable some time before the full deletion of the tree completes, which
> extends what might otherwise be a brief period of failure to a longer one.
> Further, this also places the service at the mercy of outside processes which
> might remove those directories. The proposal is simply to make the server
> resistant to this state, such that encountering a missing directory is not
> fatal and the process will continue on to seek the job elsewhere.
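
For illustration only, the proposed behaviour can be sketched as below. This is not the attached patch: it uses java.nio.file in place of the Hadoop filesystem API so that it runs standalone, and the class name MissingDirTolerantLookup, the method findJobFile, and the assumption that a job's history file name starts with the job id are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: look up a job across cached directories, tolerating stale entries. */
public class MissingDirTolerantLookup {

    /**
     * Scans each cached candidate directory for an entry whose file name
     * starts with jobId (an assumed naming convention). A directory that no
     * longer exists is skipped instead of aborting the whole lookup.
     */
    static Optional<Path> findJobFile(List<Path> cachedDirs, String jobId)
            throws IOException {
        // Keep only entries whose name begins with the job id.
        DirectoryStream.Filter<Path> matchesJob =
                entry -> entry.getFileName().toString().startsWith(jobId);

        for (Path dir : cachedDirs) {
            try (DirectoryStream<Path> entries =
                         Files.newDirectoryStream(dir, matchesJob)) {
                for (Path entry : entries) {
                    return Optional.of(entry); // first match wins
                }
            } catch (NoSuchFileException e) {
                // Stale cache entry: the directory was removed (for example by
                // cleanup or an outside process). Not fatal; try the next one.
            }
        }
        return Optional.empty(); // not found in any directory that still exists
    }
}
```

Only the missing-directory case is swallowed in this sketch; any other IOException still propagates, so genuine filesystem errors are not hidden.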