[ https://issues.apache.org/jira/browse/AMBARI-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandor Magyari updated AMBARI-17901:
------------------------------------
Fix Version/s: 3.0.0 (was: 2.5.0)
> Make HDFS operations resilient to namenode safemode
> ---------------------------------------------------
>
> Key: AMBARI-17901
> URL: https://issues.apache.org/jira/browse/AMBARI-17901
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.4.0
> Reporter: Sandor Magyari
> Assignee: Sandor Magyari
> Fix For: 3.0.0
>
>
> HdfsResourceJar and HdfsResourceWebHDFS (WebHDFSUtil) are the classes that
> carry out the HDFS operations. All retryable operations (e.g. SETPERMISSION)
> should be guarded with retry logic that retries the operation until a
> given timeout expires before giving up and bailing out.
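The retry-until-timeout behaviour described above can be sketched as a small wrapper. This is a minimal sketch, not Ambari code: `retry_until_timeout`, `RetryTimeoutError` and the `is_retryable` predicate are hypothetical names. A fixed retry interval is assumed, and the clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


class RetryTimeoutError(Exception):
    """Hypothetical error raised when a retryable operation keeps failing past the deadline."""


def retry_until_timeout(operation, is_retryable, timeout_s, interval_s,
                        clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it succeeds, fails with a non-retryable error,
    or the next retry would start after timeout_s seconds have elapsed.

    Non-retryable exceptions propagate immediately; retryable ones are
    retried after a fixed pause of interval_s seconds.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # e.g. permission denied: retrying will not help
            if clock() + interval_s > deadline:
                # Bail out instead of sleeping past the overall timeout.
                raise RetryTimeoutError(
                    "gave up after %d attempts: %s" % (attempts, exc)) from exc
            sleep(interval_s)
```

Keeping the "is this retryable?" decision in a separate predicate lets the WebHDFS and jar-based code paths share the same loop while classifying their errors differently.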
> Determining which HDFS operations are retryable might be as easy as
> looking at the returned status/error code or the type of the exception (e.g.
> "RetriableException"), though it still needs to be verified that this is
> consistent across both WebHDFS and the HdfsResource jar.
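For the WebHDFS side, classifying by exception type might look like the following sketch. It assumes the error body is the WebHDFS JSON error object `{"RemoteException": {"exception": ..., "javaClassName": ..., "message": ...}}`. The set of retryable names is an assumption: `RetriableException` comes from the issue, while `SafeModeException` and `StandbyException` are added as plausible candidates that would still need the verification described above. `is_retryable_webhdfs_error` is a hypothetical helper. It does not rely on the HTTP status alone, on the assumption that unrelated IOExceptions can share a status code.

```python
import json

# Assumption: remote exception simple names treated as retryable.
RETRYABLE_EXCEPTIONS = frozenset(
    ["RetriableException", "SafeModeException", "StandbyException"])


def is_retryable_webhdfs_error(status_code, body):
    """Return True if a failed WebHDFS response looks worth retrying.

    Only error statuses (>= 400) with a parseable RemoteException JSON body
    are considered; both the short exception name and the simple name taken
    from javaClassName are checked against RETRYABLE_EXCEPTIONS.
    """
    if status_code < 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not a JSON error body: treat as non-retryable
    remote = payload.get("RemoteException") if isinstance(payload, dict) else None
    if not isinstance(remote, dict):
        return False
    names = {
        remote.get("exception") or "",
        (remote.get("javaClassName") or "").rsplit(".", 1)[-1],
    }
    return bool(names & RETRYABLE_EXCEPTIONS)
```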
> This problem came up in https://issues.apache.org/jira/browse/AMBARI-17182
> when starting all services after enabling HA.
> The retry count and timeout still need to be settled, as it can sometimes take
> a long time for the NameNode to exit safemode.
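One way to tie the retry count to the timeout is to derive it from a backoff schedule instead of fixing both independently. The sketch below assumes capped exponential backoff, which is a technique chosen here and not something the issue specifies. `backoff_schedule` and its default values are hypothetical. The function returns the sleep intervals whose total fits in `timeout_s`, so the number of attempts is `len(schedule) + 1`.

```python
def backoff_schedule(timeout_s, initial_s=1.0, factor=2.0, max_interval_s=30.0):
    """Return the sleep intervals of a capped exponential backoff whose total
    does not exceed timeout_s. The number of attempts is len(result) + 1.

    Default values are illustrative only, not Ambari settings.
    """
    if timeout_s < 0 or initial_s <= 0 or factor < 1 or max_interval_s <= 0:
        raise ValueError("invalid backoff parameters")
    delays = []
    total = 0.0
    delay = initial_s
    # Each step is at least min(initial_s, max_interval_s) > 0, so the loop ends.
    while total + min(delay, max_interval_s) <= timeout_s:
        step = min(delay, max_interval_s)
        delays.append(step)
        total += step
        delay *= factor
    return delays
```

Short early intervals catch a NameNode that leaves safemode quickly, and the cap keeps polling steady during a long safemode without exceeding the overall timeout.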