Sandor Magyari created AMBARI-17901:
---------------------------------------
Summary: Make HDFS operations resilient to namenode safemode
Key: AMBARI-17901
URL: https://issues.apache.org/jira/browse/AMBARI-17901
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.4.0
Reporter: Sandor Magyari
Fix For: 2.5.0
HdfsResourceJar and HdfsResourceWebHDFS (WebHDFSUtil) are the classes that
carry out the HDFS operations. All retryable operations (e.g. SETPERMISSION)
should be guarded with retry logic that keeps retrying the operation until a
given timeout expires, and only then gives up and bails out.
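A minimal sketch of such a guard, in Python, assuming a hypothetical `retry_until_timeout` helper that takes the HDFS operation as a callable plus a predicate deciding which exceptions are retryable. The timeout and interval defaults are illustrative only, not existing Ambari settings. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until_timeout(
    operation: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    timeout_s: float = 300.0,   # hypothetical default, not an Ambari setting
    interval_s: float = 10.0,   # hypothetical default, not an Ambari setting
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying retryable failures until `timeout_s` elapses.

    Non-retryable exceptions are re-raised immediately; once the deadline
    has passed, the last retryable exception is re-raised.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return operation()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # permanent failure: bail out right away
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # timeout reached: give up with the last error
            # Never sleep past the deadline.
            sleep(min(interval_s, remaining))
```

In real use `operation` would wrap a single HDFS call (e.g. a SETPERMISSION request), so every retry re-issues the whole operation rather than resuming halfway through.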
Determining which HDFS operations are retryable might be as easy as looking at
the returned status/error code or the type of the exception (e.g.
"RetriableException"), though it still needs to be verified that this is
consistent between WebHDFS and the HdfsResource jar.
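As an illustration of the exception-type approach, the Python sketch below parses the JSON error body that WebHDFS returns (a `RemoteException` object with `exception`, `javaClassName` and `message` fields) and checks the exception name against an allow-list. All function names and the allow-list contents are hypothetical. In particular, treating `SafeModeException` as retryable and the substring check on the jar's output are assumptions that still need the verification described above.

```python
import json
from typing import Optional

# Assumption: only RetriableException is named in this issue; SafeModeException
# is an illustrative guess that must be verified against both code paths.
RETRYABLE_EXCEPTIONS = frozenset({"RetriableException", "SafeModeException"})


def remote_exception_name(body: str) -> Optional[str]:
    """Return the exception's simple name from a WebHDFS JSON error body."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    remote = payload.get("RemoteException") if isinstance(payload, dict) else None
    if not isinstance(remote, dict):
        return None
    name = remote.get("exception")
    if not name:
        # Fall back to the last component of the fully qualified class name.
        name = remote.get("javaClassName", "").rsplit(".", 1)[-1] or None
    return name


def is_retryable_webhdfs_error(body: str) -> bool:
    """Hypothetical check for the WebHDFS code path."""
    return remote_exception_name(body) in RETRYABLE_EXCEPTIONS


def is_retryable_jar_output(output: str) -> bool:
    """Hypothetical check for the jar code path.

    Assumes the jar's output or stack trace contains the Java exception
    class name; this substring match is only a placeholder.
    """
    return any(name in output for name in RETRYABLE_EXCEPTIONS)
```

Keying on the exception name rather than the HTTP status alone is a deliberate choice here, since a single status code may cover both transient and permanent failures.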
This problem came up in https://issues.apache.org/jira/browse/AMBARI-17182 when
starting all services after enabling HA.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)