[
https://issues.apache.org/jira/browse/AMBARI-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393883#comment-15393883
]
Sumit Mohanty commented on AMBARI-17901:
----------------------------------------
Along with retry, as it exists for EU, Ambari should have the ability to check
(perhaps via a custom command) whether NN has exited safe mode, so that any Wizard
can use it to perform an informed wait. This, along with live logs from NN through
LogSearch, will make the progress visible to the end user.
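
A minimal sketch of such an informed wait, assuming the check is done by
running `hdfs dfsadmin -safemode get` and looking for "Safe mode is ON" in its
output. The function names, the polling interval and the timeout below are
hypothetical and are not part of Ambari; a real custom command would likely
live in the HDFS service scripts.

```python
import subprocess
import time


def namenode_in_safemode(run=subprocess.check_output):
    """Return True if `hdfs dfsadmin -safemode get` reports safe mode ON.

    `run` is injectable so the check can be exercised without a cluster.
    """
    out = run(["hdfs", "dfsadmin", "-safemode", "get"])
    if isinstance(out, bytes):
        out = out.decode("utf-8", "replace")
    return "Safe mode is ON" in out


def wait_for_safemode_exit(check=namenode_in_safemode, timeout_s=600,
                           poll_s=10, sleep=time.sleep, clock=time.monotonic):
    """Poll until the NameNode leaves safe mode.

    Returns True once safe mode is off, False if timeout_s elapses first.
    The default timeout and poll interval are assumptions, not Ambari values.
    """
    deadline = clock() + timeout_s
    while True:
        if not check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A wizard could call `wait_for_safemode_exit()` before issuing HDFS operations
and report a timeout to the user instead of failing on the first operation.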
> Make HDFS operations resilient to namenode safemode
> ---------------------------------------------------
>
> Key: AMBARI-17901
> URL: https://issues.apache.org/jira/browse/AMBARI-17901
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.4.0
> Reporter: Sandor Magyari
> Assignee: Sandor Magyari
> Fix For: 2.5.0
>
>
> HdfsResourceJar and HdfsResourceWebHDFS (WebHDFSUtil) are the classes that
> carry out the HDFS operations. All retryable operations (e.g. SETPERMISSION)
> should be guarded with retry logic that retries the operation until a
> given timeout before giving up and bailing out.
> Determining which HDFS operations are retryable might be as easy as
> looking at the returned status/error code or the type of the exception (e.g.
> "RetriableException"), though it needs to be verified that this is consistent
> across both webhdfs and the hdfsresource jar.
> This problem came up in https://issues.apache.org/jira/browse/AMBARI-17182
> when starting all services after enabling HA.
> Retry count and timeout should be clarified, as it may sometimes take a long
> time for the namenode to exit safemode.
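
The retry guard described above could look roughly like the following sketch.
It assumes that a retryable failure can be recognised from the exception
alone; `RetriableError`, `retry_until_timeout` and the default timeout and
interval are hypothetical names and values chosen for illustration, not
existing Ambari or HDFS APIs.

```python
import time


class RetriableError(Exception):
    """Hypothetical stand-in for a retryable HDFS failure
    (for example, one reported as a RetriableException)."""


def retry_until_timeout(operation, timeout_s=300, interval_s=5,
                        is_retryable=lambda exc: isinstance(exc, RetriableError),
                        sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or timeout_s would be exceeded.

    Non-retryable exceptions are re-raised immediately. A retryable
    exception is re-raised once another wait would pass the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return operation()
        except Exception as exc:
            # Give up on non-retryable errors or when no time is left.
            if not is_retryable(exc) or clock() + interval_s > deadline:
                raise
            sleep(interval_s)
```

Passing `is_retryable` as a parameter keeps the classification separate from
the loop, so the webhdfs and hdfsresource jar code paths could each supply
their own rule once it is verified how they report safemode errors.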
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)