[jira] [Commented] (AMBARI-17901) Make HDFS operations resilient to namenode safemode

Andrew Onischuk (JIRA) Wed, 20 Dec 2017 23:43:04 -0800

    [ 
https://issues.apache.org/jira/browse/AMBARI-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299665#comment-16299665
 ]


Andrew Onischuk commented on AMBARI-17901:
------------------------------------------

Hello [~smagyari]. As of the fix in AMBARI-17182, we check for safemode when 
starting the NameNode. All other operations run only after NAMENODE has 
started successfully (this ordering is enforced by role_command_order.json). 
Retrying would add unnecessary delay to operations that are bound to fail 
because something is genuinely not working. 

> Make HDFS operations resilient to namenode safemode
> ---------------------------------------------------
>
>                 Key: AMBARI-17901
>                 URL: https://issues.apache.org/jira/browse/AMBARI-17901
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.4.0
>            Reporter: Magyari Sandor Szilard
>            Assignee: Magyari Sandor Szilard
>             Fix For: 3.0.0
>
>
> HdfsResourceJar and HdfsResourceWebHDFS (WebHDFSUtil) are the classes that 
> carry out the HDFS operations. All retryable operations (e.g. SETPERMISSION) 
> should be guarded with retry logic that retries the operation until a 
> given timeout before giving up and bailing out.
> Determining which HDFS operations are retryable might be as easy as 
> looking at the returned status/error code or the type of the exception (e.g. 
> "RetriableException"), though it needs to be verified that this is consistent 
> across both the webhdfs and hdfsresource jar paths.
> This problem came up in https://issues.apache.org/jira/browse/AMBARI-17182 
> when starting all services after Enabling HA. 
> Retry count and timeout should be clarified, as sometimes it may take a long 
> time for namenode to exit safemode.
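
For illustration, the retry guard described in the issue could look roughly 
like the sketch below. It is not Ambari code: retry_until_timeout, 
RetriableError, the is_retriable predicate, and the default timeout and 
interval values are all hypothetical. Whether WebHDFS and the hdfsresource jar 
report safemode failures in a way such a predicate can detect is still the 
open question raised in the description.

```python
import time


class RetriableError(Exception):
    """Hypothetical stand-in for a failure worth retrying (e.g. NameNode in safemode)."""


def retry_until_timeout(operation, is_retriable, timeout_s=60.0, interval_s=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds, fails non-retriably, or the timeout is reached.

    is_retriable(err) decides whether a raised exception is worth another attempt.
    sleep and clock are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return operation()
        except Exception as err:
            # Re-raise at once for errors retrying cannot fix, or when the
            # next wait would run past the deadline.
            if not is_retriable(err) or clock() + interval_s > deadline:
                raise
            sleep(interval_s)
```

Errors the predicate rejects are re-raised immediately, so operations that 
fail for a real reason do not pick up any extra delay; only failures 
classified as retriable wait, and only up to timeout_s.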



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
