[jira] [Commented] (HDFS-13442) Ozone: Handle Datanode Registration failure

Anu Engineer (JIRA) Fri, 13 Apr 2018 11:31:56 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437711#comment-16437711
 ]


Anu Engineer commented on HDFS-13442:
-------------------------------------

Hi [~hanishakoneru],

Thanks for the patch. However, I feel that a data node should give up its 
registration attempts only after a really long time or on an error condition. 
Retrying 10 times seems too low. For example, if the data nodes boot up earlier 
than SCM, we would not want them to go silent after 10 tries 
(somewhere around 5 minutes). If we are going to have a default value for max 
retries, we should target something on the order of a day, say 24 hours 
or so.

In fact, we can read the heartbeat (HB) frequency config value and derive the 
retry count from it, so that the retries span 12 or 24 hours.
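
A minimal sketch of that calculation, assuming one registration attempt per 
heartbeat. The class and method names are hypothetical and not taken from the 
patch, and the 30-second interval is only inferred from "10 tries is around 5 
minutes"; the real value would come from the HB frequency config key.

```java
import java.time.Duration;

public class RegistrationRetryBudget {

    /**
     * Hypothetical helper: how many attempts are needed so that retrying
     * once per heartbeat covers the whole retry window (rounded up).
     */
    public static long maxRetries(Duration heartbeatInterval, Duration retryWindow) {
        long hbMillis = heartbeatInterval.toMillis();
        long windowMillis = retryWindow.toMillis();
        if (hbMillis <= 0) {
            throw new IllegalArgumentException("heartbeat interval must be at least 1 ms");
        }
        // Ceiling division, with a floor of one attempt.
        return Math.max(1, (windowMillis + hbMillis - 1) / hbMillis);
    }

    public static void main(String[] args) {
        // Assumed interval; in practice this would be read from configuration.
        Duration heartbeat = Duration.ofSeconds(30);
        System.out.println("24h: " + maxRetries(heartbeat, Duration.ofHours(24)));
        System.out.println("12h: " + maxRetries(heartbeat, Duration.ofHours(12)));
    }
}
```

With a 30-second heartbeat this gives 2880 attempts for 24 hours and 1440 for 
12 hours, so the default tracks the heartbeat setting instead of being a fixed 
count.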


Also, in the case where we get the error _errorNodeNotPermitted_, should we 
shut down the data node and create some kind of error record on SCM so that we 
can get that info back from SCM? I am also OK with the current approach, where 
we let the system slowly time out.
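
To make the two options concrete, here is a rough sketch of a registration 
loop that stops immediately on _errorNodeNotPermitted_ and otherwise retries 
until the budget runs out. All types and names below are hypothetical 
stand-ins, not the classes in the patch, and the wait between attempts is left 
out.

```java
import java.util.function.Supplier;

public class RegistrationLoop {

    /** Hypothetical result of a single registration attempt. */
    public enum Result { SUCCESS, RETRYABLE_FAILURE, ERROR_NODE_NOT_PERMITTED }

    /** Hypothetical outcome of the whole registration loop. */
    public enum Outcome { REGISTERED, SHUT_DOWN, GAVE_UP }

    public static Outcome register(Supplier<Result> attempt, long maxRetries) {
        for (long i = 0; i < maxRetries; i++) {
            Result result = attempt.get();
            if (result == Result.SUCCESS) {
                return Outcome.REGISTERED;
            }
            if (result == Result.ERROR_NODE_NOT_PERMITTED) {
                // Permanent rejection: retrying cannot help, so stop now.
                return Outcome.SHUT_DOWN;
            }
            // Retryable failure: a real loop would wait one heartbeat here.
        }
        // Budget exhausted without success: the "slowly time out" path.
        return Outcome.GAVE_UP;
    }

    public static void main(String[] args) {
        Outcome outcome = register(() -> Result.ERROR_NODE_NOT_PERMITTED, 2880);
        System.out.println(outcome);
    }
}
```

Recording the rejection on the SCM side is not shown; this only illustrates 
the data node's decision between stopping early and running out its retries.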

  

> Ozone: Handle Datanode Registration failure
> -------------------------------------------
>
>                 Key: HDFS-13442
>                 URL: https://issues.apache.org/jira/browse/HDFS-13442
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Hanisha Koneru
>            Assignee: Hanisha Koneru
>            Priority: Major
>         Attachments: HDFS-13442-HDFS-7240.001.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

