[ 
https://issues.apache.org/jira/browse/HDFS-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997022#comment-15997022
 ] 

Anu Engineer commented on HDFS-11740:
-------------------------------------


Thanks for sharing your thoughts. Here is what is still missing in my mind:
# Why do we need these extra settings? For example, HDFS has a fixed heartbeat 
(HB) and has never introduced a variable heartbeat for different states, even 
though HDFS has far more states.
# The example you quoted, decommissioning, is a very time-consuming process, 
since the datanode has to move all containers away from the node. See my 
design doc in HDFS-11493: if you assume a high-density node (say 192 TB 
capacity), it might take hours to decommission a node. So a heartbeat of 30 
seconds or less is not going to be an issue.
# I do agree that, unlike HDFS, the state machine approach gives us the 
flexibility to achieve this, since we know which state we are in and each 
state has the ability to manage its own state. But I am worried that we are 
adding a feature because we can. In other words, I am still looking for the 
business problem that we want to solve with this feature.
# We have already discussed that saving 90 seconds during boot up or 30 
seconds while decommissioning is the best case so far. It is trivial to change 
the heartbeat interval to 15 seconds, and then those windows will also become 
smaller.
# So I am looking to understand a case where a variable heartbeat is needed, 
not a case where it is merely good to have. With HDFS, what I have always 
struggled with is too many settings. So in Ozone, we actively try to make 
decisions that avoid burdening the user.
# So first and foremost, let us define the set of cases where this feature 
will be useful. Then we can talk about the code changes and other issues.
# Once we have clarity on this question I will come back and discuss the 
_before_ and _after_ approaches.
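
To make the arithmetic behind the 90-second and 15-second figures concrete, 
here is a minimal sketch. It is not taken from the patch or from the Ozone 
code base: the class name, the method name and the constant are hypothetical, 
and the count of three fixed-interval waits before the Heartbeat state is only 
inferred from the 90s lag quoted for the 30s default.

```java
public class StartupLagSketch {
    // Assumption: three fixed-interval waits happen before the Heartbeat state
    // (inferred from 90s lag / 30s default interval, not read from the code).
    static final int WAITS_BEFORE_HEARTBEAT = 3;

    // Lag before the first heartbeat when every transition waits one interval.
    static long startupLagSeconds(long intervalSeconds) {
        return WAITS_BEFORE_HEARTBEAT * intervalSeconds;
    }

    public static void main(String[] args) {
        // Compare the 30s default with the 15s setting suggested above.
        for (long interval : new long[] {30, 15}) {
            System.out.println(interval + "s interval -> "
                + startupLagSeconds(interval) + "s before Heartbeat");
        }
    }
}
```

Under that assumption, halving the single existing setting halves the startup 
window (90s to 45s) without introducing any new configuration key.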

> Ozone: Differentiate time interval for different DatanodeStateMachine state 
> tasks
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-11740
>                 URL: https://issues.apache.org/jira/browse/HDFS-11740
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: HDFS-11740-HDFS-7240.001.patch, 
> HDFS-11740-HDFS-7240.002.patch, statemachine_1.png, statemachine_2.png
>
>
> Currently the datanode state machine transitions between tasks at a fixed 
> time interval, defined by {{ScmConfigKeys#OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}}, 
> whose default value is 30s. Once a datanode is started, it needs 90s before 
> transitioning to the {{Heartbeat}} state; such a long lag is not necessary. 
> I propose improving the time interval handling: it seems only the heartbeat 
> task needs to be scheduled at the {{OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}} 
> interval, and the rest should run without any lag.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
