[
https://issues.apache.org/jira/browse/HDFS-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997022#comment-15997022
]
Anu Engineer commented on HDFS-11740:
-------------------------------------
Thanks for sharing your thoughts. Here is what is still missing, in my mind:
# Why do we need this extra setting? For example, HDFS has a fixed heartbeat
(HB) and has never introduced a variable HB for different states, and HDFS has
far more states.
# The example you quoted, decommissioning, is a very time-consuming process,
since the datanode has to move all containers away from the node. See my design
doc in HDFS-11493: if you assume a high-density node (say 192 TB capacity), it
might take hours to decommission a node. So a heartbeat of 30 seconds or less
is not going to be an issue.
# I do agree that, unlike HDFS, the state machine approach gives us the
flexibility to achieve this, since we know which state we are in and each
state has the ability to manage its own state. But I am worried that we are
adding a feature because we can. In other words, I am still looking for the
business problem that we want to solve with this feature.
# We have already discussed that saving 90 seconds during boot-up or 30
seconds while decommissioning is the best case so far. It is trivial to change
the heartbeat interval to 15 seconds, and then those windows will also become
smaller.
# So I am looking to understand a case where a variable heartbeat is actually
needed, not a case where it is merely good to have. With HDFS, what I have
always struggled with is too many settings. So in Ozone, we actively try to
make decisions that avoid burdening the user.
# So first and foremost, let us define the set of cases where this feature
will be useful. Then we can talk about the code changes and other issues.
# Once we have clarity on this question, I will come back and discuss the
_before_ and _after_ approaches.
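To make the arithmetic in the fourth point concrete, here is a minimal sketch.
It assumes that three fixed-interval steps precede the Heartbeat state, which
is only inferred from the 90s lag at the 30s default and is not stated
explicitly in the issue. The class and method names are hypothetical and are
not part of the Ozone code base.

```java
/** Hypothetical helper, not Ozone code: startup lag under a fixed task interval. */
final class HeartbeatLag {
    // Assumption: three fixed-interval steps run before the Heartbeat state,
    // inferred from the 90s lag quoted for the 30s default interval.
    static final int ASSUMED_STEPS_BEFORE_HEARTBEAT = 3;

    /** Lag in seconds when every step waits one full interval. */
    static int startupLagSeconds(int steps, int intervalSeconds) {
        if (steps < 0 || intervalSeconds <= 0) {
            throw new IllegalArgumentException("steps must be >= 0 and interval > 0");
        }
        return Math.multiplyExact(steps, intervalSeconds);
    }

    public static void main(String[] args) {
        // Compare the 30s default with the 15s alternative from the comment.
        for (int interval : new int[] {30, 15}) {
            int lag = startupLagSeconds(ASSUMED_STEPS_BEFORE_HEARTBEAT, interval);
            System.out.println("interval=" + interval + "s -> lag=" + lag + "s");
        }
    }
}
```

Under these assumptions, halving the single existing setting halves the startup
window (90s to 45s) without introducing any per-state configuration.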
> Ozone: Differentiate time interval for different DatanodeStateMachine state
> tasks
> ---------------------------------------------------------------------------------
>
> Key: HDFS-11740
> URL: https://issues.apache.org/jira/browse/HDFS-11740
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Attachments: HDFS-11740-HDFS-7240.001.patch,
> HDFS-11740-HDFS-7240.002.patch, statemachine_1.png, statemachine_2.png
>
>
> Currently the datanode state machine transitions between tasks at a fixed
> time interval, defined by
> {{ScmConfigKeys#OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}}; the default value is
> 30s. Once a datanode is started, it needs 90s before it transitions to the
> {{Heartbeat}} state; such a long lag is not necessary. I propose improving the
> time interval handling logic: it seems only the heartbeat task needs to be
> scheduled at the {{OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}} interval, and the
> rest should run without any lag.