[
https://issues.apache.org/jira/browse/HDFS-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995416#comment-15995416
]
Anu Engineer edited comment on HDFS-11740 at 5/3/17 6:55 PM:
-------------------------------------------------------------
@Weiwei Yang, thanks for the v2 patch. Sorry for the long comment; I just want
to make sure that we are on the same page.
I am still not able to see the advantage of introducing this change.
So I am going to argue both sides to make sure I understand the costs/benefits
of this solution.
Fixed Heartbeat - Pros:
# Simple to understand and to code for. It lets us write good error messages
like this:
{{Unable to communicate to SCM server at server.hortonworks.com:9861. We have
not been able to communicate to this SCM server for past 3300 seconds.}}
The above error message is from {{EndpointStateMachine.java#logIfneeded}}
# Fewer knobs to adjust -- since init, version and register are three states,
we are optimizing the first 90 seconds of a datanode's life. Since datanodes
are very long-running processes, does this optimization matter?
# When you have a cluster with 3000+ datanodes, SCM might like the fact that
datanodes are slow in reaching out to it.
# Also, in most cases the first 90 seconds will be the time that the datanode
takes to read its data and get ready. So think of a datanode doing two things:
one is reading data off the local HDDs, the other is talking to SCM about its
presence. These workflows can proceed in parallel. In other words, they should
not intermingle unless we reach a place where one has to wait for the other.
The first such point is when SCM sends the datanode a command that it is not
ready to handle yet. By giving the datanode 90 seconds before any such
rendezvous point, we are avoiding a possible wait condition.
Fixed Heartbeat - Cons:
# Datanodes waste the first 90 seconds. In a small cluster we could boot up
much faster.
# When we add new states, this might make the datanode waste even more time.
I wanted to hear your thoughts on this pros/cons argument for why we want to
remove fixed heartbeats and move to variable heartbeats.
More specific things:
# Why the change in executor? So that we can create a pre-planned set of
futures? Please see another of my comments below.
# I like the fact that each state specifies its wait time internally, but
RegisterEndpointTask seems to wait 0 seconds?
# There is one big semantic difference: the current code artificially creates
lags -- for example, the main loop does not run with a fixed cadence.
Instead, it runs a number of seconds from the last time the action was
performed (this is a pattern we use in both SCM and the datanode).
_This is critical, since the RPC layer will/could retry._
If that retry is happening -- let us say one SCM is dead, or there is a network
issue -- we don't want the scheduler to run the next task immediately. We want
some quiet period, since this is an admin task and we should not be consuming
too many resources. I am worried that the RPC retry will run until we time out,
and then, due to this
{code}
ScheduledFuture taskFuture = executor.schedule(
    endpointTask,
    endpointTask.getTaskDuration(),
    TimeUnit.SECONDS);
{code}
an already queued task would fire immediately.
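To illustrate the quiet-period semantics I mean, here is a minimal sketch (class and method names are mine, not from the patch): the next run is scheduled only after the previous run finishes, so a run that blocked for a long time in RPC retries is still followed by the full interval, instead of an already-queued future firing immediately.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only -- names are hypothetical, not from the patch.
class QuietPeriodScheduler {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();

  /**
   * Runs the task, then schedules the next run a full interval *after* the
   * previous run finished. A run that blocks for a long time (e.g. RPC
   * retries until timeout) is therefore still followed by a quiet period.
   */
  void scheduleWithQuietPeriod(Runnable task, long intervalMillis) {
    executor.schedule(() -> {
      try {
        task.run(); // may block in RPC retries for a long time
      } finally {
        // Rescheduling happens only after the task returns, so the quiet
        // period is measured from the end of the previous run.
        scheduleWithQuietPeriod(task, intervalMillis);
      }
    }, intervalMillis, TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    executor.shutdownNow();
  }
}
```

Contrast this with pre-planning all futures up front: there, a slow run eats into (or past) the delay of the next already-scheduled future.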
If you want to support this feature, may I suggest that we make changes in
{{DatanodeStates}}:
* Add a time-to-wait value there.
* In {{start()}}, read the wait value and sleep for that duration -- that
allows you to change each step's time duration.
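A rough sketch of that suggestion, assuming hypothetical wait values (the real enum lives in {{DatanodeStateMachine}}):

```java
import java.util.concurrent.TimeUnit;

// Rough sketch only -- the wait values here are illustrative.
enum DatanodeStates {
  INIT(0),       // no lag needed before initialization
  RUNNING(30),   // heartbeat cadence, cf. OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS
  SHUTDOWN(0);

  private final long waitTimeSeconds;

  DatanodeStates(long waitTimeSeconds) {
    this.waitTimeSeconds = waitTimeSeconds;
  }

  long getWaitTimeSeconds() {
    return waitTimeSeconds;
  }

  /**
   * In start(), each state would sleep for its own wait value before
   * running its task, keeping per-step durations tunable in one place.
   */
  void waitBeforeTask() throws InterruptedException {
    TimeUnit.SECONDS.sleep(waitTimeSeconds);
  }
}
```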
> Ozone: Differentiate time interval for different DatanodeStateMachine state
> tasks
> ---------------------------------------------------------------------------------
>
> Key: HDFS-11740
> URL: https://issues.apache.org/jira/browse/HDFS-11740
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Attachments: HDFS-11740-HDFS-7240.001.patch,
> HDFS-11740-HDFS-7240.002.patch, statemachine_1.png, statemachine_2.png
>
>
> Currently the datanode state machine transitions between tasks at a fixed
> time interval, defined by
> {{ScmConfigKeys#OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}}, whose default value
> is 30s. Once a datanode is started, it needs 90s before transitioning to the
> {{Heartbeat}} state; such a long lag is not necessary. I propose to improve
> the time-interval handling: it seems only the heartbeat task needs to be
> scheduled at the {{OZONE_SCM_HEARTBEAT_INTERVAL_SECONDS}} interval; the rest
> should be done without any lag.