[
https://issues.apache.org/jira/browse/HDFS-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980130#comment-15980130
]
Anu Engineer commented on HDFS-11493:
-------------------------------------
[~cheersyang] Thanks for taking the time to look at the doc and ask such
pertinent questions. Please see my replies inline; looking forward to more
review comments.
bq. Right now node state only has HEALTHY, STALE, DEAD, UNKNOWN. Is it useful
to add the following states as well?
That is the plan. As we add the ability in SCM to handle maintenance mode and
decommissioning support, we will add those states to the node state.
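For illustration, the expanded state set could look roughly like the sketch
below; the extra constants are my guess at the eventual names, not something
already in the patch:

```java
// Sketch of how the node state enum could grow once maintenance mode and
// decommissioning support land in SCM. Only the first four states exist
// today; the rest are illustrative placeholders.
public enum NodeState {
    HEALTHY,
    STALE,
    DEAD,
    UNKNOWN,
    // Possible future states, per the plan above:
    DECOMMISSIONING,
    DECOMMISSIONED,
    IN_MAINTENANCE
}
```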
bq. nodes container report will arrive scm in a wave? Would that cause network
problem?
Very good question. This is something I have struggled with a bit. Here is how
I have modeled it. Assume that we have a typical machine with 12 disks of 5 TB
each. That gives us a node capacity of *60 TB*. Assuming that our average
container size is 5 GB, we have *12,000* closed containers on a node. Now
assume that each container takes around 128 bytes to report its state; that
puts a full container report from a node at a little less than 2 MB. Today, a
block report is many times larger, so I am hoping that we can keep 10 pools
running at a time, that is, 240 nodes sending in their reports. They are not
all going to send their reports at the same time, but in a 5-minute window we
need to process 480 MB of network traffic. That is around 1.6 MBps, and I am
assuming that the SCM machine will be able to handle this in terms of CPU,
network, and memory bandwidth. We will also make the maximum concurrency of
pool processing configurable, so SCM can run on a less powerful machine if
needed.
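To make the arithmetic above easy to check, here is a small back-of-envelope
sketch. All figures are the assumptions stated above (12 x 5 TB disks, 5 GB
containers, 128 bytes per container entry, 10 pools of 24 nodes, 5-minute
window), not measured values:

```java
// Back-of-envelope check of the container-report sizing estimate.
public class ReportSizeEstimate {

    /** 60 TB node capacity / 5 GB containers => 12,000 closed containers. */
    static long containersPerNode() {
        long nodeCapacityGB = 12 * 5 * 1000; // 12 disks * 5 TB, decimal units
        long containerSizeGB = 5;
        return nodeCapacityGB / containerSizeGB;
    }

    /** 12,000 containers * 128 bytes ~= 1.5 MB, rounded up to 2 MB above. */
    static long reportBytes() {
        return containersPerNode() * 128;
    }

    /** 240 nodes * 2 MB over a 5-minute window ~= 1.6 MB/s into SCM. */
    static double scmIngestMBps() {
        long nodes = 10 * 24;   // 10 pools of 24 nodes
        double reportMB = 2.0;  // rounded per-node report size
        return nodes * reportMB / (5 * 60);
    }

    public static void main(String[] args) {
        System.out.println("containers per node: " + containersPerNode());
        System.out.println("report size (bytes): " + reportBytes());
        System.out.printf("SCM ingest rate: %.2f MB/s%n", scmIngestMBps());
    }
}
```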
bq. Can we move configuration properties
Sure, will do when I update the patch.
bq. More specifically, if a thread wants to add a command for datanode A, and
another thread wants to add a command for datanode B, we probably don't want
them to wait for the other.
It is a very good suggestion. Can I leave a TODO for now? I would like to add
that kind of locking once we get a chance to profile this code path; if we find
that threads are waiting on this lock, we certainly should do it. It is also
easy to do.
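For the record, the kind of fine-grained locking I have in mind is roughly the
sketch below. The class shape and the SCMCommand name are illustrative, not
the actual code in the patch:

```java
import java.util.Queue;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of per-datanode command queues: keying the queues by datanode ID
// in a ConcurrentHashMap means a thread queuing a command for datanode A
// never blocks a thread queuing a command for datanode B.
public class CommandQueue {

    // Stand-in for the real SCM command type.
    public interface SCMCommand { }

    private final ConcurrentHashMap<UUID, Queue<SCMCommand>> commands =
        new ConcurrentHashMap<>();

    /** Queue a command; contention is limited to the target datanode's entry. */
    public void addCommand(UUID datanodeId, SCMCommand command) {
        commands.computeIfAbsent(datanodeId, k -> new ConcurrentLinkedQueue<>())
                .add(command);
    }

    /** Drain all pending commands for a datanode, e.g. on its heartbeat. */
    public Queue<SCMCommand> getCommands(UUID datanodeId) {
        Queue<SCMCommand> pending = commands.remove(datanodeId);
        return pending != null ? pending : new ConcurrentLinkedQueue<>();
    }
}
```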
bq. This is an in-memory queue; how do we make sure not to run into an
inconsistent state? Imagine the replication manager has just processed
container reports from a pool and asked a datanode to replicate a container,
and the replication is in progress. Then SCM crashed and restarted; how does
SCM get to know what the current state is?
You are right, SCM crashing is a problem. The current thinking is to build an
HA solution using Apache Ratis. Once we have a single-instance SCM working, we
will extend SCM to support Ratis. I am told that Ratis is planning to support a
distributed map class, which will look like a normal map for all purposes but
will internally replicate using the Raft protocol.
bq. The queue seems to be time ordered; I think it will be better to support
priority as well. Commands may have different priorities, for example,
replicating a container is usually higher priority than deleting a container
replica; replicating a container may also have different priorities according
to the desired number of replicas.
Good catch, I had not thought about it. Right now, the architecture is to send
all queued commands to a node in its next heartbeat response (the assumption
being that we will not have thousands of commands pending for a node). We might
want to add a processing priority to these commands, specified by SCM, so
datanodes can get to important tasks first. I would like to leave a TODO and
file a JIRA for this for the time being.
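A rough sketch of what such a priority ordering could look like is below; all
names here (CommandPriority, PrioritizedCommand) are illustrative, not part of
the patch:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of priority-ordered command dispatch: commands carry a priority,
// and a PriorityQueue hands out the most urgent first.
public class PrioritizedCommands {

    // Enum order doubles as urgency: earlier constants are served first.
    public enum CommandPriority { REPLICATE_UNDER_REPLICATED, REPLICATE, DELETE_REPLICA }

    public static final class PrioritizedCommand {
        final CommandPriority priority;
        final long enqueueTimeMs;
        public PrioritizedCommand(CommandPriority priority, long enqueueTimeMs) {
            this.priority = priority;
            this.enqueueTimeMs = enqueueTimeMs;
        }
        public CommandPriority priority() { return priority; }
    }

    // Order by priority first, then by enqueue time, so the existing time
    // ordering is preserved within a single priority level.
    private final PriorityQueue<PrioritizedCommand> queue = new PriorityQueue<>(
        Comparator.<PrioritizedCommand, CommandPriority>comparing(c -> c.priority)
                  .thenComparingLong(c -> c.enqueueTimeMs));

    public void add(PrioritizedCommand command) { queue.add(command); }

    /** Returns the most urgent pending command, or null if none. */
    public PrioritizedCommand next() { return queue.poll(); }
}
```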
> Ozone: SCM: Add the ability to handle container reports
> ---------------------------------------------------------
>
> Key: HDFS-11493
> URL: https://issues.apache.org/jira/browse/HDFS-11493
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Anu Engineer
> Assignee: Anu Engineer
> Attachments: container-replication-storage.pdf,
> HDFS-11493-HDFS-7240.001.patch
>
>
> Once a datanode sends the container report it is SCM's responsibility to
> determine if the replication levels are acceptable. If it is not, SCM should
> initiate a replication request to another datanode. This JIRA tracks how SCM
> handles a container report.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)