[ https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042133#comment-14042133 ]

Konstantin Shvachko commented on HDFS-6469:
-------------------------------------------

Maysam, most of your questions are about performance, which I hope I have already 
answered earlier. I can add a few points:
* It is a bit of a chicken-and-egg problem to ask for performance numbers before 
the system is built. The ZK coordination engine is being developed in HADOOP-10641, 
and saying how it will perform is premature.
* On throughput. Let's say ZK can perform 30K write operations per second and the 
NN does 10K metadata ops/sec, which is close to reality based on published numbers. 
Each metadata operation requires one write to ZK, but the NN will not generate more 
than 10K ZK writes/sec, well within that budget, and therefore ZK should not be a 
bottleneck (the arithmetic is spelled out in the sketch after this list).
* Coordination engines are pluggable, which is an important part of the design. 
As people have mentioned in this and other threads, Hadoop does not necessarily 
need to be in the business of building CE implementations.
This is similar, e.g., to how we plug in different authentication systems like 
Kerberos rather than reimplementing them. Likewise, I don't see a hero here 
with a secret way of making consensus efficient. One engine will win as the 
default, and there is always the option to use another (a hypothetical sketch 
of such a pluggable boundary follows after this list).
* DataNode scalability is an interesting angle. This would be the same as 
running multiple standbys or federated NameNodes. You may have more experience 
with that than I do; it would be interesting to hear how DN scalability works for you.
The standard setup with 3 ConsensusNodes tolerates a single node failure. With 
5 CNodes you design to withstand a simultaneous failure of two of them. From an 
HA perspective, running more than 7 CNodes does not add much value, because 
simultaneous failures of 3 nodes are quite rare, although one could choose to 
run more CNodes for load-balancing purposes (the quorum arithmetic is also in 
the sketch after this list).
I did not see any scalability issues for DNs on 3- and 5-CNode clusters. 
Conceptually, RPCs from DNs to NNs are small compared to data transfers and 
should not cause network issues.
* Leader election is an integral part of a Paxos algorithm and should be 
counted toward its performance. This is different from application-level 
fail-over. In current HDFS (the application) we have both: leader election 
via ZK, QJM, or BookKeeper, and HDFS fail-over, which includes fail-over of 
the NN, clients, and DNs to the SBN.
With an active-active architecture we eliminate HDFS fail-over: all nodes 
are always active, there are no standbys, and no split-brain handling or 
fencing is needed.
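
To make the arithmetic above concrete, here is a small self-contained sketch. The 
30K ZK writes/sec and 10K NN metadata ops/sec figures are only the assumptions 
stated in this comment, not ConsensusNode measurements, and the failure-tolerance 
formula is the standard majority-quorum result; none of this is code from the proposal.

{code:java}
// Back-of-the-envelope numbers for the bullets above. The throughput figures
// are the assumptions stated in this comment, not ConsensusNode measurements;
// the failure-tolerance formula is standard majority-quorum arithmetic.
public class CNodeEnvelope {

  /** A majority quorum of n replicas tolerates floor((n - 1) / 2) simultaneous failures. */
  static int toleratedFailures(int n) {
    return (n - 1) / 2;
  }

  public static void main(String[] args) {
    final int zkWritesPerSec = 30_000;      // assumed ZK write capacity
    final int nnMetadataOpsPerSec = 10_000; // assumed NN metadata op rate

    // One ZK write per metadata op: the NN offers 10K writes/sec against a
    // 30K writes/sec budget, i.e. 3x headroom, so ZK is not the bottleneck.
    System.out.println("ZK headroom: "
        + (double) zkWritesPerSec / nnMetadataOpsPerSec + "x");

    for (int n : new int[] {3, 5, 7}) {
      // Prints 3 -> 1, 5 -> 2, 7 -> 3, matching the numbers in the comment.
      System.out.println(n + " CNodes tolerate "
          + toleratedFailures(n) + " simultaneous failure(s)");
    }
  }
}
{code}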
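
And a minimal illustration of what a pluggable coordination-engine boundary could 
look like. The names and signatures here are hypothetical and are not the actual 
HADOOP-10641 API or the interfaces in CNodeDesign.pdf; the point is only that the 
consensus implementation sits behind a narrow interface, the way authentication 
mechanisms do.

{code:java}
// Hypothetical sketch of a pluggable coordination-engine boundary. Names and
// signatures are illustrative only and do not reflect the actual HADOOP-10641
// API or the CNodeDesign.pdf proposal.

/** An event to be agreed upon, e.g. a namespace update proposed by a CNode. */
interface Proposal extends java.io.Serializable {}

/** Applies agreements to the local namespace replica in the agreed global order. */
interface AgreementHandler {
  void executeAgreement(Proposal agreed);
}

/**
 * The pluggable consensus layer. A ZooKeeper-based engine, a Paxos library,
 * or any other implementation can sit behind this boundary.
 */
interface CoordinationEngine {
  /** Start participating in the consensus protocol. */
  void start();

  /** Register the component (e.g. a ConsensusNode) that applies agreements. */
  void registerHandler(AgreementHandler handler);

  /** Submit a proposal; it is delivered back through the handler once agreed. */
  void submitProposal(Proposal proposal) throws java.io.IOException;

  /** Stop participating and release resources. */
  void stop();
}
{code}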

Hope this answers your questions.

> Coordinated replication of the namespace using ConsensusNode
> ------------------------------------------------------------
>
>                 Key: HDFS-6469
>                 URL: https://issues.apache.org/jira/browse/HDFS-6469
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: CNodeDesign.pdf
>
>
> This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
> which enables replication of the namespace on multiple nodes of an HDFS 
> cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
