[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031686#comment-14031686
 ] 

Maysam Yabandeh commented on HDFS-6469:
---------------------------------------

Very interesting document indeed! I think as a community we should always keep 
this option on the table and revisit it once in a while to reevaluate the pros 
and cons.

Before giving detailed comments, I first want to make sure that I correctly 
understand the big picture. Does the JIRA suggest the following path:
# let's make changes to have HDFS ready for pluggable consensus
# people start with a bad implementation of consensus with poor performance
# then probably a hero comes along who has a secret way of making consensus 
efficient

If the above picture is correct, then the concern would be whether the benefit 
to everybody in the community justifies the cost of doing step 1 and perhaps 
being stuck at step 2. I would be much more comfortable if I saw numbers for 
latency, throughput, and, last but not least, code complexity. That should make 
it easier to convince the community about the suggested path.

About the performance, here are some concerns:
# How much would the delay for write operations increase, in both average and 
standard deviation?
# Does consensus negatively impact the *write* throughput of the NN? Paxos 
requires *many* messages per proposal to be exchanged between participants, 
which consumes the CPU and network of the NN.
# Does the load on a DN scale with the number of CNs? If I understand correctly, 
each DN has to send changes to all CNs. How much overhead would there be on a DN 
when we have seven CNs?
# Due to performance issues, in practice we see Multi-Paxos implemented instead 
of Paxos, in which a proposer assumes the role of the leader for a time period 
specified by a lease. In this case, the failure of the leader still makes the NN 
unavailable until a new leader is elected. I wonder whether this would give 
any advantage over the current failover delay between primary and standby. This 
concern would of course be invalid if you offer an efficient solution for 
consensus that does not rely on leases.
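
To make the second and third concerns a bit more concrete, here is a rough 
back-of-the-envelope sketch, not a measurement. It assumes a failure-free, 
uncontended round in which one proposer talks to every acceptor, ignores learner 
fan-out and retries, and assumes each DN report goes to every CN. The function 
names are hypothetical and do not come from HDFS or from the design document.

```python
def basic_paxos_messages(acceptors: int) -> int:
    """Messages for one uncontended proposal in basic Paxos.

    Assumed phases: prepare, promise, accept, accepted, each exchanged
    once with every acceptor.
    """
    return 4 * acceptors


def multi_paxos_messages(acceptors: int) -> int:
    """Steady-state messages per proposal with a stable leader.

    Assumes the leader skips phase 1, leaving only accept and accepted.
    """
    return 2 * acceptors


def dn_report_fanout(reports_per_interval: int, consensus_nodes: int) -> int:
    """Reports one DN sends per interval if every report goes to every CN."""
    return reports_per_interval * consensus_nodes


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(
            f"CNs={n}: basic Paxos={basic_paxos_messages(n)} msgs/proposal, "
            f"Multi-Paxos={multi_paxos_messages(n)} msgs/proposal, "
            f"DN fan-out={dn_report_fanout(1, n)} reports/interval"
        )
```

Under these assumptions both the per-proposal message count and the per-DN 
report traffic grow linearly with the number of CNs, which is why measured 
numbers for the seven-CN case would be helpful.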

> Coordinated replication of the namespace using ConsensusNode
> ------------------------------------------------------------
>
>                 Key: HDFS-6469
>                 URL: https://issues.apache.org/jira/browse/HDFS-6469
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: CNodeDesign.pdf
>
>
> This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
> which enables replication of the namespace on multiple nodes of an HDFS 
> cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
