[ 
https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057049#comment-14057049
 ] 

Konstantin Shvachko commented on HADOOP-10641:
----------------------------------------------

??several unanswered or un-retracted objections??

* I did address the "complexity" issue in [the first paragraph of my reply to 
Suresh|https://issues.apache.org/jira/browse/HDFS-6469?focusedCommentId=14021017&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14021017].
I cannot and probably should not address comfort levels of community members in 
general. But I can and will gladly address technical issues should you raise 
any.
These two jiras do introduce some concepts that may be new to some people (as 
they were to me when I started the project). But distributed coordination is 
the direction in which distributed systems move as they mature. 
I'll just mention Google's Spanner and Facebook's HydraBase here as examples. 
In my experience such concepts in fact simplify system architectures rather 
than complicate them.
* I will address Todd's comment in HDFS-6469 in more detail.

??the design does not add much (or perhaps any) benefit over a simpler solution 
that builds on the current HA system in Hadoop??

* I discussed the alternative solution in [my reply to 
Todd|https://issues.apache.org/jira/browse/HDFS-6469?focusedCommentId=14027179&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14027179],
 see the section on ActiveActive vs ActiveStandby HA.
This approach faces essentially the same problems as ConsensusNode, or "opens 
the same can of worms as the ConsensusNode" in Todd's words. But CNode in the 
end gives us all active NNs, rather than a single active NN with the others 
being read-only standbys.
* Coordination opens an opportunity for geographically distributed HDFS, which 
allows the file system to scale across data centers.
* Coordination opens an opportunity for active-active YARN.
* Coordination opens an opportunity for replicated regions in HBase.

??I'm concerned about baking in dependence on a proprietary 3rd party system 
for HA capabilities??
Not sure which 3rd party system dependencies you see here. There are none 
mentioned in the CNode design. And ZK is already a dependency for Hadoop HA.

??general agreement??
I really don't know how to respond to the rest of your comments, Aaron.
* You seem to have issues with the design of HDFS-6469, but did not present any 
technical reasons there.
* You make HDFS-6469 a pre-condition for HADOOP-10641, but the CNode 
implementation cannot start without the CE interface.
* Committing this to a development branch wouldn't make sense unless you are 
convinced of, or comfortable with, merging it to trunk once the work is done.
* You give no indication of what would constitute a "general agreement" or what 
would convince you that there is one.

We are hosting a [community meeting next 
week|https://www.eventbrite.com/e/consensus-based-replication-in-hadoop-a-deep-dive-tickets-12158236613],
 which was announced on the dev lists. The topics on the agenda include 
technical discussion as well as the logistics of moving forward. Are you 
available to talk about these issues at the meeting and potentially work out a 
general agreement or a compromise?

> Introduce Coordination Engine
> -----------------------------
>
>                 Key: HADOOP-10641
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10641
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HADOOP-10641.patch, HADOOP-10641.patch, 
> HADOOP-10641.patch, hadoop-coordination.patch
>
>
> Coordination Engine (CE) is a system, which allows to agree on a sequence of 
> events in a distributed system. In order to be reliable CE should be 
> distributed by itself.
> Coordination Engine can be based on different algorithms (paxos, raft, 2PC, 
> zab) and have different implementations, depending on use cases, reliability, 
> availability, and performance requirements.
> CE should have a common API, so that it could serve as a pluggable component 
> in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and 
> HBase (HBASE-10909).
> First implementation is proposed to be based on ZooKeeper.
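
To illustrate the "common API" and "pluggable component" idea in the description above, here is a minimal Java sketch. It is only an assumption of what such an interface could look like: the names CoordinationEngine, submitProposal, registerLearner and InMemoryCoordinationEngine are hypothetical and are not taken from the attached patches. The in-memory implementation is a single-process stand-in that assigns one global order to events; it does not implement paxos, raft, zab, or any replication.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical pluggable API; names are illustrative, not from the HADOOP-10641 patches. */
interface CoordinationEngine {
    /** Submits a proposed event and returns the global sequence number it was agreed at. */
    long submitProposal(String proposal);

    /** Registers a callback that receives (sequenceNumber, proposal) for every agreed event, in order. */
    void registerLearner(BiConsumer<Long, String> learner);
}

/** Single-process stand-in for illustration only: no replication, no fault tolerance. */
class InMemoryCoordinationEngine implements CoordinationEngine {
    private final List<BiConsumer<Long, String>> learners = new ArrayList<>();
    private long nextSequence = 0;

    @Override
    public synchronized long submitProposal(String proposal) {
        // The lock serializes proposals, so every learner sees the same order.
        long seq = nextSequence++;
        for (BiConsumer<Long, String> learner : learners) {
            learner.accept(seq, proposal);
        }
        return seq;
    }

    @Override
    public synchronized void registerLearner(BiConsumer<Long, String> learner) {
        learners.add(learner);
    }
}
```

A client such as a NameNode would code only against the interface, so an implementation backed by ZooKeeper or another algorithm could be swapped in without changing the client. Two learners registered on the same engine observe the same events in the same order, which is the property the description asks of a CE.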



--
This message was sent by Atlassian JIRA
(v6.2#6252)
