[
https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057049#comment-14057049
]
Konstantin Shvachko commented on HADOOP-10641:
----------------------------------------------
??several unanswered or un-retracted objections??
* I did address the "complexity" issue in [the first paragraph of my reply to
Suresh|https://issues.apache.org/jira/browse/HDFS-6469?focusedCommentId=14021017&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14021017].
I cannot and probably should not address comfort levels of community members in
general. But I can and will gladly address technical issues should you raise
any.
These two jiras do introduce some concepts that may be new to some people (as
they were to me when I started the project). But distributed coordination is
the direction in which distributed systems are moving as they mature.
I'll just mention Google's Spanner and Facebook's HydraBase here as examples.
In my experience such concepts in fact simplify system architectures rather
than complicate them.
* I will address Todd's comment in HDFS-6469 in more detail.
??the design does not add much (or perhaps any) benefit over a simpler solution
that builds on the current HA system in Hadoop??
* I discussed the alternative solution in [my reply to
Todd|https://issues.apache.org/jira/browse/HDFS-6469?focusedCommentId=14027179&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14027179],
see the section on ActiveActive vs ActiveStandby HA.
This approach faces essentially the same problems as ConsensusNode, or "opens
the same can of worms as the ConsensusNode" in Todd's words. But CNode in the
end gives us all active NNs, rather than a single active NN with the others as
read-only standbys.
* Coordination opens an opportunity for geographically distributed HDFS, which
allows the file system to scale across data centers.
* Coordination opens an opportunity for active-active Yarn.
* Coordination opens an opportunity for replicated regions in HBase.
??I'm concerned about baking in dependence on a proprietary 3rd party system
for HA capabilities??
I am not sure which 3rd party system dependencies you see here. There are none
mentioned in the CNode design. And ZK is already a dependency for Hadoop HA.
??general agreement??
I really don't know how to respond to the rest of your comments, Aaron.
* You seem to have issues with the design of HDFS-6469, but did not present any
technical reasons there.
* You make HDFS-6469 a pre-condition for HADOOP-10641, but CNode implementation
cannot start without the CE interface.
* Committing this to a development branch wouldn't make sense unless you are
convinced, or at least comfortable, that it can be merged to trunk once the work
is done.
* You give no indication of what would constitute a "general agreement" or what
would convince you that there is one.
We are hosting a [community meeting next
week|https://www.eventbrite.com/e/consensus-based-replication-in-hadoop-a-deep-dive-tickets-12158236613],
which was announced on the dev lists. The topics on the agenda include
technical discussion as well as the logistics of moving forward. Are you
available to talk about these issues at the meeting and potentially work out a
general agreement or a compromise?
> Introduce Coordination Engine
> -----------------------------
>
> Key: HADOOP-10641
> URL: https://issues.apache.org/jira/browse/HADOOP-10641
> Project: Hadoop Common
> Issue Type: New Feature
> Affects Versions: 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Plamen Jeliazkov
> Attachments: HADOOP-10641.patch, HADOOP-10641.patch,
> HADOOP-10641.patch, hadoop-coordination.patch
>
>
> Coordination Engine (CE) is a system that allows participants to agree on a
> sequence of events in a distributed system. In order to be reliable, CE should
> itself be distributed.
> Coordination Engine can be based on different algorithms (paxos, raft, 2PC,
> zab) and have different implementations, depending on use cases, reliability,
> availability, and performance requirements.
> CE should have a common API, so that it could serve as a pluggable component
> in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and
> HBase (HBASE-10909).
> The first implementation is proposed to be based on ZooKeeper.
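
For readers new to the idea, the description above (agree on a sequence of events, behind a common pluggable API) can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the names CoordinationEngineSketch, AgreedEvent, LocalSequencerEngine and ReplicaLog are hypothetical and are not taken from the attached patches. The in-process sequencer only stands in for a real distributed agreement protocol; it shows the contract that every learner sees the same events in the same order, not how agreement is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical pluggable API: proposals go in, agreed events come out in one global order. */
interface CoordinationEngineSketch {
    void submitProposal(String proposal);
    void registerLearner(Consumer<AgreedEvent> learner);
}

/** An event that has been assigned its position in the agreed sequence. */
final class AgreedEvent {
    final long sequenceNumber;
    final String proposal;

    AgreedEvent(long sequenceNumber, String proposal) {
        this.sequenceNumber = sequenceNumber;
        this.proposal = proposal;
    }
}

/**
 * Single-process stand-in for an engine (assumption: illustration only).
 * A real implementation would run an agreement protocol across several nodes.
 */
final class LocalSequencerEngine implements CoordinationEngineSketch {
    private final List<Consumer<AgreedEvent>> learners = new ArrayList<>();
    private long nextSequenceNumber = 0;

    @Override
    public synchronized void submitProposal(String proposal) {
        // Assign the next position in the sequence, then deliver to every learner.
        AgreedEvent event = new AgreedEvent(nextSequenceNumber++, proposal);
        for (Consumer<AgreedEvent> learner : learners) {
            learner.accept(event);
        }
    }

    @Override
    public synchronized void registerLearner(Consumer<AgreedEvent> learner) {
        learners.add(learner);
    }
}

/** A replica that applies agreed events in the order it receives them. */
final class ReplicaLog {
    final List<String> applied = new ArrayList<>();

    void apply(AgreedEvent event) {
        applied.add(event.sequenceNumber + ":" + event.proposal);
    }
}

class CoordinationDemo {
    public static void main(String[] args) {
        CoordinationEngineSketch engine = new LocalSequencerEngine();
        ReplicaLog first = new ReplicaLog();
        ReplicaLog second = new ReplicaLog();
        engine.registerLearner(first::apply);
        engine.registerLearner(second::apply);

        engine.submitProposal("mkdir /x");
        engine.submitProposal("rename /x /y");

        // Both replicas applied the same events in the same order.
        System.out.println(first.applied.equals(second.applied));
    }
}
```

Because callers depend only on the interface, a different engine could in principle be swapped in without changing them, which is the pluggability the description asks for.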