[ 
https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063127#comment-14063127
 ] 

Plamen Jeliazkov commented on HADOOP-10641:
-------------------------------------------

We hosted a meet-up at the WANdisco office in San Ramon today. Thank you to 
everyone who came. I'd especially like to thank [~atm] and [~sanjay.radia] for 
taking the time to connect with us.

I took the liberty of recording some of the comments / concerns people raised 
during our meet-up. I will list all of them here and provide a few responses.

* Are NoQuorumException and ProposalNotAcceptedException enough? Are there other 
exceptions CoordinationEngine might throw?
** My own feeling is that these two in particular were the most general and 
universal. We could always add IOException, if desired.  

* In submitProposal() there is both a ProposalReturnCode return value and a 
possible Exception to be thrown. It is unclear which one we should use.
** I agree. Konstantin looked at me for an answer during this but I remained 
silent. The intent was for ProposalReturnCode to report a deterministic 
result (NoQuorum is a deterministic event; the Proposal was not sent), and to 
treat the Exception case as something wrong with the Proposal itself (e.g., 
it doesn't implement equals() or hashCode() correctly, or cannot be serialized 
properly). I understand the confusion and we could do better with just the 
Exception case.
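
As an illustration of the "just the Exception case" option, a minimal sketch 
might look like the following. Everything here is hypothetical and is not taken 
from the attached patches: the Engine interface, the ToyEngine class, the 
trySubmit() helper, and the simplified submitProposal() signature are invented 
for the example, and the two exception classes are local stand-ins that only 
reuse the names mentioned above.

```java
class ExceptionOnlySketch {
    /** Local stand-in for NoQuorumException; not the class from the patch. */
    static class NoQuorumException extends Exception {
        NoQuorumException(String msg) { super(msg); }
    }

    /** Local stand-in for ProposalNotAcceptedException; not the class from the patch. */
    static class ProposalNotAcceptedException extends Exception {
        ProposalNotAcceptedException(String msg) { super(msg); }
    }

    /** Hypothetical contract: no return code, every failure is an exception. */
    interface Engine {
        void submitProposal(java.io.Serializable proposal)
            throws NoQuorumException, ProposalNotAcceptedException;
    }

    /** Toy in-memory engine whose quorum state is fixed at construction time. */
    static class ToyEngine implements Engine {
        private final boolean hasQuorum;

        ToyEngine(boolean hasQuorum) { this.hasQuorum = hasQuorum; }

        @Override
        public void submitProposal(java.io.Serializable proposal)
                throws NoQuorumException, ProposalNotAcceptedException {
            if (proposal == null) {
                // Something is wrong with the proposal itself.
                throw new ProposalNotAcceptedException("proposal is null");
            }
            if (!hasQuorum) {
                // Deterministic outcome: the proposal was never sent.
                throw new NoQuorumException("no quorum; proposal was not sent");
            }
            // A real engine would hand the proposal to the agreement protocol here.
        }
    }

    /** Caller side: a single try/catch replaces checking a return code. */
    static String trySubmit(Engine engine, java.io.Serializable proposal) {
        try {
            engine.submitProposal(proposal);
            return "SUBMITTED";
        } catch (NoQuorumException e) {
            return "NO_QUORUM";
        } catch (ProposalNotAcceptedException e) {
            return "REJECTED";
        }
    }

    public static void main(String[] args) {
        System.out.println(trySubmit(new ToyEngine(true), "mkdir /a"));
        System.out.println(trySubmit(new ToyEngine(false), "mkdir /a"));
    }
}
```

With this shape the caller has exactly one failure channel to handle, at the 
cost of using exceptions for an outcome (no quorum) that is not exceptional.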

* ConsensusNode is non-specific. Consider renaming the project to 
ConsensusNameNode.
** This applies to HDFS-6469. I think ConsensusNameNode is a good name. I'll 
probably always continue to call them CNodes though. :)

* Concern about Paxos effectively load balancing clients. Two round trips make 
writes slow.

* CNodeProxyProvider should allow for deterministic host selection. Consider a 
round-robin approach.
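
The round-robin suggestion could be sketched roughly as below. This is only an 
illustration under assumptions: RoundRobinSketch and nextHost() are 
hypothetical names, hosts are plain strings, and nothing here reflects the 
actual CNodeProxyProvider code in HDFS-6469.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical selector that hands out hosts in a fixed, repeating order. */
final class RoundRobinSketch {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        // Defensive copy so later changes to the caller's list do not affect the order.
        this.hosts = new ArrayList<>(hosts);
    }

    /** Returns the next host; the same call sequence always yields the same hosts. */
    String nextHost() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(index);
    }
}
```

Given the hosts cn1, cn2 and cn3, successive calls to nextHost() return cn1, 
cn2, cn3 and then cn1 again, so the selection order is reproducible rather 
than random.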

* We are weakening read semantics to provide the fast read path. This makes 
stale reads possible.
** Konstantin discussed the 'coordinated read' mechanism and how we ensure 
clients talk to up-to-date NameNodes via Proposals.

* Sub-namespace WAN replication is highly desirable but double-journaling in 
the CoordinationEngine and the EditsLog is concerning.

* The community would like to see the impact on write performance addressed.

* HBase is coming up with a WAL plugin for possible coordination. Some wariness 
about membership coordination (multiple Distributed State Machines) for HBase 
WALs.

* A small separate project might make it more likely for people to import CE 
into their own projects and build their own CoordinationEngines. A separate 
branch is also possible.

Some of these clearly pertain to the HDFS and HBase projects and not just 
the CoordinationEngine itself. Apologies if I missed anyone's concern / point; 
I'm pretty sure I captured everybody though.

> Introduce Coordination Engine
> -----------------------------
>
>                 Key: HADOOP-10641
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10641
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HADOOP-10641.patch, HADOOP-10641.patch, 
> HADOOP-10641.patch, hadoop-coordination.patch
>
>
> Coordination Engine (CE) is a system, which allows to agree on a sequence of 
> events in a distributed system. In order to be reliable CE should be 
> distributed by itself.
> Coordination Engine can be based on different algorithms (paxos, raft, 2PC, 
> zab) and have different implementations, depending on use cases, reliability, 
> availability, and performance requirements.
> CE should have a common API, so that it could serve as a pluggable component 
> in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and 
> HBase (HBASE-10909).
> First implementation is proposed to be based on ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
