[jira] [Commented] (HADOOP-10641) Introduce Coordination Engine interface

Steve Loughran (JIRA) Thu, 24 Jul 2014 03:49:26 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073075#comment-14073075
 ]


Steve Loughran commented on HADOOP-10641:
-----------------------------------------

bq. this jira is not proposing new Consensus protocols, as stated in this 
comment. CoordinationEngine here is an interface to be used with existing 
consensus algorithms, 

Exactly. This JIRA is proposing a plugin interface to coordination systems 
using consensus algorithms, a plugin point intended for use by HDFS and others. 
It is absolutely critical that all implementations of this plugin do exactly 
what is expected of them, and we cannot ensure that without a clear definition 
of what they are meant to do, what guarantees must be met and what failure 
modes are expected. 

The consensus node design document is not such a document. It's an outline of 
what can be done, but it doesn't specify the API. The current patch for this 
JIRA contains some interfaces, a ZK class and a single test case. Can we trust 
this ZK class to do what is required? Not without a clear definition of what is 
required. Can we trust the test case to verify that the ZK implementation does 
what is required? Not now, no. What do we do if there is a difference between 
what the ZK implementation does and what the interface defines? Is it the 
interface at fault, or the ZK implementation? What if a third-party 
implementation does something differently? Whose implementation is considered 
the correct one?

For the filesystems, HDFS defines the behavior; my '9361 JIRA was deriving a 
specification from that implementation, generating more corner case tests, and 
making the details of how (every) other filesystem behaves differently a 
declarative bit of XML for each FS -now we can see how they differ. We've even 
used it to bring the other filesystems (especially S3N) more in line with what 
is expected.

This new plugin point is intended to become a critical failure point for HDFS 
and YARN, where the incorrect behaviour of an implementation potentially places 
data at risk. Yet to date, all we have is a PDF file of the kind Amazon 
describes: "conventional design documents consist of prose, static diagrams, 
and perhaps pseudo-code in an ad hoc untestable language."

This is not a full consensus protocol, so it should be straightforward to 
specify it strictly enough to derive tests and to tell implementors of 
consensus protocol-based systems how to hook up their work to Hadoop. And, as 
those implementors are expected to be experts in distributed systems and such 
topics, we should be able to expect them to pick up basic specification 
languages just as we expect submitters of all patches to be able to write JUnit 
tests.
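
To illustrate what "strict enough to derive tests" could look like, here is a 
minimal sketch of checks written against an interface rather than against one 
implementation. Everything in it is hypothetical: SequenceEngine, 
InMemoryEngine and verifyContract are names invented for this example, they are 
not the interfaces in the HADOOP-10641 patch, and the two rules checked (a new 
engine starts empty; agreed positions are gap-free and preserve submission 
order for a single submitter) are assumed guarantees, not ones the JIRA has 
defined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class CoordinationContractSketch {

  /** Hypothetical interface; NOT the API from the HADOOP-10641 patch. */
  interface SequenceEngine {
    /** Submits a proposal and returns the position it was agreed at. */
    long submit(String proposal);

    /** Returns the agreed proposals, in agreed order. */
    List<String> agreedSequence();
  }

  /** Trivial single-process stand-in, used only to exercise the checks. */
  static final class InMemoryEngine implements SequenceEngine {
    private final List<String> log = new ArrayList<>();

    @Override
    public synchronized long submit(String proposal) {
      log.add(proposal);
      return log.size() - 1;
    }

    @Override
    public synchronized List<String> agreedSequence() {
      // Hand out a read-only copy so callers cannot alter the log.
      return Collections.unmodifiableList(new ArrayList<>(log));
    }
  }

  /** Checks stated against the interface, so any implementation can be run through them. */
  static void verifyContract(Supplier<SequenceEngine> factory) {
    SequenceEngine engine = factory.get();
    check(engine.agreedSequence().isEmpty(), "a new engine starts with an empty sequence");

    String[] proposals = {"mkdir /a", "create /a/b", "delete /a/b"};
    for (int i = 0; i < proposals.length; i++) {
      long position = engine.submit(proposals[i]);
      check(position == i, "positions are gap-free and increasing");
    }

    List<String> agreed = engine.agreedSequence();
    check(agreed.size() == proposals.length, "every submitted proposal is agreed exactly once");
    for (int i = 0; i < proposals.length; i++) {
      check(agreed.get(i).equals(proposals[i]), "agreed order matches submission order");
    }
  }

  /** Fails with the name of the violated rule. */
  static void check(boolean condition, String rule) {
    if (!condition) {
      throw new AssertionError("contract violated: " + rule);
    }
  }

  public static void main(String[] args) {
    verifyContract(InMemoryEngine::new);
    System.out.println("contract checks passed");
  }
}
```

A ZK-backed or third-party engine would be passed to the same verifyContract 
call through its own factory; when a check fails, the named rule says which 
stated guarantee the implementation and the specification disagree on.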


> Introduce Coordination Engine interface
> ---------------------------------------
>
>                 Key: HADOOP-10641
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10641
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HADOOP-10641.patch, HADOOP-10641.patch, 
> HADOOP-10641.patch, hadoop-coordination.patch
>
>
> Coordination Engine (CE) is a system, which allows to agree on a sequence of 
> events in a distributed system. In order to be reliable CE should be 
> distributed by itself.
> Coordination Engine can be based on different algorithms (paxos, raft, 2PC, 
> zab) and have different implementations, depending on use cases, reliability, 
> availability, and performance requirements.
> CE should have a common API, so that it could serve as a pluggable component 
> in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and 
> HBase (HBASE-10909).
> First implementation is proposed to be based on ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
