[jira] [Commented] (HADOOP-10641) Introduce Coordination Engine

Steve Loughran (JIRA) Mon, 21 Jul 2014 10:17:00 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068815#comment-14068815
 ]


Steve Loughran commented on HADOOP-10641:
-----------------------------------------

I'm very much an =0 to the changes to HDFS, as that level is not an area of my 
understanding. If something does go into HDFS, then as noted, hadoop-common 
does seem an appropriate location - if it can't go into hadoop-hdfs itself.

Before that happens, consider this:

Consensus protocols are where CS-hard mathematics comes out of the textbooks 
and into the codebase; it is a key place where you are expected to prove the 
correctness of your algorithm before your peers will trust it. And, hopefully, 
before you make the correctness of that algorithm a critical part of your own 
application.

If Hadoop is going to provide a plug-in point for distributed co-ordination 
systems -- which is what this proposal is -- then we need to specify what is 
expected of an implementation strictly enough that it is possible to prove that 
implementations meet the specification, and that downstream projects can 
demonstrate that if an implementation meets this specification then their own 
algorithms will be correct.

More succinctly: *I want a formal specification of the API, and what we have in 
the current PDF design document is not it.* I will also need evidence that the 
reference ZK implementation is consistent with that specification, both through 
any maths that can be provided and through test cases derived from the 
specification.
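
As a rough illustration of what "test cases derived from the specification" 
could look like, here is a minimal Java sketch. Everything in it is 
hypothetical: the SequenceEngine interface, the InMemoryEngine stand-in and the 
AgreementContract checks are invented for this example and are not the API in 
the attached patches. It assumes the specification would state at least two 
properties: every learner sees the agreed events in the same order, and only 
proposed values are ever learned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical plug-in point; NOT the interface from the HADOOP-10641 patch. */
interface SequenceEngine {
    /** Submit a value; returns once it has a place in the agreed sequence. */
    void propose(String value);

    /** Snapshot of the agreed sequence as seen so far by one learner. */
    List<String> learned(int learnerId);

    int learnerCount();
}

/** Trivial single-process stand-in, present only to exercise the checks. */
final class InMemoryEngine implements SequenceEngine {
    private final List<List<String>> logs = new ArrayList<>();

    InMemoryEngine(int learners) {
        for (int i = 0; i < learners; i++) {
            logs.add(new ArrayList<>());
        }
    }

    public synchronized void propose(String value) {
        // One lock orders all proposals, so every log receives the same order.
        for (List<String> log : logs) {
            log.add(value);
        }
    }

    public synchronized List<String> learned(int learnerId) {
        return new ArrayList<>(logs.get(learnerId));
    }

    public int learnerCount() {
        return logs.size();
    }
}

/** Checks written against the assumed properties, not against one engine. */
final class AgreementContract {
    /** Assumed property 1: any two learners agree on their common prefix. */
    static boolean prefixConsistent(SequenceEngine engine) {
        for (int a = 0; a < engine.learnerCount(); a++) {
            List<String> la = engine.learned(a);
            for (int b = a + 1; b < engine.learnerCount(); b++) {
                List<String> lb = engine.learned(b);
                int n = Math.min(la.size(), lb.size());
                if (!la.subList(0, n).equals(lb.subList(0, n))) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Assumed property 2: nothing is learned that was never proposed. */
    static boolean onlyProposedValues(SequenceEngine engine,
                                      Collection<String> proposed) {
        for (int i = 0; i < engine.learnerCount(); i++) {
            if (!proposed.containsAll(engine.learned(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> proposed =
            Arrays.asList("mkdir /a", "rename /a /b", "delete /b");
        SequenceEngine engine = new InMemoryEngine(3);
        for (String p : proposed) {
            engine.propose(p);
        }
        System.out.println("prefix-consistent: " + prefixConsistent(engine));
        System.out.println("only proposed values: "
            + onlyProposedValues(engine, proposed));
    }
}
```

Checks like these only sample behaviour; they complement a formal 
specification and proof rather than replace them. The point of writing them 
against an interface is that the same suite could be run against any engine 
that claims to meet the specification.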


This may seem a harsh requirement, but HADOOP-9361 shows that it is nothing I 
would not impose on myself. It is [what Amazon is doing in their 
stack|http://perspectives.mvdirona.com/CommentView,guid,5638e46e-eb91-4dbf-9f75-351afcb7a199.aspx],
and it has also been done for [Distributed File 
Systems|https://birrell.org/andrew/papers/FileSysSpec-DCCS.pdf].

I would recommend using TLA+ here, and for any downstream uses. Once the 
foundations are done, then we can move on to YARN, and then finally to the 
applications which run on it.


I'm not going to comment on the code at all at this point, except to observe 
that you should be making this a YARN service, so that it integrates with the 
rest of the services and the workflow being built around them. The core service 
classes are in hadoop-common.

> Introduce Coordination Engine
> -----------------------------
>
>                 Key: HADOOP-10641
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10641
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HADOOP-10641.patch, HADOOP-10641.patch, 
> HADOOP-10641.patch, hadoop-coordination.patch
>
>
> Coordination Engine (CE) is a system, which allows to agree on a sequence of 
> events in a distributed system. In order to be reliable CE should be 
> distributed by itself.
> Coordination Engine can be based on different algorithms (paxos, raft, 2PC, 
> zab) and have different implementations, depending on use cases, reliability, 
> availability, and performance requirements.
> CE should have a common API, so that it could serve as a pluggable component 
> in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and 
> HBase (HBASE-10909).
> First implementation is proposed to be based on ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
