[ 
https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080800#comment-14080800
 ] 

Steve Loughran commented on HADOOP-10641:
-----------------------------------------

bq. It seems that you cite HADOOP-9361 as an example of a specification of an 
API. Same as Andrew I could not find specifications or any documents linked 
there. May be you can clarify. Is it a TLA+ spec of HDFS?

Look in [the site 
docs|https://github.com/apache/hadoop-common/tree/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem].

I actually used the Z model with a python syntax in the hope it would be more 
broadly understood, though now that I'm trying to integrate it with other work 
I'm trying to think "shall I go back and TLA+ it?"...because then its possible 
to start thinking about code using the public module definitions.

bq. If FileSystem specs are done and the tests developed according to specs, 
should that be sufficient to safeguard FS behaviour of any internal changes 
including introduction of CE? Which I assume is your main concern with this 
jira.

My concern is slightly different: we're producing a plugin point for 
co-ordination across the Hadoop stack, and we need to know what it is meant to 
do.

bq. If I introduce an implementation of CE using ZK and it does not break the 
tests and therefore does not alter FileSystem semantics, isn't that a 
verification of the implementation.

The real HDFS tests are the full stack, with failure injection...those FS API 
ones are limited to those API calls and how things like {{seek()}} work. It'll 
be the full stack tests with failure injection that will highlight where a test 
fails after the changes.

As for the tests in this JIRA so far, they're pretty minimal and verify that 
the ZK implementation does "something" provided nothing appears to fail. They 
don't seem designed to be run against any other implementation of the 
interface, nor is there any failure injection. Even without the fault 
injection, any tests for this should be designed to be targetable at any 
implementation, to show consistent behaviour in the core actions. 

Such tests won't verify robustness though...it looks to me that I could 
implement this API using in-memory data structures, something would be utterly 
lacking in the durability things need. Or I could try to use Gossip, which may 
have the durability, but a different ordering guarantee which may show up in 
production. Its things like the latter I'm hoping to catch, by spelling out to 
implementors what they have to do.

bq. It looks like you propose to introduce a new requirement for Hadoop 
contributions, that new features should be formally specified and a 
mathematically proven. I think this should be discussed in a separate thread 
before it can be enforced. Would be good to hear what people think.

Proven? Not a chance. What I would like to see is those critical co-ordination 
points to be defined formally enough that there's no ambiguity about what they 
can do. This proposal is for a plugin to define what HDFS, HBase &c will expect 
from a consensus service, so we have to ensure there's no ambiguity. Then we 
can use those documents to derive the tests to break things.

bq. Generally great minds were thinking about disciplined software development 
techniques way back, probably starting from Dijkstra and Donald Knuth. I found 
them very useful dealing with complex algorithms, not sure about APIs.

+Parnas, who invented "interfaces" as the combination of {{(signature, 
semantics)}} in 1972. What I'm trying to do here is get those semantics nailed 
down.


> Introduce Coordination Engine interface
> ---------------------------------------
>
>                 Key: HADOOP-10641
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10641
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HADOOP-10641.patch, HADOOP-10641.patch, 
> HADOOP-10641.patch, hadoop-coordination.patch
>
>
> Coordination Engine (CE) is a system, which allows to agree on a sequence of 
> events in a distributed system. In order to be reliable CE should be 
> distributed by itself.
> Coordination Engine can be based on different algorithms (paxos, raft, 2PC, 
> zab) and have different implementations, depending on use cases, reliability, 
> availability, and performance requirements.
> CE should have a common API, so that it could serve as a pluggable component 
> in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and 
> HBase (HBASE-10909).
> First implementation is proposed to be based on ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to