[jira] [Commented] (HADOOP-13327) Add OutputStream + Syncable to the Filesystem Specification

Steve Loughran (JIRA) Tue, 29 Aug 2017 04:12:46 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145119#comment-16145119
 ]


Steve Loughran commented on HADOOP-13327:
-----------------------------------------

Thanks, I'll look at these. I thought I'd made the new close util atomic by way 
of AtomicBoolean, but perhaps not.

bq. L53-142 - Generally speaking, I think the message is better delivered thru 
clear descriptions. The notation is somewhat ambiguous to me.

Ah, you've entered the world of "how do we define the semantics of our FS API 
in a way which can be used by implementors, users and people writing tests". 
Which is where I must respectfully disagree.


h3. Notation 

The notation [is essentially 
python|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md]
 working on the state of the inputs, S, and, if the conditions are met, 
producing a new state, S', describing the updated state of the system. 

Why Python? It lets us call "functions" maps, and people are happy with new 
items being added to maps, and with concepts like "keys" and "values", the way 
they wouldn't be with terms like the "domain" and "range" of a function. It 
also lets us use Python list selection to work on the domain and range of 
functions the way we could otherwise do in set-theory notation with 
upside-down As, backward Es and the like. We want something to be broadly 
used, rather than to scare people away.
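
A minimal sketch of that idea, not taken from the spec itself: the map 
{{data}}, its paths and the name {{data_prime}} (standing in for S') are all 
made up for illustration. It only shows how a "function" becomes a map and how 
list selection stands in for set-builder notation and quantifiers.

```python
# Hypothetical example: a "function" from paths to file contents, as a map.
data = {
    ("user", "alice", "a.txt"): b"hello",
    ("user", "bob", "b.txt"): b"",
}

# Adding an item to the map gives the updated state S'; S is left untouched.
data_prime = dict(data)
data_prime[("tmp", "c.txt")] = b"new"

# The "domain" and "range" of the function are just its keys and values.
domain = set(data_prime.keys())
range_ = set(data_prime.values())

# List selection in place of set-builder notation:
# "the set of all p in the domain such that data'(p) is empty".
empty_files = [p for p in data_prime if len(data_prime[p]) == 0]

# The upside-down A (for all) and backward E (there exists) become all()/any().
all_paths_nonempty = all(len(p) > 0 for p in data_prime)
some_path_under_tmp = any(p[0] == "tmp" for p in data_prime)
```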

h3. Model 

The [Core FS 
model|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md].
 It's incomplete, probably has bugs in it, but it deliberately tries to 
describe a filesystem at its most abstract: a list of path elements mapping to 
files or directories; directories can have children but no data, files can 
have data but no children, delete makes things go away, rename moves stuff, 
etc. The output stream is essentially the rounding off of the earlier work 
with the bit we'd always avoided.
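
As a rough sketch of that abstraction, and very much a simplification rather 
than the model in the spec: the {{FS}} tuple and the {{delete}} and {{rename}} 
helpers below are hypothetical names, and the preconditions are deliberately 
cruder than the real rename/delete rules. Each operation checks its 
preconditions on state S and, if they hold, returns a new state S'.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

Path = Tuple[str, ...]  # a path is a list of path elements; () is the root


class FS(NamedTuple):
    directories: FrozenSet[Path]  # directories: children but no data
    files: Dict[Path, bytes]      # files: data but no children


def exists(fs: FS, p: Path) -> bool:
    return p in fs.directories or p in fs.files


def is_descendant(p: Path, ancestor: Path) -> bool:
    return len(p) > len(ancestor) and p[:len(ancestor)] == ancestor


def delete(fs: FS, p: Path) -> FS:
    """Return S' with p and everything underneath it gone."""
    if not exists(fs, p):
        raise FileNotFoundError(p)

    def gone(q: Path) -> bool:
        return q == p or is_descendant(q, p)

    return FS(
        frozenset(d for d in fs.directories if not gone(d)),
        {f: data for f, data in fs.files.items() if not gone(f)},
    )


def rename(fs: FS, src: Path, dest: Path) -> FS:
    """Return S' with src, and everything underneath it, moved to dest."""
    # Simplified preconditions, assumed for this sketch only.
    if not exists(fs, src):
        raise FileNotFoundError(src)
    if exists(fs, dest):
        raise FileExistsError(dest)
    if dest[:-1] not in fs.directories:
        raise FileNotFoundError(dest[:-1])
    if is_descendant(dest, src):
        raise ValueError("cannot rename a path to underneath itself")

    def move(q: Path) -> Path:
        if q == src or is_descendant(q, src):
            return dest + q[len(src):]
        return q

    return FS(
        frozenset(move(d) for d in fs.directories),
        {move(f): data for f, data in fs.files.items()},
    )


# Usage: S -> S' -> S''; the earlier states are never modified.
s = FS(frozenset({(), ("a",), ("b",)}), {("a", "f"): b"x"})
s2 = rename(s, ("a",), ("c",))
s3 = delete(s2, ("c",))
```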

If we were going to change the spec, I'd rather go the other way, to something 
more rigorous, specifically TLA+ specs: more standard, capable of modelling 
concurrency requirements, the TLA+ toolkit can validate specs, [in use 
elsewhere|http://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf]. The 
problem here is that when I do put up a bit of TLA+, such as for [a consistent 
blobstore|https://issues.apache.org/jira/secure/attachment/12865161/objectstore.pdf],
 nobody ever finds bugs in it. Which means that people aren't reading it that 
rigorously. 

I know we could do with strictness in what are essentially the core API 
definitions for the open source stack behind some of the largest applications 
in the world, but really, short term, my main concern is trying to make sure 
the HDFS team actually keep what we have up to date with the public methods 
and interfaces they add, so that more than one or two users know to call 
them, other FS implementations know what to do (Hello CanUnbuffer!) 
(HADOOP-14748, HADOOP-14747,...), and we have some tests which actually try to 
break those implementations.

So no, not going to change. Sorry.

> Add OutputStream + Syncable to the Filesystem Specification
> -----------------------------------------------------------
>
>                 Key: HADOOP-13327
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13327
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13327-002.patch, HADOOP-13327-branch-2-001.patch
>
>
> Write down what a Filesystem output stream should do. While the core API is 
> defined in Java, that doesn't say what's expected about visibility, 
> durability, etc., and the Hadoop Syncable interface is entirely ours to define.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

