[
https://issues.apache.org/jira/browse/HADOOP-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145119#comment-16145119
]
Steve Loughran commented on HADOOP-13327:
-----------------------------------------
Thanks, I'll look at these. I thought I'd made the new close util atomic by way
of AtomicBoolean, but perhaps not.
bq. L53-142 - Generally speaking, I think the message is better delivered thru
clear descriptions. The notation is somewhat ambiguous to me.
Ah, you've entered the world of "how do we define the semantics of our FS API
in a way which can be used by implementors, users and people writing tests".
Which is where I must respectfully disagree.
h3. Notation
The notation [is essentially
python|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md]
working on the state of the inputs, S, and, if the conditions are met, producing
a new state, S', describing the updated state of the system.
Why Python? It lets us call "functions" maps, and people are happy with new
items being added to maps, and with concepts like "keys" and "values", in a way
they wouldn't be with terms like the "domain" and "range" of a function. It also
lets us use Python list selection to work on the domain and range of functions
the way we could otherwise do in set-theory notation with upside-down As,
backward Es and the like. We want something to be broadly used, rather than to
scare people away.
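To make that concrete, here is a minimal sketch of the style, not an excerpt from the specification. The state S is a plain dict, and the operation name {{create}}, the paths and the data are all made up for illustration. The operation checks a precondition on S and, if it holds, returns a new state S'.

```python
# Hypothetical illustration only: `create` and the sample paths are
# invented here and are not taken from the Hadoop filesystem specification.

def create(S, path, data):
    """Return the new state S' if the preconditions on S are met."""
    # Precondition: the path must not already be a key of the map
    # (in function terms: it must not be in the domain).
    if path in S:
        raise FileExistsError(path)
    # Postcondition: S' is S plus exactly one new entry; S is unchanged.
    S_prime = dict(S)
    S_prime[path] = data
    return S_prime

S = {"/a": b"alpha", "/empty": b""}
S_prime = create(S, "/b", b"beta")

# Python list selection over the keys ("domain") and values ("range").
non_empty = sorted(p for p in S_prime if len(S_prime[p]) > 0)
lengths = {p: len(d) for p, d in S_prime.items()}
```

The same conditions could be written with quantifiers over sets; the dict and comprehension form says the same thing in terms most developers already read daily.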
h3. Model
The [Core FS
model|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md].
It's incomplete, probably got bugs in, but it deliberately tries to describe a
filesystem at its most abstract: a list of path elements mapping to files or
directories; directories can have children but no data, files can have data but
no children, delete makes things go away, rename moves stuff, etc. The output
stream work is essentially rounding off that earlier work with the bit we'd
always avoided.
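As a rough sketch of what "most abstract" means here: the code below is a toy written for this comment, not the model document itself. The function names and the representation (a set of directories plus a map of files to data) are assumptions, and the real rename and delete semantics have many more preconditions than shown.

```python
# Hypothetical toy model; names and representation are illustrative only.
# A path is a tuple of path elements; the empty tuple () is the root.

def new_fs():
    return {"dirs": {()}, "files": {}}

def parent(path):
    return path[:-1]

def is_under(path, root):
    """True if path is root itself or a descendant of it."""
    return path[:len(root)] == root

def children(fs, path):
    """Direct children of a directory; files have no children."""
    entries = fs["dirs"] | set(fs["files"])
    return sorted(p for p in entries if p and parent(p) == path)

def mkdir(fs, path):
    if path in fs["files"] or parent(path) not in fs["dirs"]:
        raise ValueError("cannot create directory")
    fs["dirs"].add(path)

def create(fs, path, data):
    if path in fs["dirs"] or parent(path) not in fs["dirs"]:
        raise ValueError("cannot create file")
    fs["files"][path] = data

def delete(fs, path):
    """Delete makes the path and everything under it go away."""
    if path == ():
        raise ValueError("root is kept in this sketch")
    fs["dirs"] = {d for d in fs["dirs"] if not is_under(d, path)}
    fs["files"] = {f: d for f, d in fs["files"].items()
                   if not is_under(f, path)}

def rename(fs, src, dst):
    """Rename moves the path and everything under it."""
    exists = fs["dirs"] | set(fs["files"])
    if (src == () or src not in exists or dst in exists
            or is_under(dst, src) or parent(dst) not in fs["dirs"]):
        raise ValueError("cannot rename")
    def move(p):
        return dst + p[len(src):] if is_under(p, src) else p
    fs["dirs"] = {move(d) for d in fs["dirs"]}
    fs["files"] = {move(f): d for f, d in fs["files"].items()}

fs = new_fs()
mkdir(fs, ("data",))
create(fs, ("data", "part-0"), b"abc")
rename(fs, ("data",), ("archive",))
moved = children(fs, ("archive",))
delete(fs, ("archive",))
remaining = children(fs, ())
```

Everything an implementation adds on top (permissions, timestamps, consistency) is deliberately left out at this level.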
If we were going to change the spec, I'd rather go the other way, to something
more rigorous, specifically TLA+ specs: more standard, capable of modelling
concurrency requirements, the TLA+ toolkit can validate specs, [in use
elsewhere|http://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf]. The
problem here is that when I do put up a bit of TLA+, such as for [a consistent
blobstore|https://issues.apache.org/jira/secure/attachment/12865161/objectstore.pdf]
nobody ever finds bugs in it. Which means that people aren't reading it that
rigorously.
I know we could do with strictness in what is essentially the core API
definitions for what is the open source stack for some of the largest
applications in the world, but really, short term, my main concern is trying to
make sure the HDFS team actually keep what we have up to date with public
methods and interfaces they add, so that more users than one or two know to
call them, other FS implementations know what to do (Hello CanUnbuffer!)
(HADOOP-14748, HADOOP-14747,...), and we have some tests which actually try
to break those implementations.
So no, not going to change. Sorry.
> Add OutputStream + Syncable to the Filesystem Specification
> -----------------------------------------------------------
>
> Key: HADOOP-13327
> URL: https://issues.apache.org/jira/browse/HADOOP-13327
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13327-002.patch, HADOOP-13327-branch-2-001.patch
>
>
> Write down what a Filesystem output stream should do. While the core API is
> defined in Java, that doesn't say what's expected about visibility,
> durability, etc., and the Hadoop Syncable interface is entirely ours to define.