[
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625721#comment-13625721
]
bradley childs commented on HADOOP-9371:
----------------------------------------
Great work here, guys. I've been researching the semantics around write locking
and have a couple of comments. The first concerns this line on write atomicity:
"Only one writer can write to a file (ISSUE: does anything in MR/HBase use this
for locks?)", which implies fully atomic write transactions.
If this line is a MUST (it is slightly unclear), then the file lock/release
would have to be explicit around create(), append(), and open(). Every writer
would have to go through a lock/release cycle for the file while its output
stream is instantiated, which is not desirable.
If you look at the create() and append() methods in HDFS's
DistributedFileSystem.java (linked below), each returns an FSDataOutputStream
with no locking or lifecycle (open() returns an FSDataInputStream). Further
investigation shows no explicit locking inside the FSDataOutputStream class
either.
Instead, FSDataOutputStream implements the o.a.h.fs.Syncable interface,
which provides a sync() method. Per the interface, a call to sync()
"Synchronize[s] all buffer with the underlying devices."
To me this says that there are no exclusive writers. Instead, a writer's file
consistency is guaranteed only at the instant sync() is called on the
underlying OutputStream; after that it only MAY be consistent until sync() is
called again.
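To make that reading concrete, here is a minimal, self-contained sketch. It is
a toy model of the interpretation above, not HDFS code: SharedFile and
BufferedWriterSketch are hypothetical names invented for illustration, and
nothing here calls the real FSDataOutputStream or Syncable. It only assumes
that each writer buffers locally, takes no lock when it is created, and
publishes its buffer when its own sync() is called.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SyncVisibilitySketch {

    /** Hypothetical stand-in for a shared file: the bytes readers can see. */
    static final class SharedFile {
        private final ByteArrayOutputStream visible = new ByteArrayOutputStream();

        // Append one writer's flushed bytes to the visible contents.
        synchronized void publish(byte[] data) {
            visible.write(data, 0, data.length);
        }

        synchronized String contents() {
            return new String(visible.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical writer: no lock on creation, data is visible only after sync(). */
    static final class BufferedWriterSketch {
        private final SharedFile file;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        BufferedWriterSketch(SharedFile file) {
            this.file = file;
        }

        // Writes land in the writer's private buffer only.
        void write(String s) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            buffer.write(bytes, 0, bytes.length);
        }

        // The only point at which this writer's data is known to be in the file.
        void sync() {
            file.publish(buffer.toByteArray());
            buffer.reset();
        }
    }

    /** Two writers share one file; returns what a reader sees at each step. */
    static String demo() {
        SharedFile file = new SharedFile();
        BufferedWriterSketch a = new BufferedWriterSketch(file);
        BufferedWriterSketch b = new BufferedWriterSketch(file);

        a.write("A1;");
        b.write("B1;");
        String beforeSync = file.contents(); // neither writer has synced yet

        a.sync();
        String afterA = file.contents();     // only writer a's data is visible

        b.sync();
        String afterB = file.contents();     // both writers' data is visible

        return "[" + beforeSync + "][" + afterA + "][" + afterB + "]";
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model both writers are open at the same time and neither blocks the
other; the only ordering a reader can rely on is the order of the sync() calls.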
Summary: I believe "Only one writer can write to a file (ISSUE: does anything
in MR/HBase use this for locks?)" should be changed to something like "A file
may have multiple writers, with each writer's only guarantee of consistency
being at the point of a sync() call."
Ref:
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/FSDataOutputStream.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/Syncable.java
> Define Semantics of FileSystem and FileContext more rigorously
> --------------------------------------------------------------
>
> Key: HADOOP-9371
> URL: https://issues.apache.org/jira/browse/HADOOP-9371
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 1.2.0, 3.0.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-9361.2.patch, HADOOP-9361.patch,
> HadoopFilesystemContract.pdf
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> The semantics of {{FileSystem}} and {{FileContext}} are not completely
> defined in terms of
> # core expectations of a filesystem
> # consistency requirements.
> # concurrency requirements.
> # minimum scale limits
> Furthermore, methods are not defined strictly enough in terms of their
> outcomes and failure modes.
> The requirements and method semantics should be defined more strictly.