[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625721#comment-13625721 ]

bradley childs commented on HADOOP-9371:
----------------------------------------

Great work here guys.  I've been researching the semantics around write locking 
and have a couple of comments.  The first concerns this line regarding write atomicity:

"Only one writer can write to a file (ISSUE: does anything in MR/HBase use this 
for locks?)"

This line implies fully atomic write transactions.

If this line is a MUST (the wording leaves that slightly unclear), then locking 
and releasing the file would have to be explicit around create(), append(), and 
open(): every writer would have to acquire a lock on the file when its output 
stream is instantiated and release it afterwards, which is not desirable.
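To make that cost concrete, here is a minimal sketch of explicit lock/release 
around stream instantiation. Everything in it (ExclusiveFileStore, its create() 
and isLocked() methods, the in-memory lock table) is hypothetical and is not 
Hadoop's API; it only models a store that refuses a second writer on a path 
until the first writer closes its stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store enforcing "only one writer per file" via an explicit lock table.
class ExclusiveFileStore {
    // Paths that currently have an open writer.
    private final Set<String> locked = ConcurrentHashMap.newKeySet();

    // Acquires the per-path lock while instantiating the stream; fails if another writer holds it.
    OutputStream create(String path) throws IOException {
        if (!locked.add(path)) {
            throw new IOException("file is locked by another writer: " + path);
        }
        return new ByteArrayOutputStream() {
            private boolean closed = false;

            @Override
            public void close() throws IOException {
                // Release the lock exactly once, when the writer closes its stream.
                if (!closed) {
                    closed = true;
                    locked.remove(path);
                }
                super.close();
            }
        };
    }

    // True while some writer holds the lock for this path.
    boolean isLocked(String path) {
        return locked.contains(path);
    }
}
```

A second create() on the same path fails until the first stream is closed; 
that acquire/release step on every stream instantiation is the overhead the 
MUST reading would impose.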

If you look at the HDFS DistributedFileSystem.java (linked below), its create() 
and append() methods return an FSDataOutputStream with no locking or lifecycle 
management (open() returns an FSDataInputStream).  Further investigation shows 
no explicit locking inside the FSDataOutputStream class either.

Instead, FSDataOutputStream implements the o.a.h.fs.Syncable interface, 
which provides a sync() method.  Per the interface, a call to sync() 
"Synchronize[s] all buffer with the underlying devices."

To me this says that there are no exclusive writers.  Instead, a writer's view 
of the file is guaranteed to be consistent only at the instant sync() is called 
on the underlying OutputStream; after that it only MAY be consistent until 
sync() is called again.
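That reading can be modeled with a small buffered writer.  This is a 
hypothetical sketch: SyncedWriter and its shared "device" are illustrative 
names, not Hadoop's FSDataOutputStream, and the model assumes unsynced bytes 
are invisible to readers, whereas a real stream may expose them earlier.  It 
is single-threaded and not thread-safe.

```java
// Hypothetical writer whose data reaches the shared "device" only when sync() is called.
class SyncedWriter {
    // Shared storage that readers observe; several writers may append to it.
    private final StringBuilder device;
    // Private buffer holding data written since this writer's last sync().
    private final StringBuilder buffer = new StringBuilder();

    SyncedWriter(StringBuilder device) {
        this.device = device;
    }

    // Buffers data locally; readers of the device do not see it yet.
    void write(String data) {
        buffer.append(data);
    }

    // Pushes all buffered data to the device and clears the local buffer.
    void sync() {
        device.append(buffer.toString());
        buffer.setLength(0);
    }
}
```

With two SyncedWriter instances sharing one device, each writer's data appears 
only at its own sync(), so the file is guaranteed consistent from a given 
writer's point of view only at those instants.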

Summary:  I believe "Only one writer can write to a file (ISSUE: does anything 
in MR/HBase use this for locks?)" should be changed to something like "A file 
may have multiple writers, and each writer's only guarantee of consistency is 
during a sync(...) call."

Ref:
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/FSDataOutputStream.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/Syncable.java

> Define Semantics of FileSystem and FileContext more rigorously
> --------------------------------------------------------------
>
>                 Key: HADOOP-9371
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9371
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 1.2.0, 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-9361.2.patch, HADOOP-9361.patch, 
> HadoopFilesystemContract.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The semantics of {{FileSystem}} and {{FileContext}} are not completely 
> defined in terms of 
> # core expectations of a filesystem
> # consistency requirements.
> # concurrency requirements.
> # minimum scale limits
> Furthermore, methods are not defined strictly enough in terms of their 
> outcomes and failure modes.
> The requirements and method semantics should be defined more strictly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
