[ 
https://issues.apache.org/jira/browse/HADOOP-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601922#action_12601922
 ] 

Tom White commented on HADOOP-3177:
-----------------------------------

bq. I think it should be java.io.FileOutputStream since we are doing FileSystem.

But FileOutputStream is tied to Java's File abstraction, which isn't general 
enough for Hadoop FileSystems. Furthermore, FileOutputStream#getFD is final, as 
is FileDescriptor, so we can't use it here.

How about an interface:

{code}
import java.io.IOException;

public interface Syncable {
  void sync() throws IOException;
}
{code}

(Or should it be "Synchable"?) Then make DFSOutputStream implement Syncable, so 
FSDataOutputStream - which is also a Syncable - can see if it can call sync() 
on the underlying stream.
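
The delegation described above might look roughly like the following. This is 
only a minimal sketch: WrappingStream and RecordingStream are hypothetical 
stand-ins for FSDataOutputStream and DFSOutputStream, not the real Hadoop 
classes, and falling back to flush() when the wrapped stream is not a Syncable 
is just one possible choice, not something settled in this comment.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// The interface proposed in this comment.
interface Syncable {
  void sync() throws IOException;
}

// Hypothetical stand-in for DFSOutputStream: it only counts sync() calls.
class RecordingStream extends ByteArrayOutputStream implements Syncable {
  int syncCalls = 0;

  @Override
  public void sync() throws IOException {
    syncCalls++;
  }
}

// Hypothetical stand-in for FSDataOutputStream: a wrapper that is itself
// a Syncable and forwards sync() when the wrapped stream supports it.
class WrappingStream extends FilterOutputStream implements Syncable {
  WrappingStream(OutputStream out) {
    super(out);
  }

  @Override
  public void sync() throws IOException {
    if (out instanceof Syncable) {
      ((Syncable) out).sync();
    } else {
      // Assumption: fall back to flush(). Throwing an exception here
      // would be the other obvious option.
      out.flush();
    }
  }
}

class SyncableDemo {
  public static void main(String[] args) throws IOException {
    RecordingStream inner = new RecordingStream();
    WrappingStream wrapper = new WrappingStream(inner);
    wrapper.write(42);
    wrapper.sync();
    System.out.println("sync calls on wrapped stream: " + inner.syncCalls);
  }
}
```

The instanceof check keeps the wrapper usable with any OutputStream, so 
streams that cannot sync do not have to implement the interface at all.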

What are the semantics of sync()? I think the expectation is that after sync() 
returns, the system has successfully synced buffers to disk. So if this is not 
true, sync() should throw an exception. This is what 
java.io.FileDescriptor#sync does. Using a subclass of IOException 
(java.io.SyncFailedException?) would make failures easier for callers to 
handle. I realize that this description is at odds with the current contract 
for DFSOutputStream#fsync, which doesn't guarantee that the data has been 
flushed to persistent storage, but I wondered whether DFSOutputStream could be 
strengthened to make this guarantee?
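
For comparison, here is how the java.io.FileDescriptor behaviour looks from a 
caller's point of view. It is a rough sketch against a local file, not HDFS; 
the class name StrictSyncExample and the method writeAndSync are made up for 
illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.SyncFailedException;

class StrictSyncExample {
  // Writes the data, then asks the OS to force it to the storage device.
  // Returns false only when the sync itself is reported as failed; other
  // I/O errors propagate as IOException.
  static boolean writeAndSync(File file, byte[] data) throws IOException {
    try (FileOutputStream out = new FileOutputStream(file)) {
      out.write(data);
      out.flush();
      try {
        // FileDescriptor#sync throws SyncFailedException when it cannot
        // guarantee that the buffers reached the underlying device.
        out.getFD().sync();
        return true;
      } catch (SyncFailedException e) {
        return false;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("sync-example", ".bin");
    tmp.deleteOnExit();
    System.out.println("synced: " + writeAndSync(tmp, new byte[] {1, 2, 3}));
  }
}
```

Because SyncFailedException is a subclass of IOException, a caller that does 
not care about the distinction can still catch IOException alone.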

If the FileSystem doesn't support sync, do we get an exception when calling 
sync(), or is it a no-op?

> Expose DFSOutputStream.fsync API through the FileSystem interface
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3177
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3177
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> In the current code, there is a DFSOutputStream.fsync() API that allows a 
> client to flush all buffered data to the datanodes and also persist block 
> locations on the namenode. This API should be exposed through the generic API 
> in org.apache.hadoop.fs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.