John Thiltges created HDFS-13294:
------------------------------------
Summary: Flushing writes to disk with libhdfs
Key: HDFS-13294
URL: https://issues.apache.org/jira/browse/HDFS-13294
Project: Hadoop HDFS
Issue Type: Wish
Components: libhdfs
Reporter: John Thiltges
I'm working with an FTP server that writes into HDFS using libhdfs. I'd like to
ensure that incoming files are persisted on datanode disks before returning
success to clients. At present, power failures often mean lost blocks for
recent uploads.
The hsync() call and the CreateFlag.SYNC_BLOCK create flag seem like the right
direction, but there doesn't appear to be a way to set SYNC_BLOCK through the
libhdfs interface. I believe hsync() only applies to the current block of a
file handle, so blocks completed earlier would not be covered by it.
Thoughts on implementing it:
# Use an existing 'close enough' fcntl flag to set SYNC_BLOCK?
Maybe O_DIRECT, O_SYNC, or O_DSYNC?
This would probably be the best option, since it keeps the libhdfs interface
unchanged and older versions would simply ignore the flags.
# Add an hdfsOpenFile2 that accepts HDFS flags (instead of fcntl flags)?
# Provide a method in DFSOutputStream to set shouldSyncBlock on an existing
stream, and a function in libhdfs to enable it?
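
To make the first idea concrete, here is a minimal sketch of how open(2)-style flags might be translated into a "sync block" request. Everything named sketch_* or SKETCH_* is hypothetical and not part of libhdfs; how libhdfs would actually pass the result on to CreateFlag.SYNC_BLOCK is left open. O_DIRECT is omitted because it is a Linux extension.

```c
#include <fcntl.h>

/* Hypothetical bit meaning "create the stream with CreateFlag.SYNC_BLOCK".
 * This is an illustration only; libhdfs defines no such constant. */
#define SKETCH_SYNC_BLOCK 0x1u

/* Map POSIX open(2) flags to the hypothetical HDFS create bits.
 * Either O_SYNC or O_DSYNC is treated as a request for SYNC_BLOCK. */
static unsigned sketch_hdfs_create_bits(int open_flags)
{
    int wants_sync = (open_flags & O_SYNC) == O_SYNC;
#ifdef O_DSYNC
    /* Guard against platforms that define O_DSYNC as 0. */
    if (O_DSYNC != 0 && (open_flags & O_DSYNC) == O_DSYNC)
        wants_sync = 1;
#endif
    return wants_sync ? SKETCH_SYNC_BLOCK : 0u;
}
```

A caller that already passes O_WRONLY would only need to add O_SYNC, which matches the point that the libhdfs interface itself would not change.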
To flush writes with libhdfs today (on CDH5), I'm guessing my only option is
to call hsync() each time a full block's worth of data has been written,
exactly on the block boundary.
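
A rough sketch of that workaround, assuming the block size of the file is known to the caller. The write and sync callbacks are stand-ins (hypothetical names) for thin wrappers around hdfsWrite() and hdfsHSync(), and are assumed to return 0 only after handling all bytes. The demo_* sink just counts bytes and syncs so the splitting logic can be exercised without a cluster. Whether a sync issued exactly on the boundary really covers the block that was just finished is part of the guess above and is not verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for wrappers around hdfsWrite() and hdfsHSync(); 0 = success. */
typedef int (*sketch_write_fn)(void *ctx, const void *buf, size_t len);
typedef int (*sketch_sync_fn)(void *ctx);

/* How many of `len` bytes fit at position `pos` before the next boundary. */
static size_t bytes_to_block_end(uint64_t pos, uint64_t block_size, size_t len)
{
    uint64_t room = block_size - (pos % block_size);
    return (len < room) ? len : (size_t)room;
}

/* Write `len` bytes, splitting at block boundaries and syncing each time
 * the running position `*pos` lands exactly on a boundary. */
static int write_with_block_sync(void *ctx, sketch_write_fn do_write,
                                 sketch_sync_fn do_sync, uint64_t *pos,
                                 uint64_t block_size,
                                 const void *buf, size_t len)
{
    const unsigned char *p = buf;

    if (block_size == 0)
        return -1;
    while (len > 0) {
        size_t n = bytes_to_block_end(*pos, block_size, len);
        if (do_write(ctx, p, n) != 0)
            return -1;
        *pos += n;
        p += n;
        len -= n;
        if (*pos % block_size == 0 && do_sync(ctx) != 0)
            return -1;
    }
    return 0;
}

/* In-memory sink used only to demonstrate the call pattern. */
struct demo_sink {
    uint64_t written;        /* total bytes "written" so far */
    int syncs;               /* number of sync calls seen */
    uint64_t last_sync_pos;  /* value of `written` at the last sync */
};

static int demo_write(void *ctx, const void *buf, size_t len)
{
    (void)buf;
    ((struct demo_sink *)ctx)->written += len;
    return 0;
}

static int demo_sync(void *ctx)
{
    struct demo_sink *s = ctx;
    s->syncs++;
    s->last_sync_pos = s->written;
    return 0;
}
```

With a block size of 10, writing 25 bytes through the demo sink triggers syncs at positions 10 and 20; the trailing 5 bytes would still need a final sync before close.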
Best regards,
John