Illes S created FLUME-2900:
------------------------------
Summary: Allow triggering hsync for HDFS sink during write
Key: FLUME-2900
URL: https://issues.apache.org/jira/browse/FLUME-2900
Project: Flume
Issue Type: Wish
Components: Sinks+Sources
Reporter: Illes S
Priority: Minor
The HDFS sink calls {{hflush()}} (or {{sync()}}) on the {{FSDataOutputStream}},
which flushes client buffers but does not update the output file size on the
NameNode (see HDFS-5478) while the file is being written. The size is updated
only after the file is closed.
It would be nice to allow users to trigger updating the file length (which also
syncs file data to disk, see HDFS-4213):
{{((HdfsDataOutputStream) fos).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));}}
This could be done via new {{hdfs.hsyncInterval}}, {{hdfs.hsyncSize}} and
{{hdfs.hsyncCount}} configuration options.
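To make the proposal concrete, here is a minimal sketch of how the three triggers might be combined. {{HsyncPolicy}} is a hypothetical helper, not an existing Flume class. The units (milliseconds, bytes, events) and the "0 disables the trigger" convention are assumptions, loosely modelled on the existing {{hdfs.roll*}} options.

```java
// Hypothetical helper (not part of Flume): decides when the sink should call
// hsync(UPDATE_LENGTH), based on the three proposed options.
final class HsyncPolicy {
    private final long intervalMillis; // proposed hdfs.hsyncInterval; 0 disables (assumed)
    private final long sizeBytes;      // proposed hdfs.hsyncSize; 0 disables (assumed)
    private final long eventCount;     // proposed hdfs.hsyncCount; 0 disables (assumed)

    private long lastSyncMillis;
    private long bytesSinceSync;
    private long eventsSinceSync;

    HsyncPolicy(long intervalMillis, long sizeBytes, long eventCount, long nowMillis) {
        this.intervalMillis = intervalMillis;
        this.sizeBytes = sizeBytes;
        this.eventCount = eventCount;
        this.lastSyncMillis = nowMillis;
    }

    /** Records one written event; returns true if an hsync is due now. */
    boolean recordEvent(long eventBytes, long nowMillis) {
        bytesSinceSync += eventBytes;
        eventsSinceSync++;

        boolean due =
            (intervalMillis > 0 && nowMillis - lastSyncMillis >= intervalMillis)
                || (sizeBytes > 0 && bytesSinceSync >= sizeBytes)
                || (eventCount > 0 && eventsSinceSync >= eventCount);

        if (due) {
            // Reset all counters so every trigger is measured from the last sync.
            lastSyncMillis = nowMillis;
            bytesSinceSync = 0;
            eventsSinceSync = 0;
        }
        return due;
    }
}
```

Whenever {{recordEvent(...)}} returns true, the sink would issue the {{hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))}} call shown above. Keeping the decision separate from the stream makes the trigger logic testable without an HDFS cluster.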
A workaround is to roll the output file more often, but that leads to many
small files, which may be worse than the extra load put on the NameNode by
calling {{hsync(...)}} several times during a write, right?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)