[ https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477751#comment-13477751 ]
Juhani Connolly commented on FLUME-1350:
----------------------------------------
I don't see this patch as particularly harmful, and it does address a minor
niggle we also have.
We roll our files on the hour (so with an hdfs path something like %y%m%d/%h)
and have the roll time set to a bit over an hour, which does close the old
handles. But it feels like a kludge, and I think that modifying the design to
close old handles as the destination bucket changes is not unreasonable.
Offering multiple rolling strategies, but then forcing people to add an
unrelated rollInterval just to make sure old handles eventually get closed, is
not exactly intuitive.
Can you think of a case where the old writer getting closed would be harmful?
That said, once a path stops receiving writes altogether, I could see the
rollInterval still being necessary to close the final file.
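
As a rough illustration of the "close old handles as the destination bucket
changes" idea, here is a minimal Java sketch. It is not the attached patch and
does not use Flume's actual classes: BucketSwitchSketch, Writer, and the
example bucket paths are hypothetical names invented for this illustration,
and the sketch assumes a single open writer per sink.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; NOT Flume's HDFSEventSink or the attached patch. */
class BucketSwitchSketch {

    /** Stand-in for an open HDFS file handle (assumption, not a Flume class). */
    static class Writer {
        final String path;
        boolean open = true;
        final List<String> events = new ArrayList<>();

        Writer(String path) {
            this.path = path;
        }

        void append(String event) {
            if (!open) {
                throw new IllegalStateException("writer closed: " + path);
            }
            events.add(event);
        }

        void close() {
            open = false;
        }
    }

    private Writer current;                          // writer for the active bucket
    final List<Writer> closed = new ArrayList<>();   // writers closed on bucket change

    /** Appends an event, closing the previous writer if the bucket path changed. */
    void write(String bucketPath, String event) {
        if (current != null && !current.path.equals(bucketPath)) {
            current.close();
            closed.add(current);
            current = null;
        }
        if (current == null) {
            current = new Writer(bucketPath);
        }
        current.append(event);
    }

    /** Returns the writer that is still open, or null if nothing was written. */
    Writer openWriter() {
        return current;
    }
}
```

Two properties of the sketch line up with the points above. A late event for
an earlier bucket opens a fresh writer rather than reusing the closed one,
which is one place where closing early could matter. And the last writer stays
open until something else closes it, so a time-based roll would still be
needed for the final file.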
> HDFS file handle not closed properly when date bucketing
> ---------------------------------------------------------
>
> Key: FLUME-1350
> URL: https://issues.apache.org/jira/browse/FLUME-1350
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.1.0, v1.2.0
> Reporter: Robert Mroczkowski
> Attachments: HDFSEventSink.java.patch
>
>
> With configuration:
> agent.sinks.hdfs-cafe-access.type = hdfs
> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> agent.sinks.hdfs-cafe-access.channel = memo-1
> When a new directory is created, the previous file handle remains open.
> The rollInterval setting is applied only to files in the current date bucket.