This patch has serious technical flaws. If you want this functionality, you just need to set hdfs.maxOpenFiles = 1.
However, for typical use I would strongly recommend setting rollInterval = 300 and letting it roll every 5 minutes.

Regards,
Mike

On Fri, Oct 12, 2012 at 3:51 PM, Justin Workman <[email protected]> wrote:

> I can confirm that we are seeing this issue as well. We are only using
> rollSize, and when the timestamp indicated it was time to create a new
> date bucket, the path and new file were created; however, the existing
> file was never closed and renamed.
>
> Applying this patch resolved the issue we were seeing, and existing
> files are now closed when the new one is opened.
>
> Sent from my iPhone
>
> On Oct 12, 2012, at 4:41 PM, "Mike Percy (JIRA)" <[email protected]> wrote:
>
> > [ https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475413#comment-13475413 ]
> >
> > Mike Percy commented on FLUME-1350:
> > -----------------------------------
> >
> > That path means that any Event that goes to the HDFS sink must have a
> > header called "timestamp" whose value is a stringified Long: a typical
> > Java timestamp in milliseconds. The year-month-day will be generated
> > from that timestamp, and the event will be stored in a file under that
> > directory.
> >
> > If there is already an open file in that directory, the event will be
> > appended to that file. If there is no open file in that directory, a
> > new file will be created.
> >
> > The only rules for closing a file are the ones listed above, because
> > when events are collected from many hosts, old events may arrive at the
> > same time as new events, and we would not want to create too many small
> > files. The time a file is allowed to remain open before it is closed
> > automatically is configurable via rollInterval.
> >
> >> HDFS file handle not closed properly when date bucketing
> >> ---------------------------------------------------------
> >>
> >> Key: FLUME-1350
> >> URL: https://issues.apache.org/jira/browse/FLUME-1350
> >> Project: Flume
> >> Issue Type: Bug
> >> Components: Sinks+Sources
> >> Affects Versions: v1.1.0, v1.2.0
> >> Reporter: Robert Mroczkowski
> >> Attachments: HDFSEventSink.java.patch
> >>
> >> With configuration:
> >> agent.sinks.hdfs-cafe-access.type = hdfs
> >> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> >> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> >> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> >> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> >> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> >> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> >> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> >> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> >> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> >> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> >> agent.sinks.hdfs-cafe-access.channel = memo-1
> >>
> >> When a new directory is created, the previous file handle remains
> >> open. The rollInterval setting is applied only to files in the current
> >> date bucket.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA administrators.
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
>
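For reference, Mike's two suggestions above would look like this in the agent's properties file (the agent and sink names here are placeholders, not from the reporter's config):

```properties
# Typical use: roll the open file every 5 minutes (300 seconds)
agent.sinks.hdfs-sink.hdfs.rollInterval = 300

# Alternative, if you need the previous bucket's file closed as soon
# as a new bucket opens: allow only one open file at a time
agent.sinks.hdfs-sink.hdfs.maxOpenFiles = 1
```

Note that maxOpenFiles = 1 trades the ability to keep late-arriving events' files open against immediate closure of the old bucket's file.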

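To make the JIRA comment above concrete: the sink derives each event's date bucket from its "timestamp" header, a stringified Long in epoch milliseconds, and substitutes it into the %y-%m-%d escapes of hdfs.path. A minimal sketch of that resolution (the helper name is hypothetical and this is not Flume's actual implementation, just an illustration of the path math):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampBucket {

    // Illustrative only: mimics how %y-%m-%d in hdfs.path is resolved
    // from the event's "timestamp" header. Method name and base path
    // are assumptions for this sketch, not Flume internals.
    public static String bucketFor(String timestampHeader) {
        // The "timestamp" header is a stringified Long: epoch milliseconds.
        long millis = Long.parseLong(timestampHeader);
        // %y = two-digit year, %m = month, %d = day of month
        SimpleDateFormat fmt = new SimpleDateFormat("yy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for a deterministic example
        return "hdfs://nga/nga/apache/access/" + fmt.format(new Date(millis)) + "/";
    }

    public static void main(String[] args) {
        // 1350000000000 ms = 2012-10-12 00:40 UTC, so this event lands in .../12-10-12/
        System.out.println(bucketFor("1350000000000"));
    }
}
```

Two events whose timestamps fall on different days therefore resolve to different directories, which is exactly the transition where the reporter saw the old bucket's file left open.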