[ 
https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475413#comment-13475413
 ] 

Mike Percy commented on FLUME-1350:
-----------------------------------

With that path, every Event that reaches the HDFS sink must carry a header 
named "timestamp" whose value is a stringified Long: a standard Java timestamp 
in milliseconds. The year-month-day directory name is generated from that 
timestamp, and the event is stored in a file under that directory.
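
As a rough illustration of that mapping (a sketch only, not the sink's actual 
code), the snippet below expands the %y-%m-%d escapes from a "timestamp" 
header. The function name resolve_bucket_path is hypothetical, and the use of 
UTC is an assumption made to keep the example deterministic; the real sink's 
time zone handling may differ.

```python
from datetime import datetime, timezone


def resolve_bucket_path(path_pattern, headers):
    """Expand %y, %m and %d in path_pattern from the event's timestamp header.

    Hypothetical helper: it only models the three escapes used in this issue.
    """
    if "timestamp" not in headers:
        raise KeyError('event has no "timestamp" header')
    # The header is a stringified long: milliseconds since the epoch.
    millis = int(headers["timestamp"])
    # Assumption: UTC is used here so the result does not depend on the host.
    when = datetime.fromtimestamp(millis / 1000.0, tz=timezone.utc)
    return (
        path_pattern.replace("%y", when.strftime("%y"))
        .replace("%m", when.strftime("%m"))
        .replace("%d", when.strftime("%d"))
    )


event_headers = {"timestamp": "1350000000000"}
bucket = resolve_bucket_path("hdfs://nga/nga/apache/access/%y-%m-%d/", event_headers)
print(bucket)
```

An event without the header has no bucket to go to, which is why the sketch 
raises an error in that case instead of guessing a date.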

If there is already an open file in that directory, the event will be appended 
to that file. If there is no open file in that directory, a new file will be 
created.

The only rules for closing a file are the ones listed above. When events are 
collected from many hosts, old events may arrive at the same time as new 
ones, and we would not want to create too many small files. That is why the 
time a file may remain open before it is closed automatically is 
configurable, using rollInterval.
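
The behaviour described in the last two paragraphs can be sketched as a toy 
model: one open file per bucket directory, reused for late events, and closed 
once it has been open for rollInterval seconds. The class BucketWriterSketch 
and its method names are hypothetical, rollSize and rollCount are left out, 
and the wall-clock timer is replaced by an explicit now_s argument so the 
example stays deterministic.

```python
class BucketWriterSketch:
    """Toy model: one open file per bucket directory, closed by age only."""

    def __init__(self, roll_interval_s):
        self.roll_interval_s = roll_interval_s
        self.open_files = {}    # directory -> (file name, time opened in seconds)
        self.closed_files = []  # file names, in the order they were closed
        self._counter = 0

    def append(self, directory, now_s):
        """Return the file an event for this directory would be written to."""
        self.close_expired(now_s)
        if directory not in self.open_files:
            # No open file in this directory: create a new one.
            self._counter += 1
            name = f"{directory}file-{self._counter}"
            self.open_files[directory] = (name, now_s)
        return self.open_files[directory][0]

    def close_expired(self, now_s):
        """Close every file that has been open for at least roll_interval_s."""
        for directory, (name, opened_at) in list(self.open_files.items()):
            if now_s - opened_at >= self.roll_interval_s:
                self.closed_files.append(name)
                del self.open_files[directory]


writer = BucketWriterSketch(roll_interval_s=21600)
old_file = writer.append("12-10-11/", now_s=0)
new_file = writer.append("12-10-12/", now_s=100)
# A late event for the older day goes to the file that is still open there.
late_file = writer.append("12-10-11/", now_s=200)
print(old_file == late_file, old_file != new_file)
```

In this model a file in an older date bucket is closed by the same age rule as 
any other file, without needing new events to arrive in that bucket.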
                
> HDFS file handle not closed properly when date bucketing 
> ---------------------------------------------------------
>
>                 Key: FLUME-1350
>                 URL: https://issues.apache.org/jira/browse/FLUME-1350
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.1.0, v1.2.0
>            Reporter: Robert Mroczkowski
>         Attachments: HDFSEventSink.java.patch
>
>
> With configuration:
> agent.sinks.hdfs-cafe-access.type = hdfs
> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> agent.sinks.hdfs-cafe-access.channel = memo-1
> When a new directory is created, the previous file handle remains open. 
> The rollInterval setting is applied only to files in the current date bucket. 
