Hi Justin,
Could you please post your agent config and any HDFS sink logs? Ideally you
should be seeing INFO-level messages like "Closing idle bucketWriter".

Tristan

Tristan Stevens
Senior Solutions Architect
Cloudera, Inc. | www.cloudera.com
m +44(0)7808 986422 | tris...@cloudera.com


On 12 January 2017 at 19:23:18, Justin Workman (justinjwork...@gmail.com) wrote:

More details  

Flume 1.6 - Core Apache version.  
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).  

On Thu, Jan 12, 2017 at 12:20 PM, Justin Workman <justinjwork...@gmail.com>  
wrote:  

> Sorry for cross-posting to user and dev. I have recently set up a Flume
> configuration where we use the regex_extractor interceptor to parse the
> actual event date from each record flowing through the Flume source, then
> use that date to build the HDFS sink bucket path. However, it appears
> that the hdfs.idleTimeout value is not honored in this configuration. It
> does work when using the timestamp interceptor to build the output path.
>  
> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
> never closed or renamed until I restart or shut down Flume. Our Flume is
> configured to roll based on size or output path, and the files
> rename/close/roll fine based on size; however, the last file in each output
> path is always left with the .tmp extension until we restart Flume. I would
> expect the file to be renamed and closed once no records have been
> written to it for longer than the idleTimeout.
>  
> Could I be missing something, or is this a known bug with the  
> regex_extract interceptor?  
>  
> Thanks  
> Justin  
>  
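For readers following along, a minimal sketch of the kind of agent config being described (the agent and component names, the regex, and the paths here are illustrative assumptions, not Justin's actual configuration):

```properties
# Hypothetical Flume agent: KafkaSource -> file channel -> HDFS sink,
# with regex_extractor building the bucket path from the event date.
agent1.sources = k1
agent1.channels = c1
agent1.sinks = hdfs1

agent1.sources.k1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.k1.channels = c1
agent1.sources.k1.interceptors = i1
agent1.sources.k1.interceptors.i1.type = regex_extractor
# Illustrative regex: extract a date like 2017-01-12 from the event body
agent1.sources.k1.interceptors.i1.regex = (\\d{4}-\\d{2}-\\d{2})
agent1.sources.k1.interceptors.i1.serializers = s1
agent1.sources.k1.interceptors.i1.serializers.s1.name = eventdate

agent1.channels.c1.type = file

agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = c1
# Bucket path built from the extracted header, not the timestamp header
agent1.sinks.hdfs1.hdfs.path = /data/events/%{eventdate}
# Close/rename files idle for 300 seconds (the setting not being honored)
agent1.sinks.hdfs1.hdfs.idleTimeout = 300
# Roll on size only
agent1.sinks.hdfs1.hdfs.rollSize = 134217728
agent1.sinks.hdfs1.hdfs.rollCount = 0
agent1.sinks.hdfs1.hdfs.rollInterval = 0
```

With a timestamp interceptor the path would instead use escapes such as `%Y-%m-%d`, which is the variant reported to close idle files correctly.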
