[ 
https://issues.apache.org/jira/browse/FLUME-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888862#comment-13888862
 ] 

Hari Shreedharan commented on FLUME-2309:
-----------------------------------------

With millions of files, the directory listing itself would take a lot of 
time, and I don't know that there is much we can do in that case. Also, as 
you said, keeping a sorted buffer is not a good idea when there are so many 
files. Do you see another way out?

Unfortunately, we cannot use a directory stream (java.nio.file.DirectoryStream, 
which requires Java 7) since we still need to support Java 6. 

Please go ahead and work on this. 

> Spooling directory should not always consume the oldest file first.
> -------------------------------------------------------------------
>
>                 Key: FLUME-2309
>                 URL: https://issues.apache.org/jira/browse/FLUME-2309
>             Project: Flume
>          Issue Type: New Feature
>            Reporter: Muhammad Ehsan ul Haque
>            Priority: Minor
>
> The ReliableSpoolingFileEventReader reads the oldest file in the spooling 
> directory first. This is done by listing the directory contents and then 
> sorting the file list by timestamp. This may be very slow if there are a 
> lot of files (on the order of 100K or more) in the directory.
> However, this is not always needed; there are simple cases in which the 
> order in which the files are consumed is not important.
> There should be an option to consume the files in arbitrary order, so that 
> they can be consumed without the sorting delay.
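
To make the trade-off concrete, here is a minimal Java 6 compatible sketch of 
the two selection strategies discussed above. It is not Flume's 
implementation: SpoolFilePicker, ConsumeOrder and nextFile are hypothetical 
names chosen for illustration. It assumes the oldest file can be found with a 
single linear scan instead of a full sort, and that "arbitrary order" simply 
means taking the first entry the listing returns.

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical helper, not part of Flume: picks the next file to consume.
class SpoolFilePicker {

    // Hypothetical option: OLDEST mimics the current behaviour,
    // ARBITRARY is the behaviour requested in the issue.
    enum ConsumeOrder { OLDEST, ARBITRARY }

    // Accept only regular files, skipping subdirectories.
    private static final FileFilter REGULAR_FILES = new FileFilter() {
        public boolean accept(File f) {
            return f.isFile();
        }
    };

    // Returns the next file to consume, or null if none is available.
    static File nextFile(File spoolDir, ConsumeOrder order) {
        // The listing itself is still O(n) and loads every entry into memory.
        File[] candidates = spoolDir.listFiles(REGULAR_FILES);
        if (candidates == null || candidates.length == 0) {
            return null;
        }
        if (order == ConsumeOrder.ARBITRARY) {
            // No timestamps are read and nothing is sorted.
            return candidates[0];
        }
        // Linear scan for the smallest modification time; no sorted buffer.
        File oldest = candidates[0];
        long oldestTime = oldest.lastModified();
        for (int i = 1; i < candidates.length; i++) {
            long t = candidates[i].lastModified();
            if (t < oldestTime) {
                oldest = candidates[i];
                oldestTime = t;
            }
        }
        return oldest;
    }
}
```

Either way the full directory listing is still paid for on every call, which 
is the cost noted in the comment above; the arbitrary mode only avoids the 
per-file timestamp lookups and the ordering work.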



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
