[ https://issues.apache.org/jira/browse/FLUME-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948980#comment-13948980 ]
Muhammad Ehsan ul Haque commented on FLUME-2309:
------------------------------------------------
I think listing was not the problem; sorting the files was.
The default order in the current implementation is oldest first (because of the
sort), and the fix does not change that: it is still oldest first. I just
improved it by not sorting, which is O(N log N), and instead doing a single
scan over all the files to pick the oldest one, which is O(N).
I can switch back to the older listing code instead of the iterator if you prefer.
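The difference described in the comment above can be sketched in Java. This is a minimal, hypothetical example: the class `OldestFilePicker` and its methods are illustrative names, not Flume's code or the attached patch. `oldestBySort` copies and sorts the listing, while `oldestByScan` makes one pass and keeps the oldest file seen so far. Both assume "oldest" means the smallest `File.lastModified()` value, with ties broken by file name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch comparing two ways to pick the oldest file in a
 * spooling directory. Not Flume's API; names are illustrative only.
 */
public class OldestFilePicker {

    // Oldest modification time first; ties broken by name so the result is deterministic.
    static final Comparator<File> OLDEST_FIRST =
            Comparator.comparingLong(File::lastModified).thenComparing(File::getName);

    /** O(N log N): copy the listing, sort all of it, then take the head. */
    static File oldestBySort(List<File> candidates) {
        if (candidates.isEmpty()) {
            return null;
        }
        List<File> sorted = new ArrayList<>(candidates);
        sorted.sort(OLDEST_FIRST);
        return sorted.get(0);
    }

    /** O(N): a single pass that remembers the oldest file seen so far. */
    static File oldestByScan(Iterable<File> candidates) {
        File oldest = null;
        for (File f : candidates) {
            if (oldest == null || OLDEST_FIRST.compare(f, oldest) < 0) {
                oldest = f;
            }
        }
        return oldest; // null when there are no candidates
    }
}
```

Both methods return the same file for the same listing. In this sketch every comparison calls `lastModified()` on two files, so the sort also performs O(N log N) file-system lookups, while the scan performs about 2N.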
> Spooling directory should not always consume the oldest file first.
> -------------------------------------------------------------------
>
> Key: FLUME-2309
> URL: https://issues.apache.org/jira/browse/FLUME-2309
> Project: Flume
> Issue Type: New Feature
> Affects Versions: v1.4.0
> Reporter: Muhammad Ehsan ul Haque
> Priority: Minor
> Labels: feature, patch
> Fix For: v1.4.0
>
> Attachments: FLUME-2309-0.patch, FLUME-2309-0.patch
>
>
> The ReliableSpoolingFileEventReader reads the oldest file in the spooling
> directory first. This is done by listing the directory contents and then
> sorting the file list by timestamp. This may be very slow if there are a
> lot of files (of the order of 100K or more) in the directory.
> However, this is not always needed; in some simple cases the order in which
> files are consumed is not important.
> There should be an option to consume the files in arbitrary order, allowing
> them to be consumed quickly without any delay.
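The option requested in the issue could look roughly like the following. This is an assumption-laden sketch, not Flume's configuration or API: the `ConsumeOrder` enum and the `NextFileChooser` class are hypothetical. In `OLDEST` mode it scans every entry of a `java.nio.file.DirectoryStream` and keeps the earliest modification time; in `ANY` mode it returns the first regular file the stream yields, so it never examines the rest of a large directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/**
 * Hypothetical sketch of a configurable consume order for a spooling
 * directory. ConsumeOrder and NextFileChooser are illustrative names only.
 */
public class NextFileChooser {

    enum ConsumeOrder { OLDEST, ANY }

    /** Returns the next file to consume, or null if there are no regular files. */
    static Path chooseNext(Path spoolDir, ConsumeOrder order) throws IOException {
        Path best = null;
        FileTime bestTime = null;
        // try-with-resources closes the stream, including on the early return below.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(spoolDir)) {
            for (Path p : stream) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories and other non-regular entries
                }
                if (order == ConsumeOrder.ANY) {
                    return p; // no ordering guarantee: first regular file listed
                }
                // OLDEST: single pass, remember the earliest modification time seen.
                FileTime t = Files.getLastModifiedTime(p);
                if (best == null || t.compareTo(bestTime) < 0) {
                    best = p;
                    bestTime = t;
                }
            }
        }
        return best;
    }
}
```

With `ANY`, which file comes back depends on the order in which the file system lists entries, so callers should not rely on any particular order.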
--
This message was sent by Atlassian JIRA
(v6.2#6252)