[ 
https://issues.apache.org/jira/browse/FLUME-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muhammad Ehsan ul Haque updated FLUME-2309:
-------------------------------------------

    Comment: was deleted

(was: This patch provides:
* A consume order feature in the Spooling directory source, which allows 
users to state explicitly in which order files should be consumed from the 
spooling directory: oldest (default), youngest or random.
* A fix for the old implementation of selecting the file from the spooling 
directory. Previously, each file to be consumed was selected by sorting, which 
might become extremely time consuming if there are many files (of the order of 
10K or more). The new implementation instead does a linear scan when the 
consume order is oldest or youngest.
* Updates to the Flume user guide accordingly.)
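
The linear scan mentioned in the comment above can be sketched roughly as 
follows. This is only an illustration of the idea, not the code from 
FLUME-2309-0.patch: the names ConsumeOrderSketch, ConsumeOrder, Candidate and 
selectNext are hypothetical, and a spooled file is modelled as a name plus a 
last-modified timestamp rather than a java.io.File.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class ConsumeOrderSketch {

    // Hypothetical enum; the actual patch may name or structure this differently.
    enum ConsumeOrder { OLDEST, YOUNGEST, RANDOM }

    // Minimal stand-in for a spooled file: a name and its last-modified time.
    static final class Candidate {
        final String name;
        final long lastModified;

        Candidate(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    // Picks the next file to consume in a single O(n) pass, with no sorting.
    // Returns null when there is nothing to consume.
    static Candidate selectNext(List<Candidate> candidates, ConsumeOrder order, Random rng) {
        if (candidates.isEmpty()) {
            return null;
        }
        if (order == ConsumeOrder.RANDOM) {
            // Any file will do, so timestamps are not inspected at all.
            return candidates.get(rng.nextInt(candidates.size()));
        }
        Candidate best = candidates.get(0);
        for (Candidate c : candidates) {
            boolean better = (order == ConsumeOrder.OLDEST)
                    ? c.lastModified < best.lastModified
                    : c.lastModified > best.lastModified;
            if (better) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> files = Arrays.asList(
                new Candidate("b.log", 200L),
                new Candidate("a.log", 100L),
                new Candidate("c.log", 300L));
        System.out.println(selectNext(files, ConsumeOrder.OLDEST, new Random()).name);   // a.log
        System.out.println(selectNext(files, ConsumeOrder.YOUNGEST, new Random()).name); // c.log
    }
}
```

Selecting one file this way costs a single pass over the listing, whereas 
sorting the whole listing for every selection costs O(n log n) each time.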

> Spooling directory should not always consume the oldest file first.
> -------------------------------------------------------------------
>
>                 Key: FLUME-2309
>                 URL: https://issues.apache.org/jira/browse/FLUME-2309
>             Project: Flume
>          Issue Type: New Feature
>    Affects Versions: v1.4.0
>            Reporter: Muhammad Ehsan ul Haque
>            Priority: Minor
>              Labels: feature, patch
>             Fix For: v1.4.0
>
>         Attachments: FLUME-2309-0.patch
>
>
> The ReliableSpoolingFileEventReader reads the oldest file in the spooling 
> directory first. This is done by listing the directory contents and then 
> sorting the file list by timestamp. This may be very slow if there are a 
> lot of files (of the order of 100K or more) in the directory.
> However, this is not always needed: there are simple cases in which the 
> order in which files are consumed is not important.
> There should be an option to consume the files in arbitrary order, allowing 
> them to be consumed quickly without the sorting delay.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
