[https://issues.apache.org/jira/browse/FLUME-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171321#comment-14171321]
Prateek Rungta commented on FLUME-2502:
---------------------------------------
[~hshreedharan] I'm not familiar with that API, but it looks like exactly
what's needed in the long run. In the interim, can we apply this patch?
Without it, the Spool Source is unusable for large directories, and the
problem is made worse when using the BlobDeserializer.
> Spool source's directory listing is inefficient
> -----------------------------------------------
>
> Key: FLUME-2502
> URL: https://issues.apache.org/jira/browse/FLUME-2502
> Project: Flume
> Issue Type: Improvement
> Components: Sinks+Sources
> Affects Versions: v1.5.0
> Reporter: Prateek Rungta
> Attachments: FLUME-2502-0.patch
>
>
> As mentioned in
> [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], the directory
> listing can itself become the bottleneck when accessing directories with a
> large number of files (>1M). The fix in that JIRA added the ability to
> specify `RANDOM` as a Consume-Order to avoid sorting large lists.
> However, the slowness of the directory listing is still unaddressed.
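For illustration only, here is a minimal sketch of one way to avoid building the
full directory listing in memory: read entries lazily through
java.nio.file.DirectoryStream and stop after a bounded batch. This is NOT the
contents of FLUME-2502-0.patch, and it is not necessarily the API mentioned
above. The class name StreamingListing, the method listBatch, and the maxFiles
batch limit are hypothetical. Skipping names ending in ".COMPLETED" is an
assumption that mirrors the Spooling Directory Source's default fileSuffix.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Flume code: lists a bounded batch of spool files
// without materializing every directory entry up front.
public class StreamingListing {

  /**
   * Returns at most maxFiles regular files from dir whose names do not end
   * with completedSuffix. Entries are read lazily from a DirectoryStream, so
   * the loop can stop as soon as the batch is full.
   */
  public static List<Path> listBatch(Path dir, String completedSuffix, int maxFiles)
      throws IOException {
    List<Path> batch = new ArrayList<>();
    // Skip directories and files that were already renamed as processed.
    DirectoryStream.Filter<Path> filter = p ->
        Files.isRegularFile(p) && !p.getFileName().toString().endsWith(completedSuffix);
    // try-with-resources closes the underlying directory handle, even on early exit.
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, filter)) {
      for (Path p : entries) {
        batch.add(p);
        if (batch.size() >= maxFiles) {
          break; // stop reading the directory once the batch is full
        }
      }
    }
    return batch;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("spool");
    for (int i = 0; i < 10; i++) {
      Files.createFile(dir.resolve("event-" + i + ".log"));
    }
    Files.createFile(dir.resolve("old.log.COMPLETED"));
    // Prints 4: the listing stops after four eligible files.
    System.out.println(listBatch(dir, ".COMPLETED", 4).size());
  }
}
```

Stopping early only helps when the order does not matter, as with `RANDOM`.
Oldest-first or youngest-first ordering still requires scanning every entry
before the next file can be picked. Each Files.isRegularFile call also costs one
stat per entry.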
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)