[jira] [Commented] (FLUME-2918) TaildirSource is underperforming with huge parent directories

Attila Simon (JIRA) Tue, 31 May 2016 22:58:57 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309339#comment-15309339
 ]


Attila Simon commented on FLUME-2918:
-------------------------------------

Comparing several ways of implementing the same functionality showed that 
listing the files with java.nio.file.DirectoryStream gives the best overall 
performance. Only the very first invocation, which carries JIT overhead, 
performs slightly worse than a proper FileFilter. Please see the attachments.
 - PerfHugeDir.java generates the execution times
 - test.csv captures the result of executing PerfHugeDir.main() 
 - perftest.png charts the csv data (execution time in milliseconds for each 
of the implementations)
I started with a directory of 59k files in which only a single file matched 
the pattern; there were also a couple of subdirectories. After ~230 rounds I 
started removing the files not matched by the pattern in bulk, reducing the 
parent directory to ~20 files altogether. That reduction is responsible for 
the fade-out in the chart. (I also ran the same test starting with an empty 
directory and adding 300 files/sec up to 59k; DirectoryStream won that one as 
well. No attachment for this.)
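
For readers without access to the attachments, the two listing approaches 
compared above can be sketched roughly as follows. This is not the attached 
PerfHugeDir.java; the class name DirListingSketch, the method names and the 
default pattern are made up for illustration, and the sketch uses Java 8 
lambda syntax for brevity.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; names are not taken from Flume or the attachment.
class DirListingSketch {

    // Iterates the directory through a DirectoryStream and keeps only
    // regular files whose name matches the pattern.
    static List<Path> listWithDirectoryStream(Path parentDir, Pattern fileNamePattern)
            throws IOException {
        DirectoryStream.Filter<Path> filter = entry ->
                fileNamePattern.matcher(entry.getFileName().toString()).matches()
                        && Files.isRegularFile(entry);
        List<Path> matches = new ArrayList<>();
        // try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(parentDir, filter)) {
            for (Path entry : stream) {
                matches.add(entry);
            }
        }
        return matches;
    }

    // Same selection done with java.io.File and a FileFilter, for comparison.
    static List<File> listWithFileFilter(File parentDir, Pattern fileNamePattern) {
        FileFilter filter = f ->
                fileNamePattern.matcher(f.getName()).matches() && f.isFile();
        File[] files = parentDir.listFiles(filter);
        // listFiles returns null if parentDir is not a directory or an I/O error occurs.
        return files == null ? new ArrayList<>() : new ArrayList<>(Arrays.asList(files));
    }

    public static void main(String[] args) throws IOException {
        Path parentDir = Paths.get(args.length > 0 ? args[0] : ".");
        Pattern pattern = Pattern.compile(args.length > 1 ? args[1] : ".*\\.log");

        long start = System.nanoTime();
        List<Path> viaStream = listWithDirectoryStream(parentDir, pattern);
        long middle = System.nanoTime();
        List<File> viaFilter = listWithFileFilter(parentDir.toFile(), pattern);
        long end = System.nanoTime();

        System.out.printf("DirectoryStream: %d matches in %d us%n",
                viaStream.size(), (middle - start) / 1000);
        System.out.printf("FileFilter:      %d matches in %d us%n",
                viaFilter.size(), (end - middle) / 1000);
    }
}
```

A single timed call like the one in main() is dominated by JIT and file 
system cache effects, so any real comparison needs many repeated rounds, as 
in the test described above.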

> TaildirSource is underperforming with huge parent directories
> -------------------------------------------------------------
>
>                 Key: FLUME-2918
>                 URL: https://issues.apache.org/jira/browse/FLUME-2918
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>            Reporter: Attila Simon
>              Labels: performance
>             Fix For: v1.7.0
>
>         Attachments: profiling_after.png, profiling_before.png
>
>
> TailDir source causes high CPU utilization when a large number of files is 
> sitting in the target directory. The file pattern matches only a single 
> file, but the parent directory contains about 50,000 other files. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
