Jun Seok Hong created FLUME-2801:
------------------------------------

             Summary: Performance improvement on TailDir source
                 Key: FLUME-2801
                 URL: https://issues.apache.org/jira/browse/FLUME-2801
             Project: Flume
          Issue Type: Improvement
          Components: Sinks+Sources
    Affects Versions: v1.7.0
            Reporter: Jun Seok Hong
             Fix For: v1.7.0


This is a proposal for a performance improvement to the new tailing source 
(FLUME-2498).
The Taildir source reads a file one byte at a time, so its performance is very 
low compared to tailing with the exec source.
I tested many ways to improve performance and implemented the best one.

Changes:
* Read the file in 8 KB blocks instead of 1 byte at a time.
* Use byte[] for handling data instead of 
ByteArrayDataOutput/ByteBuffer(direct)/.. for better performance.
* Don't convert byte[] to String and vice versa.
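
The changes listed above can be sketched roughly as follows. This is only an illustration, not the actual patch: the class name BlockLineReader, the readLines method, and the use of RandomAccessFile are assumptions made for the example. It reads the file in 8 KB blocks, splits on '\n', and keeps every line as a byte[] without ever building a String.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Flume code base.
class BlockLineReader {
    private static final int BLOCK_SIZE = 8192; // 8 KB block, as proposed
    private static final byte[] EMPTY = new byte[0];

    // Returns head followed by src[from, to) as a new byte[].
    private static byte[] concat(byte[] head, byte[] src, int from, int to) {
        byte[] out = new byte[head.length + (to - from)];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(src, from, out, head.length, to - from);
        return out;
    }

    // Reads the whole file block by block and returns its lines as raw bytes
    // (newline stripped). No byte[] <-> String conversion takes place.
    static List<byte[]> readLines(String path) throws IOException {
        List<byte[]> lines = new ArrayList<>();
        byte[] block = new byte[BLOCK_SIZE];
        byte[] leftover = EMPTY; // partial line carried over between blocks
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            int n;
            while ((n = file.read(block, 0, block.length)) != -1) {
                int start = 0;
                for (int i = 0; i < n; i++) {
                    if (block[i] == '\n') {
                        lines.add(concat(leftover, block, start, i));
                        leftover = EMPTY;
                        start = i + 1;
                    }
                }
                // Keep the unterminated tail for the next block.
                leftover = concat(leftover, block, start, n);
            }
        }
        // This sketch emits a trailing line without '\n'; a real tailing
        // source would more likely hold it back until the newline arrives.
        if (leftover.length > 0) {
            lines.add(leftover);
        }
        return lines;
    }
}
```

A line longer than one block is handled by the leftover buffer, which grows until a newline is found.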

Results of a simple file reading test:
{quote}
 File size: 100 MB
 Line size: 500 bytes

Estimated time to read the file:
|Reading 1 byte (using the code in Taildir)|32544 ms|
|Reading 8 KB block|431 ms|
{quote}
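
A comparison of this kind could be set up along the following lines. This is a minimal sketch and not the test that produced the numbers above: the class and method names are hypothetical, and it assumes the single-byte path goes through RandomAccessFile.read(). Timings depend heavily on the machine, JVM warm-up, and the OS page cache, so no expected numbers are given.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical micro-benchmark for illustration only.
class ReadBenchmark {
    // Reads the file one byte per call and returns the number of bytes read.
    static long countByByte(String path) throws IOException {
        long total = 0;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            while (f.read() != -1) {
                total++;
            }
        }
        return total;
    }

    // Reads the file in blocks of blockSize bytes and returns the byte count.
    static long countByBlock(String path, int blockSize) throws IOException {
        long total = 0;
        byte[] block = new byte[blockSize];
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            int n;
            while ((n = f.read(block)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Returns {millis for 1-byte reads, millis for 8 KB block reads}.
    static long[] compare(String path) throws IOException {
        long t0 = System.nanoTime();
        countByByte(path);
        long t1 = System.nanoTime();
        countByBlock(path, 8192);
        long t2 = System.nanoTime();
        return new long[] {(t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000};
    }
}
```

Calling ReadBenchmark.compare on a 100 MB file with 500-byte lines would mirror the setup described in the quote.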

When tested in Flume, the patched source catches up with the performance of 
tailing with the exec source (about a 30x performance boost).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
