[ https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Shreedharan updated FLUME-2155:
------------------------------------
Attachment: FLUME-2155-initial.patch
Initial working patch. There are several changes I still want to make (such as
integrating this into fast replay and ensuring it does not run during a full
replay).
I also need to add several tests to make sure things work as intended. With the
current patch, all existing tests pass, and the batch puts and removes are the
code paths actually being exercised, since batching is enabled in the default
config (I also confirmed this by stepping through it in a debugger).
> Improve replay time
> -------------------
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
> Issue Type: Bug
> Reporter: Hari Shreedharan
> Assignee: Hari Shreedharan
> Attachments: 100000-110000, 10000-20000, 300000-310000,
> 700000-710000, fc-test.patch, FLUME-2155-initial.patch, SmartReplay1.1.pdf,
> SmartReplay.pdf
>
>
> File Channel has scaled so well that people now run channels with sizes in
> the hundreds of millions of events. It turns out that replay can be extremely
> slow at this scale, even between checkpoints, because the remove() method in
> FlumeEventQueue moves every pointer that follows the one being removed (a
> single remove causes 99 million+ moves in a channel of 100 million events!).
> There are several ways to improve this. One is to defer the moves to the end
> of replay, similar to a compaction. Another is to exploit the fact that all
> removes happen from the top of the queue: move the first "k" events out into
> a hash set and remove them from there. We can find k using the write ids of
> the last checkpoint and the current one.
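The cost difference between the naive remove and the two proposed fixes can be illustrated with a small Java sketch. It assumes event pointers can be modeled as unique `long` values and the queue as an `ArrayList`; `ReplaySketch`, `naiveRemove`, `compact` and `HeadSetQueue` are hypothetical names, not Flume's `FlumeEventQueue` API, and `k` is simply passed in rather than derived from checkpoint write ids. Removing r events from the head of an n-event array-backed queue costs (n-1) + (n-2) + ... + (n-r) shifts, whereas both alternatives avoid per-remove shifting.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; none of these names are Flume's FlumeEventQueue API.
// Event pointers are modeled as unique longs, and k is passed in directly.
public class ReplaySketch {

    // Naive remove: every pointer after the removed one shifts left one slot.
    // Returns how many pointers were moved.
    static long naiveRemove(List<Long> queue, long pointer) {
        int idx = queue.indexOf(pointer);
        if (idx < 0) {
            return 0;
        }
        long moves = queue.size() - 1L - idx; // slots that follow the removed pointer
        queue.remove(idx);                    // ArrayList shifts those slots left
        return moves;
    }

    // Approach 1: record removals during replay, then compact once at the end.
    static List<Long> compact(List<Long> queue, Set<Long> removed) {
        List<Long> out = new ArrayList<>(queue.size());
        for (Long p : queue) {
            if (!removed.contains(p)) {
                out.add(p);
            }
        }
        return out;
    }

    // Approach 2: move the first k pointers into an insertion-ordered hash set.
    static final class HeadSetQueue {
        private final LinkedHashSet<Long> head = new LinkedHashSet<>();
        private final List<Long> rest = new ArrayList<>();

        HeadSetQueue(List<Long> checkpointed, int k) {
            int split = Math.min(k, checkpointed.size());
            head.addAll(checkpointed.subList(0, split));
            rest.addAll(checkpointed.subList(split, checkpointed.size()));
        }

        void add(long pointer) {
            rest.add(pointer); // replayed puts append at the tail
        }

        boolean remove(long pointer) {
            if (head.remove(pointer)) {
                return true; // expected O(1); no other pointer moves
            }
            // Fallback when k was too small: pay the linear scan and shift.
            return rest.remove(Long.valueOf(pointer));
        }

        List<Long> toList() {
            List<Long> out = new ArrayList<>(head.size() + rest.size());
            out.addAll(head); // LinkedHashSet iterates in original queue order
            out.addAll(rest);
            return out;
        }
    }

    public static void main(String[] args) {
        List<Long> checkpoint = new ArrayList<>();
        for (long p = 0; p < 10; p++) {
            checkpoint.add(p);
        }

        List<Long> naive = new ArrayList<>(checkpoint);
        long moves = 0;
        for (long p = 0; p < 3; p++) {
            moves += naiveRemove(naive, p);
        }
        // prints "naive moves = 24, queue = [3, 4, 5, 6, 7, 8, 9]"
        System.out.println("naive moves = " + moves + ", queue = " + naive);

        Set<Long> removed = new HashSet<>(List.of(0L, 1L, 2L));
        System.out.println("compacted = " + compact(checkpoint, removed));

        HeadSetQueue q = new HeadSetQueue(checkpoint, 3);
        for (long p = 0; p < 3; p++) {
            q.remove(p);
        }
        q.add(10L);
        // prints "head-set = [3, 4, 5, 6, 7, 8, 9, 10]"
        System.out.println("head-set = " + q.toList());
    }
}
```

A `LinkedHashSet` is used here (rather than a plain hash set) so that the surviving head pointers keep their original queue order when the queue is rebuilt; if k underestimates the number of takes, removes fall back to the slow path but stay correct.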
--
This message is automatically generated by JIRA.