Hari Shreedharan created FLUME-2155:
---------------------------------------

             Summary: Improve replay time
                 Key: FLUME-2155
                 URL: https://issues.apache.org/jira/browse/FLUME-2155
             Project: Flume
          Issue Type: Bug
            Reporter: Hari Shreedharan
            Assignee: Hari Shreedharan


File Channel has scaled well enough that people now run channels holding 
hundreds of millions of events. At this scale, replay can be very slow even 
between checkpoints, because the remove() method in FlumeEventQueue moves 
every pointer that follows the one being removed (a single remove causes 99 
million+ moves for a channel of 100 million events!). There are several ways 
to improve this. One is to do the moves once at the end of replay, similar to 
a compaction. Another is to use the fact that all removes happen from the top 
of the queue: move the first "k" events out to a hash set and remove from 
there. We can find k using the write ID of the last checkpoint and the 
current one. 
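
A rough sketch of the two ideas above, for illustration only. This is not the 
actual FlumeEventQueue code: the class ReplaySketch, its method names, and the 
plain long[] standing in for the queue are all hypothetical. It also assumes 
pointers are unique and that k has already been derived from the checkpoint 
write IDs.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch; names and layout are assumptions, not Flume's API. */
public class ReplaySketch {

    /** Eager removal: shifts every pointer after the match, O(n) per call. */
    static int removeEager(long[] queue, int size, long pointer) {
        for (int i = 0; i < size; i++) {
            if (queue[i] == pointer) {
                System.arraycopy(queue, i + 1, queue, i, size - i - 1);
                return size - 1;
            }
        }
        return size; // pointer not found, size unchanged
    }

    /** Option 1: record removes during replay, then compact in one pass. */
    static long[] compact(long[] queue, Set<Long> removed) {
        long[] out = new long[queue.length];
        int n = 0;
        for (long p : queue) {
            if (!removed.contains(p)) {
                out[n++] = p; // keep pointers that were never removed
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** Option 2: move the first k pointers to a hash set and remove there. */
    static long[] replayWithHeadSet(long[] queue, int k, long[] removes) {
        // LinkedHashSet gives O(1) removal and preserves queue order.
        Set<Long> head = new LinkedHashSet<>();
        for (int i = 0; i < k; i++) {
            head.add(queue[i]);
        }
        for (long r : removes) {
            head.remove(r);
        }
        long[] out = new long[head.size() + queue.length - k];
        int n = 0;
        for (long p : head) {
            out[n++] = p; // surviving head pointers, in original order
        }
        // The tail beyond k is untouched by replay removes in this sketch.
        System.arraycopy(queue, k, out, n, queue.length - k);
        return out;
    }

    public static void main(String[] args) {
        long[] queue = {10L, 11L, 12L, 13L, 14L};
        Set<Long> removed = new LinkedHashSet<>(Arrays.asList(11L, 13L));
        System.out.println(Arrays.toString(compact(queue, removed)));
        System.out.println(Arrays.toString(
                replayWithHeadSet(queue, 3, new long[] {10L, 12L})));
    }
}
```

Both variants replace one full shift per remove with a single pass over the 
queue, at the cost of holding the pending removes (or the first k pointers) in 
memory during replay.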

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
