[ https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848015#comment-13848015 ]
Hudson commented on FLUME-2155:
-------------------------------
SUCCESS: Integrated in flume-trunk #527 (See [https://builds.apache.org/job/flume-trunk/527/])
FLUME-2155. Index the Flume Event Queue during replay to improve replay time.
(hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=6373032a620bdc687b6d03b12726713d08c71a10)
* flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/LogFile.java
* flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestEventQueueBackingStoreFactory.java
* flume-ng-channels/flume-file-channel/pom.xml
* flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Serialization.java
* flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestCheckpoint.java
* flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestCheckpointRebuilder.java
* flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/ReplayHandler.java
* flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
* flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFlumeEventQueue.java
* flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
* flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Log.java
* flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/CheckpointRebuilder.java
* pom.xml
* flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/EventQueueBackingStoreFile.java
> Improve replay time
> -------------------
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
> Issue Type: Improvement
> Reporter: Hari Shreedharan
> Assignee: Brock Noland
> Attachments: 10000-20000, 100000-110000, 300000-310000,
> 700000-710000, FLUME-2155-initial.patch, FLUME-2155.2.patch,
> FLUME-2155.4.patch, FLUME-2155.5.patch, FLUME-2155.patch,
> FLUME-FC-SLOW-REPLAY-1.patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch,
> SmartReplay.pdf, SmartReplay1.1.pdf, fc-test.patch
>
>
> File Channel has scaled so well that people now run channels holding
> hundreds of millions of events. At this scale, replay can be extremely slow
> even between checkpoints, because the remove() method in FlumeEventQueue
> moves every pointer that follows the one being removed (1 remove causes 99
> million+ moves for a channel of 100 million!). There are several ways to
> improve this. One is to do the moves at the end of replay, sort of like a
> compaction. Another is to use the fact that all removes happen from the top
> of the queue: move the first "k" events out to a hash set and remove from
> there. We can find k using the write ID of the last checkpoint and the
> current one.
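
The cost difference described above can be illustrated with a small sketch.
This is not the FlumeEventQueue code or the committed patch. ReplaySketch,
naiveReplay and deferredReplay are hypothetical names, pointers are modeled as
plain Long values assumed to be unique, and an ArrayList stands in for the
queue's backing store. The first method shifts the tail of the queue on every
remove. The second records the removed pointers in a HashSet and compacts the
queue once at the end, in the spirit of the "compaction" idea.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; assumes every pointer in the queue is unique.
class ReplaySketch {

    /** Removes each taken pointer immediately; returns how many slots were shifted. */
    static long naiveReplay(List<Long> queue, List<Long> takes) {
        long moves = 0;
        for (Long ptr : takes) {
            int idx = queue.indexOf(ptr);
            if (idx >= 0) {
                // ArrayList.remove(int) shifts every later element one slot left.
                moves += queue.size() - idx - 1;
                queue.remove(idx);
            }
        }
        return moves;
    }

    /** Records takes in a hash set, then compacts the queue in a single pass. */
    static long deferredReplay(List<Long> queue, List<Long> takes) {
        Set<Long> removed = new HashSet<>(takes);
        long moves = 0;
        int write = 0;
        for (int read = 0; read < queue.size(); read++) {
            Long ptr = queue.get(read);
            if (removed.contains(ptr)) {
                continue; // taken during replay, drop it
            }
            if (write != read) {
                queue.set(write, ptr); // each survivor moves at most once
                moves++;
            }
            write++;
        }
        queue.subList(write, queue.size()).clear(); // trim the now-unused tail
        return moves;
    }

    public static void main(String[] args) {
        int n = 100_000;   // queue size for the demo
        int k = 1_000;     // number of takes replayed from the head
        List<Long> a = new ArrayList<>();
        List<Long> b = new ArrayList<>();
        List<Long> takes = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            a.add(i);
            b.add(i);
            if (i < k) {
                takes.add(i);
            }
        }
        System.out.println("naive moves:    " + naiveReplay(a, takes));
        System.out.println("deferred moves: " + deferredReplay(b, takes));
        System.out.println("same result:    " + a.equals(b));
    }
}
```

In the naive version the number of moves grows with (number of removes) x
(queue size), while the deferred version touches each surviving pointer at
most once regardless of how many removes were replayed.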