This was likely caused by the checkpoint being corrupt and automatically cleaned up, which forces a replay of all your data files. Can you try enabling dual checkpoints? You will need trunk or the upcoming 1.4 release for this feature, though.
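
For reference, a minimal sketch of what enabling dual checkpoints might look like in the agent's properties file. The property names `useDualCheckpoints` and `backupCheckpointDir` are the ones documented for the 1.4 File Channel; the agent name `agent1` and the backup directory path are assumptions made up for illustration, while the channel name and the other paths are taken from the log in the quoted message.

```properties
# Hypothetical agent name "agent1"; channel name taken from the quoted log.
agent1.channels = troubleshootingFileChannel
agent1.channels.troubleshootingFileChannel.type = file
agent1.channels.troubleshootingFileChannel.checkpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint
agent1.channels.troubleshootingFileChannel.dataDirs = /var/local/flume/troubleshooting-file-channel/data

# Keep a backup copy of the checkpoint so that a corrupt primary checkpoint
# does not force a full replay of the data files.
agent1.channels.troubleshootingFileChannel.useDualCheckpoints = true
# Assumed path; it must differ from checkpointDir and from every entry in dataDirs.
agent1.channels.troubleshootingFileChannel.backupCheckpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint-backup
```

The backup checkpoint takes extra disk space and adds some work at each checkpoint, so it is worth confirming the settings against the documentation for the release you actually deploy.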
Hari

On Sunday, June 23, 2013, Mike Percy wrote:

> Edward,
> Someone told me they saw similar behavior but that it seemed intermittent /
> not consistent. I haven't seen this; typically the FC is very fast with
> replay. Any update on this?
>
> Thanks,
> Mike
>
> On Mon, Jun 17, 2013 at 3:53 PM, Edward Sargisson <[email protected]> wrote:
>
> > Hi all,
> > This may be a user question so feel free to punt me to that list. However,
> > I've just seen behaviour which seems mighty slow and I don't understand why.
> >
> > I restarted one of our Flume agents and it took about 23 minutes before it
> > was ready to accept new events. The logs seem to indicate that it took the
> > majority of that time to work through the data file that only had 6885
> > events in it. This seems mighty slow to me.
> >
> > Does anybody have an explanation for this? Is there something I should do
> > in the future to bring it back up faster? I looked at the code and there's
> > nothing obviously slow about it.
> >
> > Many thanks,
> > Edward
> >
> > Log snippet (filtered to be only this thread and large number of Pending
> > take messages removed):
> >
> > 2013-06-17 21:53:20,154 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel { dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }...
> > 2013-06-17 21:53:20,155 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.Log Encryption is not enabled
> > 2013-06-17 21:53:20,155 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.Log Replay started
> > 2013-06-17 21:53:20,165 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.Log Found NextFileID 20, from [/var/local/flume/troubleshooting-file-channel/data/log-20, /var/local/flume/troubleshooting-file-channel/data/log-19]
> > 2013-06-17 21:53:20,172 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> > 2013-06-17 21:53:20,172 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> > 2013-06-17 21:53:20,213 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth = 0
> > 2013-06-17 21:53:20,222 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.Log Replaying logs with v2 replay logic
> > 2013-06-17 21:53:20,225 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.ReplayHandler Starting replay of [/var/local/flume/troubleshooting-file-channel/data/log-19, /var/local/flume/troubleshooting-file-channel/data/log-20]
> > 2013-06-17 21:53:20,226 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.ReplayHandler Replaying /var/local/flume/troubleshooting-file-channel/data/log-19
> > 2013-06-17 21:53:20,275 WARN [lifecycleSupervisor-1-1] o.a.f.c.f.LogFile Checkpoint for file(/var/local/flume/troubleshooting-file-channel/data/log-19) is: 1371488755062, which is beyond the requested checkpoint time: 0 and position 284327361
> > 2013-06-17 21:53:20,287 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.ReplayHandler Replaying /var/local/flume/troubleshooting-file-channel/data/log-20
> > 2013-06-17 21:53:20,288 WARN [lifecycleSupervisor-1-1] o.a.f.c.f.LogFile Checkpoint for file(/var/local/flume/troubleshooting-file-channel/data/log-20) is: 1371488770226, which is beyond the requested checkpoint time: 0 and position 7078049
> > 2013-06-17 22:16:16,161 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.LogFile Encountered EOF at 284348767 in /var/local/flume/troubleshooting-file-channel/data/log-19
> > 2013-06-17 22:16:37,802 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.LogFile Encountered EOF at 7266532 in /var/local/flume/troubleshooting-file-channel/data/log-20
> > 2013-06-17 22:16:37,805 INFO [lifecycleSupervisor-1-1] o.a.f.c.f.ReplayHandler read: 3133788, put: 434464, take: 2618590, rollback: 15872, commit: 64862, skip: 0, eventCount:1535585
> > 2013-06-17 22:16:37,805 DEBUG [lifecycleSupervisor-1-1] o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=15, offset=410595]
> >
> > ...6883 similar messages...
> >
> > 2013-06-17 22:16:48,465 DEBUG [lifecycleSupervisor-1-1] o.a.f.c.f.ReplayHandler Pending take FlumeEve
