Good point Hari. I think a log file snippet would help to clarify. Mike
On Sun, Jun 23, 2013 at 3:05 AM, Hari Shreedharan <[email protected] > wrote: > This was likely due to the checkpoint being corrupt and automatically being > cleaned up causing a replay of all your files. Can you try enabling dual > checkpoints (you will need to use trunk or the upcoming 1.4 release for > this feature though). > > Hari > > On Sunday, June 23, 2013, Mike Percy wrote: > > > Edward, > > Someone told me they saw similar behavior but that it seemed > intermittent / > > not consistent. I haven't seen this, typically the FC is very fast with > > replay. Any update on this? > > > > Thanks, > > Mike > > > > > > > > On Mon, Jun 17, 2013 at 3:53 PM, Edward Sargisson <[email protected]> > > wrote: > > > > > Hi all, > > > This may be a user question so feel free to punt me to that list. > > However, > > > I've just seen behaviour which seems mighty slow and I don't understand > > > why. > > > > > > I restarted one of our Flume agents and it took about 23 minutes before > > it > > > was ready to accept new events. The logs seem to indicate that it took > > the > > > majority of that time to workthrough the data file that only had 6885 > > > events in it. This seems mighty slow to me. > > > > > > Does anybody have an explanation for this? Is there something I should > do > > > in the future to bring it back up faster? I looked at the code and > > there's > > > nothing obviously slow about it. > > > > > > Many thanks, > > > Edward > > > > > > Log snippet (filtered to be only this thread and large number of > Pending > > > take messages removed): > > > 2013-06-17 21:53:20,154 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel { > > > dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }... > > > 2013-06-17 21:53:20,155 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.Log Encryption is not enabled > > > 2013-06-17 21:53:20,155 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.Log Replay started > > > 2013-06-17 21:53:20,165 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.Log Found NextFileID 20, from > > > [/var/local/flume/troubleshooting-file-channel/data/log-20, > > > /var/local/flume/troubleshooting-file-channel/data/log-19] > > > 2013-06-17 21:53:20,172 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with > > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and > > > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta > > > 2013-06-17 21:53:20,172 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from > > > > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta > > > 2013-06-17 21:53:20,213 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth > > = 0 > > > 2013-06-17 21:53:20,222 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.Log Replaying logs with v2 replay logic > > > 2013-06-17 21:53:20,225 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.ReplayHandler Starting replay of > > > [/var/local/flume/troubleshooting-file-channel/data/log-19, > > > /var/local/flume/troubleshooting-file-channel/data/log-20] > > > 2013-06-17 21:53:20,226 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.ReplayHandler Replaying > > > /var/local/flume/troubleshooting-file-channel/data/log-19 > > > 2013-06-17 21:53:20,275 WARN [lifecycleSupervisor-1-1] > > > o.a.f.c.f.LogFile Checkpoint for > > > file(/var/local/flume/troubleshooting-file-channel/data/log-19) is: > > > 1371488755062, which is beyond the requested checkpoint time: 0 and > > > position 284327361 > > > 2013-06-17 21:53:20,287 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.ReplayHandler Replaying > > > /var/local/flume/troubleshooting-file-channel/data/log-20 > > > 2013-06-17 21:53:20,288 WARN [lifecycleSupervisor-1-1] > > > o.a.f.c.f.LogFile Checkpoint for > > > file(/var/local/flume/troubleshooting-file-channel/data/log-20) is: > > > 1371488770226, which is beyond the requested checkpoint time: 0 and > > > position 7078049 > > > 2013-06-17 22:16:16,161 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.LogFile Encountered EOF at 284348767 in > > > /var/local/flume/troubleshooting-file-channel/data/log-19 > > > 2013-06-17 22:16:37,802 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.LogFile Encountered EOF at 7266532 in > > > /var/local/flume/troubleshooting-file-channel/data/log-20 > > > 2013-06-17 22:16:37,805 INFO [lifecycleSupervisor-1-1] > > > o.a.f.c.f.ReplayHandler read: 3133788, put: 434464, take: 2618590, > > > rollback: 15872, commit: 64862, skip: 0, eventCount:1535585 > > > 2013-06-17 22:16:37,805 DEBUG [lifecycleSupervisor-1-1] > > > o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=15, > > > offset=410595] > > > > > > ...6883 similar messages... > > > > > > 2013-06-17 22:16:48,465 DEBUG [lifecycleSupervisor-1-1] > > > o.a.f.c.f.ReplayHandler Pending take FlumeEve >
