Hi Mike, No - not at present. :-( There's a possibility that a repro exists on a certain non-critical server which I will need to fix sometime next week. If it happens to have the same failure I'll see if I can figure something out.
Cheers, Edward "Edward, Someone told me they saw similar behavior but that it seemed intermittent / not consistent. I haven't seen this, typically the FC is very fast with replay. Any update on this? Thanks, Mike On Mon, Jun 17, 2013 at 3:53 PM, Edward Sargisson <[email protected]> wrote: > Hi all, > This may be a user question so feel free to punt me to that list. However, > I've just seen behaviour which seems mighty slow and I don't understand > why. > > I restarted one of our Flume agents and it took about 23 minutes before it > was ready to accept new events. The logs seem to indicate that it took the > majority of that time to workthrough the data file that only had 6885 > events in it. This seems mighty slow to me. > > Does anybody have an explanation for this? Is there something I should do > in the future to bring it back up faster? I looked at the code and there's > nothing obviously slow about it. > > Many thanks, > Edward > > Log snippet (filtered to be only this thread and large number of Pending > take messages removed): > 2013-06-17 21:53:20,154 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel { > dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }... > 2013-06-17 21:53:20,155 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Encryption is not enabled > 2013-06-17 21:53:20,155 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Replay started > 2013-06-17 21:53:20,165 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Found NextFileID 20, from > [/var/local/flume/troubleshooting-file-channel/data/log-20, > /var/local/flume/troubleshooting-file-channel/data/log-19] > 2013-06-17 21:53:20,172 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta > 2013-06-17 21:53:20,172 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta > 2013-06-17 21:53:20,213 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth = 0 > 2013-06-17 21:53:20,222 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Replaying logs with v2 replay logic > 2013-06-17 21:53:20,225 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.ReplayHandler Starting replay of > [/var/local/flume/troubleshooting-file-channel/data/log-19, > /var/local/flume/troubleshooting-file-channel/data/log-20] > 2013-06-17 21:53:20,226 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.ReplayHandler Replaying > /var/local/flume/troubleshooting-file-channel/data/log-19 > 2013-06-17 21:53:20,275 WARN [lifecycleSupervisor-1-1] > o.a.f.c.f.LogFile Checkpoint for > file(/var/local/flume/troubleshooting-file-channel/data/log-19) is: > 1371488755062, which is beyond the requested checkpoint time: 0 and > position 284327361 > 2013-06-17 21:53:20,287 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.ReplayHandler Replaying > /var/local/flume/troubleshooting-file-channel/data/log-20 > 2013-06-17 21:53:20,288 WARN [lifecycleSupervisor-1-1] > o.a.f.c.f.LogFile Checkpoint for > file(/var/local/flume/troubleshooting-file-channel/data/log-20) is: > 1371488770226, which is beyond the requested checkpoint time: 0 and > position 7078049 > 2013-06-17 22:16:16,161 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.LogFile Encountered EOF at 284348767 in > /var/local/flume/troubleshooting-file-channel/data/log-19 > 2013-06-17 22:16:37,802 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.LogFile Encountered EOF at 7266532 in > /var/local/flume/troubleshooting-file-channel/data/log-20 > 2013-06-17 22:16:37,805 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.ReplayHandler read: 3133788, put: 434464, take: 2618590, > rollback: 15872, commit: 64862, skip: 0, eventCount:1535585 > 2013-06-17 22:16:37,805 DEBUG [lifecycleSupervisor-1-1] > o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=15, > offset=410595] > > ...6883 similar messages... > > 2013-06-17 22:16:48,465 DEBUG [lifecycleSupervisor-1-1] > o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=18, > offset=2579603] > 2013-06-17 22:16:48,465 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Rolling /var/local/flume/troubleshooting-file-channel/data > 2013-06-17 22:16:48,466 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Roll start /var/local/flume/troubleshooting-file-channel/ data > 2013-06-17 22:16:48,466 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.LogFile Opened > /var/local/flume/troubleshooting-file-channel/data/log-21 > 2013-06-17 22:16:48,644 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Roll end > 2013-06-17 22:16:48,645 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.EventQueueBackingStoreFile Start checkpoint for > /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint, > elements to sync = 383705 > 2013-06-17 22:16:48,756 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.EventQueueBackingStoreFile Updating checkpoint metadata: > logWriteOrderID: 1371506000194, queueSize: 4, queueHead: 383699 > 2013-06-17 22:16:49,501 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.LogFileV3 Updating log-21.meta currentPosition = 0, > logWriteOrderID = 1371506000194 > 2013-06-17 22:16:54,333 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Updated checkpoint for file: > /var/local/flume/troubleshooting-file-channel/data/log-21 position: 0 > logWriteOrderID: 1371506000194 > 2013-06-17 22:16:54,333 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.LogFile Closing RandomReader > /var/local/flume/troubleshooting-file-channel/data/log-20 > 2013-06-17 22:16:54,339 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.LogFileV3 Updating log-20.meta currentPosition = 7078049, > logWriteOrderID = 1371506000194 > 2013-06-17 22:16:54,887 DEBUG [lifecycleSupervisor-1-1] > o.a.f.c.f.Log Updated checkpoint for file: > /var/local/flume/troubleshooting-file-channel/data/log-20logWriteOrderID > 1371506000194 > 2013-06-17 22:16:54,887 INFO [lifecycleSupervisor-1-1] > o.a.f.c.f.FileChannel Queue Size after replay: 4 > [channel=troubleshootingFileChannel] > 2013-06-17 22:16:54,887 INFO [lifecycleSupervisor-1-1] > o.a.f.i.MonitoredCounterGroup Monitoried counter group for type: CHANNEL, > name: troubleshootingFileChannel, registered successfully. > 2013-06-17 22:16:54,887 INFO [lifecycleSupervisor-1-1] > o.a.f.i.MonitoredCounterGroup Component type: CHANNEL, name: > troubleshootingFileChannel started >"
