Edward,
Someone told me they saw similar behavior but that it seemed intermittent /
not consistent. I haven't seen this, typically the FC is very fast with
replay. Any update on this?

Thanks,
Mike



On Mon, Jun 17, 2013 at 3:53 PM, Edward Sargisson <[email protected]> wrote:

> Hi all,
> This may be a user question so feel free to punt me to that list. However,
> I've just seen behaviour which seems mighty slow and I don't understand
> why.
>
> I restarted one of our Flume agents and it took about 23 minutes before it
> was ready to accept new events. The logs seem to indicate that it took the
> majority of that time to workthrough the data file that only had 6885
> events in it. This seems mighty slow to me.
>
> Does anybody have an explanation for this? Is there something I should do
> in the future to bring it back up faster? I looked at the code and there's
> nothing obviously slow about it.
>
> Many thanks,
> Edward
>
> Log snippet (filtered to be only this thread and large number of Pending
> take messages removed):
> 2013-06-17 21:53:20,154  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel {
> dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }...
> 2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Encryption is not enabled
> 2013-06-17 21:53:20,155  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Replay started
> 2013-06-17 21:53:20,165  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Found NextFileID 20, from
> [/var/local/flume/troubleshooting-file-channel/data/log-20,
> /var/local/flume/troubleshooting-file-channel/data/log-19]
> 2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with
> /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and
> /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> 2013-06-17 21:53:20,172  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from
> /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
> 2013-06-17 21:53:20,213  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth = 0
> 2013-06-17 21:53:20,222  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Replaying logs with v2 replay logic
> 2013-06-17 21:53:20,225  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.ReplayHandler Starting replay of
> [/var/local/flume/troubleshooting-file-channel/data/log-19,
> /var/local/flume/troubleshooting-file-channel/data/log-20]
> 2013-06-17 21:53:20,226  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.ReplayHandler Replaying
> /var/local/flume/troubleshooting-file-channel/data/log-19
> 2013-06-17 21:53:20,275  WARN [lifecycleSupervisor-1-1]
> o.a.f.c.f.LogFile Checkpoint for
> file(/var/local/flume/troubleshooting-file-channel/data/log-19) is:
> 1371488755062, which is beyond the requested checkpoint time: 0 and
> position 284327361
> 2013-06-17 21:53:20,287  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.ReplayHandler Replaying
> /var/local/flume/troubleshooting-file-channel/data/log-20
> 2013-06-17 21:53:20,288  WARN [lifecycleSupervisor-1-1]
> o.a.f.c.f.LogFile Checkpoint for
> file(/var/local/flume/troubleshooting-file-channel/data/log-20) is:
> 1371488770226, which is beyond the requested checkpoint time: 0 and
> position 7078049
> 2013-06-17 22:16:16,161  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.LogFile Encountered EOF at 284348767 in
> /var/local/flume/troubleshooting-file-channel/data/log-19
> 2013-06-17 22:16:37,802  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.LogFile Encountered EOF at 7266532 in
> /var/local/flume/troubleshooting-file-channel/data/log-20
> 2013-06-17 22:16:37,805  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.ReplayHandler read: 3133788, put: 434464, take: 2618590,
> rollback: 15872, commit: 64862, skip: 0, eventCount:1535585
> 2013-06-17 22:16:37,805 DEBUG [lifecycleSupervisor-1-1]
> o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=15,
> offset=410595]
>
> ...6883 similar messages...
>
> 2013-06-17 22:16:48,465 DEBUG [lifecycleSupervisor-1-1]
> o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=18,
> offset=2579603]
> 2013-06-17 22:16:48,465  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Rolling /var/local/flume/troubleshooting-file-channel/data
> 2013-06-17 22:16:48,466  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Roll start /var/local/flume/troubleshooting-file-channel/data
> 2013-06-17 22:16:48,466  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.LogFile Opened
> /var/local/flume/troubleshooting-file-channel/data/log-21
> 2013-06-17 22:16:48,644  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Roll end
> 2013-06-17 22:16:48,645  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.EventQueueBackingStoreFile Start checkpoint for
> /var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint,
> elements to sync = 383705
> 2013-06-17 22:16:48,756  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.EventQueueBackingStoreFile Updating checkpoint metadata:
> logWriteOrderID: 1371506000194, queueSize: 4, queueHead: 383699
> 2013-06-17 22:16:49,501  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.LogFileV3 Updating log-21.meta currentPosition = 0,
> logWriteOrderID = 1371506000194
> 2013-06-17 22:16:54,333  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Updated checkpoint for file:
> /var/local/flume/troubleshooting-file-channel/data/log-21 position: 0
> logWriteOrderID: 1371506000194
> 2013-06-17 22:16:54,333  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.LogFile Closing RandomReader
> /var/local/flume/troubleshooting-file-channel/data/log-20
> 2013-06-17 22:16:54,339  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.LogFileV3 Updating log-20.meta currentPosition = 7078049,
> logWriteOrderID = 1371506000194
> 2013-06-17 22:16:54,887 DEBUG [lifecycleSupervisor-1-1]
> o.a.f.c.f.Log Updated checkpoint for file:
> /var/local/flume/troubleshooting-file-channel/data/log-20logWriteOrderID
> 1371506000194
> 2013-06-17 22:16:54,887  INFO [lifecycleSupervisor-1-1]
> o.a.f.c.f.FileChannel Queue Size after replay: 4
> [channel=troubleshootingFileChannel]
> 2013-06-17 22:16:54,887  INFO [lifecycleSupervisor-1-1]
> o.a.f.i.MonitoredCounterGroup Monitoried counter group for type: CHANNEL,
> name: troubleshootingFileChannel, registered successfully.
> 2013-06-17 22:16:54,887  INFO [lifecycleSupervisor-1-1]
> o.a.f.i.MonitoredCounterGroup Component type: CHANNEL, name:
> troubleshootingFileChannel started
>

Reply via email to