Hi all,
This may be a user question, so feel free to punt me to that list. However,
I've just seen behaviour which seems mighty slow, and I don't understand why.
I restarted one of our Flume agents and it took about 23 minutes before it
was ready to accept new events. The logs seem to indicate that the majority
of that time was spent working through a data file that only had 6885 events
in it. This seems mighty slow to me.
Does anybody have an explanation for this? Is there something I should do
in the future to bring it back up faster? I looked at the code and there's
nothing obviously slow about it.
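For what it's worth, the only knobs I've spotted that might affect replay time are
checkpointInterval and use-fast-replay. The channel is configured roughly along
these lines (the channel name and paths are taken from the logs below; the agent
name and the values shown are illustrative rather than copied from our actual
config file):

agent.channels = troubleshootingFileChannel
agent.channels.troubleshootingFileChannel.type = file
agent.channels.troubleshootingFileChannel.checkpointDir = /var/local/flume/troubleshooting-file-channel/checkpoint
agent.channels.troubleshootingFileChannel.dataDirs = /var/local/flume/troubleshooting-file-channel/data
# Checkpoint more often so less of the data files needs replaying on restart.
# 30000 ms is the default; we have not tuned this.
agent.channels.troubleshootingFileChannel.checkpointInterval = 30000
# As I understand it, this kicks in when the checkpoint can't be used.
# I haven't tried it, so treat it as a guess rather than a fix.
agent.channels.troubleshootingFileChannel.use-fast-replay = true

Would either of those actually help here, or is the time dominated by something
else entirely?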
Many thanks,
Edward
Log snippet (filtered to show only this thread, with the large number of
Pending take messages removed):
2013-06-17 21:53:20,154 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.FileChannel Starting FileChannel troubleshootingFileChannel {
dataDirs: [/var/local/flume/troubleshooting-file-channel/data] }...
2013-06-17 21:53:20,155 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Encryption is not enabled
2013-06-17 21:53:20,155 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Replay started
2013-06-17 21:53:20,165 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Found NextFileID 20, from
[/var/local/flume/troubleshooting-file-channel/data/log-20,
/var/local/flume/troubleshooting-file-channel/data/log-19]
2013-06-17 21:53:20,172 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.EventQueueBackingStoreFileV3 Starting up with
/var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint and
/var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
2013-06-17 21:53:20,172 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.EventQueueBackingStoreFileV3 Reading checkpoint metadata from
/var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint.meta
2013-06-17 21:53:20,213 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Last Checkpoint Mon Jun 17 21:04:26 UTC 2013, queue depth = 0
2013-06-17 21:53:20,222 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Replaying logs with v2 replay logic
2013-06-17 21:53:20,225 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Starting replay of
[/var/local/flume/troubleshooting-file-channel/data/log-19,
/var/local/flume/troubleshooting-file-channel/data/log-20]
2013-06-17 21:53:20,226 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Replaying
/var/local/flume/troubleshooting-file-channel/data/log-19
2013-06-17 21:53:20,275 WARN [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Checkpoint for
file(/var/local/flume/troubleshooting-file-channel/data/log-19) is:
1371488755062, which is beyond the requested checkpoint time: 0 and
position 284327361
2013-06-17 21:53:20,287 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Replaying
/var/local/flume/troubleshooting-file-channel/data/log-20
2013-06-17 21:53:20,288 WARN [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Checkpoint for
file(/var/local/flume/troubleshooting-file-channel/data/log-20) is:
1371488770226, which is beyond the requested checkpoint time: 0 and
position 7078049
2013-06-17 22:16:16,161 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Encountered EOF at 284348767 in
/var/local/flume/troubleshooting-file-channel/data/log-19
2013-06-17 22:16:37,802 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Encountered EOF at 7266532 in
/var/local/flume/troubleshooting-file-channel/data/log-20
2013-06-17 22:16:37,805 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler read: 3133788, put: 434464, take: 2618590,
rollback: 15872, commit: 64862, skip: 0, eventCount:1535585
2013-06-17 22:16:37,805 DEBUG [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=15,
offset=410595]
...6883 similar messages...
2013-06-17 22:16:48,465 DEBUG [lifecycleSupervisor-1-1]
o.a.f.c.f.ReplayHandler Pending take FlumeEventPointer [fileID=18,
offset=2579603]
2013-06-17 22:16:48,465 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Rolling /var/local/flume/troubleshooting-file-channel/data
2013-06-17 22:16:48,466 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Roll start /var/local/flume/troubleshooting-file-channel/data
2013-06-17 22:16:48,466 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Opened
/var/local/flume/troubleshooting-file-channel/data/log-21
2013-06-17 22:16:48,644 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Roll end
2013-06-17 22:16:48,645 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.EventQueueBackingStoreFile Start checkpoint for
/var/local/flume/troubleshooting-file-channel/checkpoint/checkpoint,
elements to sync = 383705
2013-06-17 22:16:48,756 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.EventQueueBackingStoreFile Updating checkpoint metadata:
logWriteOrderID: 1371506000194, queueSize: 4, queueHead: 383699
2013-06-17 22:16:49,501 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFileV3 Updating log-21.meta currentPosition = 0,
logWriteOrderID = 1371506000194
2013-06-17 22:16:54,333 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Updated checkpoint for file:
/var/local/flume/troubleshooting-file-channel/data/log-21 position: 0
logWriteOrderID: 1371506000194
2013-06-17 22:16:54,333 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFile Closing RandomReader
/var/local/flume/troubleshooting-file-channel/data/log-20
2013-06-17 22:16:54,339 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.LogFileV3 Updating log-20.meta currentPosition = 7078049,
logWriteOrderID = 1371506000194
2013-06-17 22:16:54,887 DEBUG [lifecycleSupervisor-1-1]
o.a.f.c.f.Log Updated checkpoint for file:
/var/local/flume/troubleshooting-file-channel/data/log-20logWriteOrderID
1371506000194
2013-06-17 22:16:54,887 INFO [lifecycleSupervisor-1-1]
o.a.f.c.f.FileChannel Queue Size after replay: 4
[channel=troubleshootingFileChannel]
2013-06-17 22:16:54,887 INFO [lifecycleSupervisor-1-1]
o.a.f.i.MonitoredCounterGroup Monitoried counter group for type: CHANNEL,
name: troubleshootingFileChannel, registered successfully.
2013-06-17 22:16:54,887 INFO [lifecycleSupervisor-1-1]
o.a.f.i.MonitoredCounterGroup Component type: CHANNEL, name:
troubleshootingFileChannel started