[ 
https://issues.apache.org/jira/browse/FLUME-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699735#comment-13699735
 ] 

Juhani Connolly commented on FLUME-2118:
----------------------------------------

I'll ask my colleague to get some snapshots of the stack next time something
like this happens. He did also try using the fast replay, but due to the sheer
volume of the backlog (about 100 million events), it eventually slowed down
drastically (presumably due to GC).

I'm adding the relevant section from the time he waited the whole thing out and
it successfully finished. I can post the full logs, but they are full of
several aborted attempts, which makes things harder to follow (every attempt
except the fast replay consistently got stuck at the same file).
                
> Occasional multi-hour pauses in file channel replay
> ---------------------------------------------------
>
>                 Key: FLUME-2118
>                 URL: https://issues.apache.org/jira/browse/FLUME-2118
>             Project: Flume
>          Issue Type: Bug
>          Components: File Channel
>    Affects Versions: v1.5.0
>            Reporter: Juhani Connolly
>         Attachments: gc-flume.log.20130702
>
>
> Sometimes during replay, immediately after an EOF of one log, the replay will 
> pause for a long time.
> Here are two samples from this morning when we restarted our 3 aggregators 
> and 2 of them hit this issue.
> 02 7 2013 03:06:30,089 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2200000 records
> 02 7 2013 03:06:30,179 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2210000 records
> 02 7 2013 03:06:30,241 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - Encountered EOF at 1623195625 in /data2/flume-data/log-1184
> 02 7 2013 06:23:27,629 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2220000 records
> 02 7 2013 06:23:28,641 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2230000 records
> 02 7 2013 06:23:29,162 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2240000 records
> 02 7 2013 06:23:30,118 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2250000 records
> 02 7 2013 06:23:30,750 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2260000 records
>
> 02 7 2013 08:03:00,942 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2160000 records
> 02 7 2013 08:03:01,055 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2170000 records
> 02 7 2013 08:03:01,168 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2180000 records
> 02 7 2013 08:03:01,181 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - Encountered EOF at 1623195640 in /data2/flume-data/log-1182
> 02 7 2013 14:45:55,302 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2190000 records
> 02 7 2013 14:45:56,282 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2200000 records
> 02 7 2013 14:45:57,084 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2210000 records
> 02 7 2013 14:45:59,043 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2220000 records
>
> I spent an hour or so trying to track down the cause of this. Nothing
> suspicious is turning up in Ganglia, and a cursory review of the code
> didn't turn up anything obviously wrong either. Owing to time limitations I
> can't dig further at this time.
> We run a version of Flume from somewhat before the current 1.4 release
> candidate (hash eefefa941a60c0982f0957804be0cafb4d83e46e); there don't
> appear to be any replay patches since then.
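
When sifting through the full logs, pauses like the ones above can be located by scanning consecutive log timestamps for unusually large gaps. The following is a minimal Python sketch, not part of Flume: `parse_line` and `find_replay_pauses` are hypothetical helpers, and the sketch assumes each log record sits on a single (unwrapped) line that starts with the `day month year HH:mm:ss,SSS` timestamp seen in the excerpts.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumed timestamp prefix, as in the excerpts: "02 7 2013 03:06:30,089"
# (day, month, year, time with a millisecond suffix), followed by the rest of the record.
_TS = re.compile(r"^(\d{1,2}) (\d{1,2}) (\d{4}) (\d{2}):(\d{2}):(\d{2}),(\d{3})\s+(.*)$")


def parse_line(line: str) -> Optional[Tuple[datetime, str]]:
    """Return (timestamp, rest of record), or None if the line has no timestamp."""
    m = _TS.match(line.strip())
    if m is None:
        return None
    day, month, year, hh, mm, ss, ms, rest = m.groups()
    ts = datetime(int(year), int(month), int(day),
                  int(hh), int(mm), int(ss), int(ms) * 1000)
    return ts, rest


def find_replay_pauses(lines: Iterable[str],
                       threshold: timedelta) -> List[Tuple[datetime, timedelta, str]]:
    """Return (start, gap, record logged just before the gap) for each gap > threshold."""
    pauses = []
    prev: Optional[Tuple[datetime, str]] = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines without a leading timestamp
        if prev is not None:
            gap = parsed[0] - prev[0]
            if gap > threshold:
                pauses.append((prev[0], gap, prev[1]))
        prev = parsed
    return pauses


if __name__ == "__main__":
    sample = [
        "02 7 2013 03:06:30,179 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2210000 records",
        "02 7 2013 03:06:30,241 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - Encountered EOF at 1623195625 in /data2/flume-data/log-1184",
        "02 7 2013 06:23:27,629 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2220000 records",
    ]
    for start, gap, record in find_replay_pauses(sample, timedelta(minutes=5)):
        print(start, gap, record)
```

On the first excerpt this reports a single gap of a little over three hours, and the record just before it is the "Encountered EOF" line, which matches the pattern described in the issue. The 5-minute threshold is an arbitrary choice for illustration.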

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
