Juhani Connolly created FLUME-2118:
--------------------------------------

             Summary: Occasional multi-hour pauses in file channel replay
                 Key: FLUME-2118
                 URL: https://issues.apache.org/jira/browse/FLUME-2118
             Project: Flume
          Issue Type: Bug
          Components: File Channel
    Affects Versions: v1.5.0
            Reporter: Juhani Connolly


Sometimes during replay, immediately after an EOF of one log, the replay will 
pause for a long time.

Here are two samples from this morning when we restarted our 3 aggregators and 
2 of them hit this issue.

02 7 2013 03:06:30,089 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2200000 
records
02 7 2013 03:06:30,179 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2210000 
records
02 7 2013 03:06:30,241 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - 
Encountered EOF at 1623195625 in /data2/flume-data/log-1184
02 7 2013 06:23:27,629 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2220000 
records
02 7 2013 06:23:28,641 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2230000 
records
02 7 2013 06:23:29,162 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2240000 
records
02 7 2013 06:23:30,118 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2250000 
records
02 7 2013 06:23:30,750 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2260000 
records




02 7 2013 08:03:00,942 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2160000 
records
02 7 2013 08:03:01,055 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2170000 
records
02 7 2013 08:03:01,168 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2180000 
records
02 7 2013 08:03:01,181 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - 
Encountered EOF at 1623195640 in /data2/flume-data/log-1182
02 7 2013 14:45:55,302 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2190000 
records
02 7 2013 14:45:56,282 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2200000 
records
02 7 2013 14:45:57,084 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2210000 
records
02 7 2013 14:45:59,043 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 2220000 
records


I've tried for an hour and some to track down the cause of this. There's 
nothing suspicious turning up on ganglia, and a cursory review of the code 
didn't turn up anything overly suspicious. Owing to time limitations I can't 
dig further at this time.

We run a version of flume from somewhat before the current 1.4 release 
candidate(hash is eefefa941a60c0982f0957804be0cafb4d83e46e) there doesn't 
appear to be any replay patches since then.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to