Brock Noland created FLUME-1580:
-----------------------------------

             Summary: FileChannel some sets of log files cannot be replayed
                 Key: FLUME-1580
                 URL: https://issues.apache.org/jira/browse/FLUME-1580
             Project: Flume
          Issue Type: Improvement
          Components: Channel
    Affects Versions: v1.3.0
            Reporter: Brock Noland
            Assignee: Brock Noland


When log files no longer have puts referenced in the queue, we delete them. 
Deleting these log files is necessary to free up space. However, it can 
cause the error below in the following scenario:

Imagine a queue with capacity 2 and the following activity:

put a in log 1
put b in log 1
take a from log 2
take b from log 2
put c in log 1
put d in log 1
roll logs 1 & 2
checkpoint and delete log 2 since no puts in the queue reference it
for whatever reason the checkpoint is deleted

On replay we will see:
put a in log 1
put b in log 1
put c in log 1 <- this will exceed the queue capacity and throw the error below
put d in log 1
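
The scenario above can be reproduced with a small stand-alone model. This is 
only a sketch: ReplaySketch, Entry, scenario, withoutLog and replay are 
hypothetical names made up for illustration, not Flume classes, and the real 
ReplayHandler tracks transactions and event pointers rather than event names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the scenario; none of these names come from Flume.
class ReplaySketch {
    /** One replayed record: put or take, the event name, and the log file holding it. */
    static final class Entry {
        final boolean isPut;
        final String event;
        final int logId;

        Entry(boolean isPut, String event, int logId) {
            this.isPut = isPut;
            this.event = event;
            this.logId = logId;
        }
    }

    /** The activity from the description: puts go to log 1, takes go to log 2. */
    static List<Entry> scenario() {
        return Arrays.asList(
            new Entry(true, "a", 1), new Entry(true, "b", 1),
            new Entry(false, "a", 2), new Entry(false, "b", 2),
            new Entry(true, "c", 1), new Entry(true, "d", 1));
    }

    /** Returns the entries that survive once the given log file has been deleted. */
    static List<Entry> withoutLog(List<Entry> entries, int deletedLogId) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (e.logId != deletedLogId) {
                kept.add(e);
            }
        }
        return kept;
    }

    /** Replays entries into a bounded queue; a put beyond the capacity throws. */
    static Deque<String> replay(List<Entry> entries, int capacity) {
        Deque<String> queue = new ArrayDeque<>();
        for (Entry e : entries) {
            if (e.isPut) {
                if (queue.size() >= capacity) {
                    throw new IllegalStateException("Unable to add " + e.event
                        + ". Queue depth = " + queue.size() + ", Capacity = " + capacity);
                }
                queue.addLast(e.event);
            } else {
                queue.remove(e.event);
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        // Both logs present: the takes free space, so c and d fit.
        System.out.println(replay(scenario(), 2));

        // Log 2 deleted and no checkpoint: only puts are replayed, and c overflows.
        try {
            replay(withoutLog(scenario(), 2), 2);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

With both logs present the replay ends with c and d in the queue. Once log 2 
is removed, the takes are never seen, so the put of c arrives while a and b 
still occupy both slots.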


{noformat}
2012-09-13 17:45:14,095 (lifecycleSupervisor-1-0) [ERROR - org.apache.flume.channel.file.Log.replay(Log.java:354)] Failed to initialize Log on [channel=channel1]
java.lang.IllegalStateException: Unable to add FlumeEventPointer [fileID=15, offset=2104422]. Queue depth = 5000, Capacity = 5000
        at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:394)
        at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:329)
        at org.apache.flume.channel.file.Log.replay(Log.java:339)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:272)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

The workaround at present is to delete the checkpoint, increase the capacity of 
the channel, and restart. There will be duplicate events in this case.
