[
https://issues.apache.org/jira/browse/FLUME-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455447#comment-13455447
]
Brock Noland commented on FLUME-1580:
-------------------------------------
One solution would be to archive log files when they are no longer active
instead of deleting them. During replay we could then read the archived files
in a special way: we know their puts are no longer useful, but their takes
might be. Eventually the archived logs will need to be deleted as well.
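
To make the idea concrete, here is a minimal sketch of the capacity-2 scenario
from the issue description quoted below. It is not FileChannel code: the Entry
class, the seq write-order field, and the replay and scenario helpers are all
hypothetical names invented for illustration. It assumes replay merges records
from all logs by write order, and that an archived log contributes only its
takes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Set;

public class ArchivedLogReplaySketch {
    enum Op { PUT, TAKE }

    /** One log record. seq is a hypothetical global write order (assumption). */
    static final class Entry {
        final long seq;
        final Op op;
        final String event;
        final int logId;

        Entry(long seq, Op op, String event, int logId) {
            this.seq = seq;
            this.op = op;
            this.event = event;
            this.logId = logId;
        }
    }

    /** Replays records in write order into a bounded queue; returns what is left. */
    static List<String> replay(List<Entry> entries, int capacity, Set<Integer> archivedLogs) {
        List<Entry> ordered = new ArrayList<>(entries);
        ordered.sort(Comparator.comparingLong((Entry e) -> e.seq));
        Deque<String> queue = new ArrayDeque<>();
        for (Entry e : ordered) {
            if (e.op == Op.PUT) {
                if (archivedLogs.contains(e.logId)) {
                    continue; // puts in an archived log are known to be stale
                }
                if (queue.size() >= capacity) {
                    throw new IllegalStateException("Unable to add " + e.event
                            + ". Queue depth = " + queue.size() + ", Capacity = " + capacity);
                }
                queue.addLast(e.event);
            } else {
                queue.remove(e.event); // a take removes the event if it is queued
            }
        }
        return new ArrayList<>(queue);
    }

    /** The activity from the issue: puts go to log 1, takes are recorded in log 2. */
    static List<Entry> scenario(boolean keepLog2) {
        List<Entry> all = Arrays.asList(
                new Entry(1, Op.PUT, "a", 1),
                new Entry(2, Op.PUT, "b", 1),
                new Entry(3, Op.TAKE, "a", 2),
                new Entry(4, Op.TAKE, "b", 2),
                new Entry(5, Op.PUT, "c", 1),
                new Entry(6, Op.PUT, "d", 1));
        List<Entry> kept = new ArrayList<>();
        for (Entry e : all) {
            if (keepLog2 || e.logId != 2) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Log 2 archived rather than deleted: its takes free space before c and d.
        System.out.println(replay(scenario(true), 2, Collections.singleton(2)));
        try {
            // Log 2 deleted and checkpoint lost: the third put overflows the queue.
            replay(scenario(false), 2, Collections.<Integer>emptySet());
        } catch (IllegalStateException ex) {
            System.out.println("replay failed: " + ex.getMessage());
        }
    }
}
```

With log 2 kept as an archive, the two takes are applied before the puts of c
and d, so the queue never exceeds its capacity. With log 2 gone, the third put
hits the capacity check, which mirrors the IllegalStateException in the quoted
stack trace.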
> FileChannel some sets of log files cannot be replayed
> -----------------------------------------------------
>
> Key: FLUME-1580
> URL: https://issues.apache.org/jira/browse/FLUME-1580
> Project: Flume
> Issue Type: Improvement
> Components: Channel
> Affects Versions: v1.3.0
> Reporter: Brock Noland
> Assignee: Brock Noland
>
> When log files no longer have puts referenced in the queue, we delete them.
> Deleting these log files is necessary to free up space. However, it can
> cause the error below in the following scenario:
> Imagine a queue with capacity 2 and the following activity:
> put a in log 1
> put b in log 1
> take a from log 2
> take b from log 2
> put c in log 1
> put d in log 1
> roll logs 1 & 2
> checkpoint and delete log 2 since no puts in the queue reference it
> for whatever reason the checkpoint is deleted
> On replay we will see:
> put a in log 1
> put b in log 1
> put c in log 1 <- this will exceed the queue capacity and throw the error
> below
> put d in log 1
> {noformat}
> 2012-09-13 17:45:14,095 (lifecycleSupervisor-1-0) [ERROR - org.apache.flume.channel.file.Log.replay(Log.java:354)] Failed to initialize Log on [channel=channel1]
> java.lang.IllegalStateException: Unable to add FlumeEventPointer [fileID=15, offset=2104422]. Queue depth = 5000, Capacity = 5000
> at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:394)
> at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:329)
> at org.apache.flume.channel.file.Log.replay(Log.java:339)
> at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:272)
> at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The workaround at present is to delete the checkpoint, increase the capacity
> of the channel, and restart. There will be duplicate events in this case.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira