[
https://issues.apache.org/jira/browse/FLUME-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455975#comment-13455975
]
Mike Percy commented on FLUME-1580:
-----------------------------------
I think the invariant we want here is: do not delete a log file containing takes
whose totally-ordered sequence number is higher than the totally-ordered sequence
number of any active put. I think we could get away with tracking the highest-numbered
take in each log file and the lowest-numbered active put in each log file, in addition
to the information we already store about the active puts, and that should be
sufficient to implement this logic.
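Roughly the bookkeeping I have in mind, as an illustrative sketch only (this is not
actual FileChannel code; the class and method names below are all made up):
{noformat}
// Illustrative sketch only -- not actual FileChannel code; LogFileTracker,
// recordTake, recordActivePut and canDelete are invented names.
import java.util.HashMap;
import java.util.Map;

public class LogFileTracker {

  // Per log file: the highest totally-ordered sequence number seen for a take.
  private final Map<Integer, Long> highestTakeSeq = new HashMap<>();

  // Per log file: the lowest sequence number among puts still active (i.e. still
  // referenced by the queue). Entries would need to be recomputed or removed as
  // puts are taken; that maintenance is elided here.
  private final Map<Integer, Long> lowestActivePutSeq = new HashMap<>();

  void recordTake(int fileId, long seq) {
    highestTakeSeq.merge(fileId, seq, Math::max);
  }

  void recordActivePut(int fileId, long seq) {
    lowestActivePutSeq.merge(fileId, seq, Math::min);
  }

  /**
   * A log file is only safe to delete if none of its takes is ordered after
   * (has a higher sequence number than) any still-active put.
   */
  boolean canDelete(int fileId) {
    long maxTake = highestTakeSeq.getOrDefault(fileId, Long.MIN_VALUE);
    long minActivePut = lowestActivePutSeq.values().stream()
        .mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
    return maxTake < minActivePut;
  }
}
{noformat}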
> FileChannel some sets of log files cannot be replayed
> -----------------------------------------------------
>
> Key: FLUME-1580
> URL: https://issues.apache.org/jira/browse/FLUME-1580
> Project: Flume
> Issue Type: Improvement
> Components: Channel
> Affects Versions: v1.3.0
> Reporter: Brock Noland
> Assignee: Brock Noland
>
> When log files no longer have puts referenced in the queue, we delete them.
> Deleting these log files is necessary to free up space. However, this can
> cause the error below in the following scenario:
> Imagine a queue with capacity 2 and the following activity:
> put a in log 1
> put b in log 1
> take a from log 2
> take b from log 2
> put c in log 1
> put d in log 1
> roll logs 1 & 2
> checkpoint and delete log 2 since no puts in the queue reference it
> for whatever reason the checkpoint is deleted
> On replay we will see:
> put a in log 1
> put b in log 1
> put c in log 1 <- this will exceed the queue capacity and throw the error below
> put d in log 1
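> For illustration only (this is not the actual Flume replay code; the class and
> method names below are made up), replaying the surviving puts from log 1 without
> the takes that lived in the deleted log 2 overflows a capacity-2 queue exactly as
> in the walkthrough above:
> {noformat}
> // Illustrative sketch only -- not actual Flume replay code.
> import java.util.ArrayDeque;
> import java.util.Deque;
>
> public class ReplaySketch {
>   static final int CAPACITY = 2;
>   static final Deque<String> queue = new ArrayDeque<>();
>
>   static void replayPut(String event) {
>     if (queue.size() >= CAPACITY) {
>       throw new IllegalStateException("Unable to add " + event
>           + ". Queue depth = " + queue.size() + ", Capacity = " + CAPACITY);
>     }
>     queue.addLast(event);
>   }
>
>   public static void main(String[] args) {
>     replayPut("a");  // ok, depth 1
>     replayPut("b");  // ok, depth 2
>     // the takes of a and b were only recorded in log 2, which was deleted,
>     // so nothing ever drains the queue during replay
>     replayPut("c");  // throws: depth would exceed the capacity of 2
>     replayPut("d");  // never reached
>   }
> }
> {noformat}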
> Example error message:
> {noformat}
> 2012-09-13 17:45:14,095 (lifecycleSupervisor-1-0) [ERROR - org.apache.flume.channel.file.Log.replay(Log.java:354)] Failed to initialize Log on [channel=channel1]
> java.lang.IllegalStateException: Unable to add FlumeEventPointer [fileID=15, offset=2104422]. Queue depth = 5000, Capacity = 5000
>         at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:394)
>         at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:329)
>         at org.apache.flume.channel.file.Log.replay(Log.java:339)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:272)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The solution at present is to delete the checkpoint, increase the capacity of
> the channel, and restart. There will be duplicate events in this case.
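> As an illustration of that workaround (the paths and agent name below are
> assumptions; only the property names are the standard FileChannel ones), it
> amounts to clearing the checkpoint directory and raising the channel capacity
> before restarting the agent:
> {noformat}
> # Clear the stale checkpoint (whatever checkpointDir points to; path below is an example)
> rm -rf /var/lib/flume/file-channel/checkpoint/*
>
> # flume.conf: raise the FileChannel capacity before restarting
> agent1.channels.channel1.type = file
> agent1.channels.channel1.checkpointDir = /var/lib/flume/file-channel/checkpoint
> agent1.channels.channel1.dataDirs = /var/lib/flume/file-channel/data
> agent1.channels.channel1.capacity = 10000
> {noformat}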
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira