[ 
https://issues.apache.org/jira/browse/FLUME-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455447#comment-13455447
 ] 

Brock Noland commented on FLUME-1580:
-------------------------------------

One solution would be to archive log files once they are no longer active, 
instead of deleting them. During replay we could then read the archived files 
in a special way: we know their puts are no longer useful, but their takes 
might be. Eventually the archived logs will need to be deleted as well.
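
A rough sketch of how such a replay could behave, using the capacity-2 scenario 
from the issue description. This is a toy model, not Flume's code: the class 
ArchivedReplaySketch, the Record type, and the global sequence number used to 
order records across files are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical model of replaying active plus archived logs; not Flume's classes.
class ArchivedReplaySketch {
    // One log record. 'seq' is an assumed global write order across all log files.
    static final class Record {
        final long seq;
        final boolean isPut;
        final String event;
        Record(long seq, boolean isPut, String event) {
            this.seq = seq;
            this.isPut = isPut;
            this.event = event;
        }
    }

    // Replays active logs in full, but only the takes from archived logs.
    static List<String> replay(List<Record> active, List<Record> archived, int capacity) {
        List<Record> merged = new ArrayList<>(active);
        for (Record r : archived) {
            if (!r.isPut) {
                merged.add(r); // archived puts are known to be stale, so skip them
            }
        }
        merged.sort(Comparator.comparingLong((Record r) -> r.seq));

        Deque<String> queue = new ArrayDeque<>();
        for (Record r : merged) {
            if (r.isPut) {
                if (queue.size() >= capacity) {
                    throw new IllegalStateException("Unable to add " + r.event
                            + ". Queue depth = " + queue.size() + ", Capacity = " + capacity);
                }
                queue.addLast(r.event);
            } else {
                queue.remove(r.event); // no-op if the matching put was never replayed
            }
        }
        return new ArrayList<>(queue);
    }

    // Log 1 from the issue description: all four puts.
    static List<Record> log1() {
        return Arrays.asList(new Record(1, true, "a"), new Record(2, true, "b"),
                new Record(5, true, "c"), new Record(6, true, "d"));
    }

    // Log 2 from the issue description: the two takes, kept as an archive.
    static List<Record> log2() {
        return Arrays.asList(new Record(3, false, "a"), new Record(4, false, "b"));
    }

    public static void main(String[] args) {
        // With log 2 archived rather than deleted, replay stays within capacity 2.
        System.out.println(replay(log1(), log2(), 2));
    }
}
```

With log 2 available as an archive, the takes of a and b are applied before the 
puts of c and d, so the queue never exceeds its capacity. Passing an empty list 
in place of log2() reproduces the IllegalStateException.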
                
> FileChannel some sets of log files cannot be replayed
> -----------------------------------------------------
>
>                 Key: FLUME-1580
>                 URL: https://issues.apache.org/jira/browse/FLUME-1580
>             Project: Flume
>          Issue Type: Improvement
>          Components: Channel
>    Affects Versions: v1.3.0
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>
> When log files no longer have puts referenced in the queue, we delete them. 
> Deleting these log files is necessary to free up space. However, this can 
> cause the error below in the following scenario:
> Imagine a queue with capacity 2 and the following activity:
> put a in log 1
> put b in log 1
> take a from log 2
> take b from log 2
> put c in log 1
> put d in log 1
> roll logs 1 & 2
> checkpoint and delete log 2 since no puts in the queue reference it
> for whatever reason the checkpoint is deleted
> On replay we will see:
> put a in log 1
> put b in log 1
> put c in log 1 <- this will exceed the queue capacity and throw the error 
> below
> put d in log 1
> {noformat}
> 2012-09-13 17:45:14,095 (lifecycleSupervisor-1-0) [ERROR - org.apache.flume.channel.file.Log.replay(Log.java:354)] Failed to initialize Log on [channel=channel1]
> java.lang.IllegalStateException: Unable to add FlumeEventPointer [fileID=15, offset=2104422]. Queue depth = 5000, Capacity = 5000
>         at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:394)
>         at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:329)
>         at org.apache.flume.channel.file.Log.replay(Log.java:339)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:272)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The solution at present is to delete the checkpoint, increase the capacity of 
> the channel, and restart. There will be duplicate events in this case.
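
The scenario in the description can be reproduced with a small toy model. This 
is only a sketch under stated assumptions: ReplayCapacityDemo and its Op type 
are made up for illustration and do not correspond to Flume's ReplayHandler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of replaying puts and takes into a bounded queue.
class ReplayCapacityDemo {
    static final class Op {
        final boolean isPut;
        final String event;
        Op(boolean isPut, String event) {
            this.isPut = isPut;
            this.event = event;
        }
    }

    static Op put(String e) { return new Op(true, e); }
    static Op take(String e) { return new Op(false, e); }

    // Applies the operations in order; a put into a full queue fails,
    // mirroring the IllegalStateException in the trace above.
    static List<String> replay(List<Op> ops, int capacity) {
        Deque<String> queue = new ArrayDeque<>();
        for (Op op : ops) {
            if (op.isPut) {
                if (queue.size() >= capacity) {
                    throw new IllegalStateException("Unable to add " + op.event
                            + ". Queue depth = " + queue.size() + ", Capacity = " + capacity);
                }
                queue.addLast(op.event);
            } else {
                queue.remove(op.event);
            }
        }
        return new ArrayList<>(queue);
    }

    // What actually happened at runtime, with the takes from log 2 included.
    static List<Op> fullHistory() {
        return Arrays.asList(put("a"), put("b"), take("a"), take("b"), put("c"), put("d"));
    }

    // What replay sees once log 2 and the checkpoint are gone: puts only.
    static List<Op> withoutLog2() {
        return Arrays.asList(put("a"), put("b"), put("c"), put("d"));
    }

    public static void main(String[] args) {
        System.out.println(replay(fullHistory(), 2)); // prints [c, d]
        try {
            replay(withoutLog2(), 2);
        } catch (IllegalStateException e) {
            // prints "Unable to add c. Queue depth = 2, Capacity = 2"
            System.out.println(e.getMessage());
        }
    }
}
```

The full history never holds more than two events at once, but replaying only 
the puts from log 1 tries to hold a, b, and c at the same time, which is why 
raising the capacity lets the restart succeed at the cost of duplicate events.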
