[
https://issues.apache.org/jira/browse/FLUME-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555619#comment-13555619
]
Brock Noland commented on FLUME-1685:
-------------------------------------
Hi Steve,
I think we'd have to have a flag to turn this on/off because it would change
behavior which is disallowed in a minor release. Other than that a few comments:
1) Create an RB item for this (reviews.apache.org) and link it here
2) Limit lines to approximately 80 chars
Thanks for your patch!!
Brock
> ExecSource shouldn't die if the channel is full
> -----------------------------------------------
>
> Key: FLUME-1685
> URL: https://issues.apache.org/jira/browse/FLUME-1685
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.2.0, v1.3.0, v1.4.0
> Reporter: Steve Hoffman
> Attachments:
> 0001-FLUME-1685-don-t-kill-ExecSource-if-channel-is-full-branch-v1.2.0.patch,
> 0001-FLUME-1685-don-t-kill-ExecSource-if-channel-is-full-trunk.patch
>
>
> Imagine this scenario. You are using the ExecSource to tail a file and send
> to a file channel. When the file channel fills due to a temporary issue
> downstream, the source gets a ChannelException which kills the source.
> {code}
> 2012-10-31 20:45:57,872 ERROR source.ExecSource: Failed while running command: tail -F /tmp/test.log
> org.apache.flume.ChannelException: Unable to put batch on required channel: FileChannel test { dataDirs: [/tmp/test/data] }
>     at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:195)
>     at org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:275)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.flume.ChannelException: Cannot acquire capacity. [channel=hbasejson]
>     at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:346)
>     at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
>     at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:76)
>     at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:184)
>     ... 7 more
> {code}
> The situation where the command being 'exec'ed fails/exits is already handled
> with the existing retry logic.
> I suggest that when the source gets a ChannelException it throw the event
> away (since there is nowhere to put it) and instead sleep for a second and
> loop again for another event. If the channel is still throwing an exception
> (still full), the event is dropped, the sleep time is doubled, and we repeat
> again. There should be an upper bound on the retry time (say 128 seconds --
> about 2 minutes) for the next attempt. When the putEvent no longer throws a
> ChannelException, the "fallback" mode is reset and we read records at full
> speed again.
> Clearly in a situation where the channel is full, data loss will happen. But
> in this case, we wouldn't have to restart the agent. At scale this is an
> administrative pain. Even detecting this is difficult as the flume agent
> itself is still running. In this case (running a 'tail'), the tail will
> eventually result in data loss should the file being tailed rotate.
> Something has to give somewhere.
> I've got a patch I'm working on for this, but wanted to get the JIRA rolling
> first.
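For discussion, the backoff described in the issue could be sketched roughly
as below. This is only an illustration and is not taken from the attached
patches: the class ChannelFullBackoff and its method names are hypothetical
and are not part of Flume. The 1 second starting delay and the 128 second cap
come from the description above. The sketch covers only the delay bookkeeping,
not the on/off flag or the ExecSource read loop.

```java
/**
 * Hypothetical helper (not part of Flume): tracks how long a source should
 * sleep after a failed channel put, doubling the delay up to a fixed cap.
 */
class ChannelFullBackoff {
    // Values taken from the issue description: start at 1s, cap at 128s.
    private static final long INITIAL_SLEEP_SECONDS = 1;
    private static final long MAX_SLEEP_SECONDS = 128;

    private long nextSleepSeconds = INITIAL_SLEEP_SECONDS;

    /**
     * Call after a put fails (channel full). Returns the number of seconds
     * to sleep now and doubles the delay used for the next failure.
     */
    long onPutFailed() {
        long sleep = nextSleepSeconds;
        nextSleepSeconds = Math.min(nextSleepSeconds * 2, MAX_SLEEP_SECONDS);
        return sleep;
    }

    /** Call after a put succeeds: leave "fallback" mode, resume full speed. */
    void onPutSucceeded() {
        nextSleepSeconds = INITIAL_SLEEP_SECONDS;
    }
}
```

The idea would be for the source loop to catch the ChannelException, drop the
event, sleep for the value returned by onPutFailed(), and call
onPutSucceeded() once a put goes through again.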