David Stendardi created FLUME-2240:
--------------------------------------

             Summary: Add Throttling to spool directory source
                 Key: FLUME-2240
                 URL: https://issues.apache.org/jira/browse/FLUME-2240
             Project: Flume
          Issue Type: Improvement
          Components: Sinks+Sources
    Affects Versions: v1.4.0
         Environment: linux debian cdh4.4.0
            Reporter: David Stendardi
            Priority: Minor


As a user I would like to replay a big file using the spool directory source 
without creating to much back pressure on the underlaying channel :

We tried to setup a recovery procedure using the spool directory and it appears 
that our avro sink can not handle events fast enough so the channel is throwing 
the following exception : 

{code}
19 Nov 2013 12:32:10,100 ERROR [pool-10-thread-1] 
(org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:195)  
- FATAL: Spool Directory source recovery: { spoolDir: /data/flumeng/recovery }: 
Uncaught exception in Spoo
lDirectorySource thread. Restart or reconfigure Flume to continue processing.
org.apache.flume.ChannelException: Unable to put batch on required channel: 
org.apache.flume.channel.MemoryChannel{name: recovery}
        at 
org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:200)
        at 
org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:189)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
        at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:909)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.flume.ChannelException: Space for commit to queue 
couldn't be acquired Sinks are likely not keeping up with sources, or the 
buffer size is too tight
        at 
org.apache.flume.channel.MemoryChannel$MemoryTransaction.doCommit(MemoryChannel.java:128)
        at 
org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
        at 
org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
        ... 10 more
19 Nov 2013 12:32:35,008 INFO  [Log-BackgroundWorker-events] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:214)  
- Start checkpoint for /data/flumeng/checkpoint/checkpoint, elements to sync = 
16
{code}

Am I missing something, or adding some kind of throttling in the loop that 
consume events from files and let user manage the thoughtput would be a good 
solution ? 

Following your advices I'am available for submitting a patch.




--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to