[
https://issues.apache.org/jira/browse/FLUME-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342228#comment-14342228
]
Johny Rufus commented on FLUME-2429:
------------------------------------
[~jaoo62], having a short callTimeout, results in a scenario where the HDFS
cluster does not complete the call in the time for which the HDFS sink in
Flume waits for the call to complete. In this case, Flume retries the entire
transaction, and events that were written as part of the previous failed
transaction, are again written to HDFS as part of the retried transaction. That
is why the timeout value should be good enough to handle the
performance/current limits of your HDFS cluster
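As an illustration only (the exact values depend on your cluster's write latency and are assumptions here, not recommendations), the reporter's configuration below could be adjusted to give each HDFS call more headroom and to shrink the amount of data re-written when a transaction is retried:

```properties
# Illustrative values, not prescriptive: raise the per-call timeout well
# above the observed flush/close latency of the HDFS cluster
testAgent.sinks.testSink.hdfs.callTimeout = 60000
# A smaller batch bounds how many events are duplicated if a timed-out
# transaction is rolled back and retried
testAgent.sinks.testSink.hdfs.batchSize = 1000
```

A larger batchSize amortizes transaction overhead but also means every retry re-sends that many events, so it trades throughput against the duplicate volume on failure.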
> Callable timed out in HDFS sink
> -------------------------------
>
> Key: FLUME-2429
> URL: https://issues.apache.org/jira/browse/FLUME-2429
> Project: Flume
> Issue Type: Bug
> Affects Versions: v1.4.0
> Reporter: Jay
>
> Hi.
> I am getting a warning message when using the HDFS sink.
> AVRO source > Memory (or File) channel > HDFS sink
> Switching the channel type did not solve the problem.
> The error occurs once a day or once every few days.
> Any solution?
> Here is my configuration.
> --------------------------------------------------------------------
> testAgent.sources = testSrc
> testAgent.channels = testChannel
> testAgent.sinks = testSink
> testAgent.sources.testSrc.type = avro
> testAgent.sources.testSrc.channels = testChannel
> testAgent.channels.testChannel.type = memory
> testAgent.sources.testSrc.bind = 0.0.0.0
> testAgent.sources.testSrc.port = 4141
> testAgent.sinks.testSink.type = hdfs
> testAgent.sinks.testSink.channel = testChannel
> testAgent.sources.testSrc.interceptors = testInterceptor
> testAgent.sources.testSrc.interceptors.testInterceptor.type = static
> testAgent.sources.testSrc.interceptors.testInterceptor.preserveExisting = true
> testAgent.sources.testSrc.interceptors.testInterceptor.key = testKey
> testAgent.sources.testSrc.interceptors.testInterceptor.value = .testfile
> testAgent.sinks.testSink.hdfs.path = hdfs://hadoop-cluster:8020/flume/%Y%m%d
> testAgent.sinks.testSink.hdfs.filePrefix = %Y%m%d%H%M
> testAgent.sinks.testSink.hdfs.fileSuffix = .testfile
> testAgent.sinks.testSink.hdfs.fileType = DataStream
> testAgent.sinks.testSink.hdfs.rollInterval = 1
> testAgent.sinks.testSink.hdfs.rollCount = 0
> testAgent.sinks.testSink.hdfs.rollSize = 0
> testAgent.sinks.testSink.hdfs.batchSize = 150000
> testAgent.sinks.testSink.hdfs.callTimeout = 15000
> testAgent.sinks.testSink.hdfs.useLocalTimeStamp = true
> testAgent.sinks.testSink.serializer = text
> testAgent.sinks.testSink.serializer.appendNewline = false
> testAgent.channels.testChannel.keep-alive = 1
> testAgent.channels.testChannel.write-timeout = 1
> testAgent.channels.testChannel.transactionCapacity = 150000
> testAgent.channels.testChannel.capacity = 18000000
> #testAgent.channels.testChannel.checkpointDir = /data/flumedata/checkpoint
> #testAgent.channels.testChannel.useDualCheckpoints = true
> #testAgent.channels.testChannel.backupCheckpointDir = /data/flumedata_backup/checkpoint
> #testAgent.channels.testChannel.dataDirs = /data/flumedata/data
> testAgent.channels.testChannel.byteCapacityBufferPercentage = 20
> testAgent.channels.testChannel.byteCapacity = 1000000000
> --------------------------------------------------------------------
> I sometimes get a warning message in the Flume log.
> --------------------------------------------------------------------
> 2014-07-22 16:28:20,186 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN
> - org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:477)]
> Caught IOException writing to HDFSWriter (Callable timed out after 15000 ms
> on file:
> hdfs://hadoop-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp).
> Closing file
> (hdfs://hadoop-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp)
> and rethrowing exception.
> 2014-07-22 16:28:35,187 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN
> - org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:483)]
> Caught IOException while closing file
> (hdfs://hadoop-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp).
> Exception follows.
> java.io.IOException: Callable timed out after 15000 ms on file:
> hdfs://search-hdanal-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp
> at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:603)
> at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:381)
> at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:343)
> at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:292)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:481)
> at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
> at java.util.concurrent.FutureTask.get(FutureTask.java:91)
> at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:596)
> ... 8 more
> 2014-07-22 16:28:35,187 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN
> - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:438)]
> HDFS IO error
> java.io.IOException: Callable timed out after 15000 ms on file:
> hdfs://hadoop-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp
> at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:603)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:469)
> at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
> at java.util.concurrent.FutureTask.get(FutureTask.java:91)
> at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:596)
> ... 5 more
> --------------------------------------------------------------------
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)