[ https://issues.apache.org/jira/browse/FLUME-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319393#comment-14319393 ]

Roshan Naik commented on FLUME-2375:
------------------------------------

FLUME-2451 has a patch that should fix this.
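The quoted report below boils down to a reflective call whose failure is swallowed: getNumCurrentReplicas is invoked via reflection, the InvocationTargetException is caught and logged, and isUnderReplicated returns false, so the sink never reacts. A minimal Java sketch of that pattern, plus one possible fix (unwrapping the cause and propagating it so the caller can close and reopen the file). Class and method names here are illustrative stand-ins, not Flume's actual code and not the FLUME-2451 patch:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical sketch of the failure mode described in the report below.
// ReplicationCheck stands in for the HDFS writer; none of these names are
// Flume's real classes.
public class ReplicationCheck {

    // Stand-in for the reflective replica-count call; here it always fails,
    // as it would once the datanode pipeline is gone.
    public static int getNumCurrentReplicas() {
        throw new IllegalStateException("pipeline closed");
    }

    // Mirrors the reported behavior: the InvocationTargetException is
    // logged-and-swallowed and false is returned, so the caller keeps
    // writing to a dead pipeline.
    static boolean isUnderReplicatedSwallowing() {
        try {
            Method m = ReplicationCheck.class.getMethod("getNumCurrentReplicas");
            int replicas = (int) m.invoke(null);
            return replicas < 3;
        } catch (ReflectiveOperationException e) {
            // logged but not rethrown -> the caller cannot react
            return false;
        }
    }

    // One possible fix: unwrap the real cause and propagate it, so the
    // caller can close the file and reopen on a healthy pipeline.
    static boolean isUnderReplicatedPropagating() throws Exception {
        try {
            Method m = ReplicationCheck.class.getMethod("getNumCurrentReplicas");
            int replicas = (int) m.invoke(null);
            return replicas < 3;
        } catch (InvocationTargetException e) {
            throw (Exception) e.getCause();
        }
    }

    public static void main(String[] args) {
        // prints "false" even though the check itself failed
        System.out.println(isUnderReplicatedSwallowing());
        try {
            isUnderReplicatedPropagating();
        } catch (Exception e) {
            // prints "caller sees: pipeline closed"
            System.out.println("caller sees: " + e.getMessage());
        }
    }
}
```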


> HDFS sinks fail to recover from datanode unavailability
> -------------------------------------------------------
>
>                 Key: FLUME-2375
>                 URL: https://issues.apache.org/jira/browse/FLUME-2375
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.4.0
>            Reporter: David Stendardi
>              Labels: hdfs, hdfssink
>
> Hello!
> We are running flume-ng version cdh-4.5-1.4. When a datanode used by 
> flume-ng goes down, we get the following exceptions:
> {code}
> 30 Apr 2014 01:10:38,130 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:96)  - Unexpected error while checking replication factor
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
>         at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
>         at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> These exceptions are logged but not rethrown, and 
> AbstractHDFSWriter::isUnderReplicated still returns false, so the writer 
> continues trying to write to the unavailable node.
> Here is how we configured our sink:
> {code}
> collector.sinks.hdfs.channel = hdfs
> collector.sinks.hdfs.type = hdfs
> collector.sinks.hdfs.hdfs.path = /flume-ng/%{env}/%{avro.fqn}/from_year=%Y/from_date=%Y-%m-%d
> collector.sinks.hdfs.hdfs.filePrefix = <%= @hostname %>-%H-%{avro.fp}
> collector.sinks.hdfs.hdfs.fileSuffix = .avro
> collector.sinks.hdfs.hdfs.rollInterval = 3605
> collector.sinks.hdfs.hdfs.rollSize = 0
> collector.sinks.hdfs.hdfs.rollCount = 0
> collector.sinks.hdfs.hdfs.batchSize = 1000
> collector.sinks.hdfs.hdfs.txnEventMax = 1000
> collector.sinks.hdfs.hdfs.callTimeout = 20000
> collector.sinks.hdfs.hdfs.fileType = DataStream
> collector.sinks.hdfs.serializer = com.viadeo.event.flume.serializer.AvroEventSerializer$Builder
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
