[
https://issues.apache.org/jira/browse/FLUME-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453141#comment-13453141
]
Ted Malaska commented on FLUME-1494:
------------------------------------
The root of this JIRA is SinkRunner.java, line 160.
The question is whether this should be logged as a warn or an error.
I don't have the answer, since I'm new to Flume, but I can see both sides.
Error side:
1. This is in fact an error or an exception. Something has gone wrong.
2. Data flow has been halted for a time.
Warn side:
1. This is an error that doesn't lose data and the system recovers.
So the big question is whether this error message can be blacklisted. That
would be unwise, because this error message could be telling you about a real
problem. The problem is how we separate a hiccup from a major problem.
One possible solution would be to add a time threshold to SinkRunner. If the
error keeps happening past the threshold, report an error instead of a warn.
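
To make the threshold idea concrete, here is a rough sketch of how the
decision could be isolated from the logging call. This is not SinkRunner code:
the class name FailureEscalationPolicy, its methods, and the 30 second value
are all made up for illustration, and treating "exactly at the threshold" as
an error is my assumption.

```java
// Hypothetical helper, NOT part of Flume. It tracks how long delivery
// failures have been continuous and picks the log level accordingly.
final class FailureEscalationPolicy {
    enum Level { WARN, ERROR }

    private final long thresholdMillis;
    private boolean failing = false;   // true while a failure streak is ongoing
    private long firstFailureMillis;   // start of the current failure streak

    FailureEscalationPolicy(long thresholdMillis) {
        if (thresholdMillis < 0) {
            throw new IllegalArgumentException("thresholdMillis must be >= 0");
        }
        this.thresholdMillis = thresholdMillis;
    }

    // Record a failure seen at nowMillis and return the level to log it at.
    // Failures are WARN until the streak has lasted thresholdMillis or longer.
    Level onFailure(long nowMillis) {
        if (!failing) {
            failing = true;
            firstFailureMillis = nowMillis;
        }
        long streak = nowMillis - firstFailureMillis;
        return streak >= thresholdMillis ? Level.ERROR : Level.WARN;
    }

    // A successful delivery ends the streak, so the next failure is a WARN again.
    void onSuccess() {
        failing = false;
    }

    public static void main(String[] args) {
        // Simulated timeline with an assumed 30 second threshold.
        FailureEscalationPolicy policy = new FailureEscalationPolicy(30_000L);
        long[] failureTimes = {0L, 5_000L, 30_000L, 45_000L};
        for (long t : failureTimes) {
            System.out.println("t=" + t + "ms -> " + policy.onFailure(t));
        }
        policy.onSuccess();
        System.out.println("after recovery -> " + policy.onFailure(50_000L));
    }
}
```

The catch block in the runner would then call onFailure(System.currentTimeMillis())
and log at whichever level comes back, and call onSuccess() after a delivery
that did not throw. That way a short hiccup only produces warns, and an outage
that outlasts the threshold still shows up as an error.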
Let me know what you think.
> Avro sink logs error messages on connection failure
> ---------------------------------------------------
>
> Key: FLUME-1494
> URL: https://issues.apache.org/jira/browse/FLUME-1494
> Project: Flume
> Issue Type: Wish
> Components: Sinks+Sources
> Affects Versions: v1.1.0, v1.2.0, v1.3.0
> Reporter: Rudolf Rákos
> Priority: Minor
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> The Avro sink emits error level log messages every few seconds when it loses
> connection to the next hop Avro source. When the Avro sink tries to
> (re)connect to the source and fails, it logs an error message.
> Using a file channel or any persistent channel before the Avro sink
> guarantees the transactional behavior and 100% delivery. So the messages will
> be delivered when the next hop Avro source comes online.
> This error level log message is problematic for us for two reasons:
> * This is not really an error, because no message (event) is lost. The outage
> of the next hop Avro source should not be notified or handled on the Avro
> sink's side.
> * We have an alerting system which works by examining log messages (it is a
> Logback appender), and alerts our support team when errors are logged. We
> could blacklist this error message, but I don't think that is a permanent
> solution for the problem.
> Currently there is no way to configure Flume to only log a warning level log
> message or only log this error once.
> The log message:
> {quote}
> 2012-08-16 14:33:26,192 [SinkRunner-PollingRunner-DefaultSinkProcessor] ERROR
> org.apache.flume.SinkRunner - Unable to deliver event. Exception follows.
> org.apache.flume.EventDeliveryException: Failed to send events
> at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
> ~[flume-ng-core-1.2.0.jar:1.2.0]
> at
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> ~[flume-ng-core-1.2.0.jar:1.2.0]
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> ~[flume-ng-core-1.2.0.jar:1.2.0]
> at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
> Caused by: org.apache.flume.FlumeException: NettyAvroRpcClient { host:
> localhost, port: 11113 }: RPC connection error
> at
> org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:117)
> ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> at
> org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:93)
> ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> at
> org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:507)
> ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> at
> org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
> ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> at org.apache.flume.sink.AvroSink.createConnection(AvroSink.java:182)
> ~[flume-ng-core-1.2.0.jar:1.2.0]
> at org.apache.flume.sink.AvroSink.verifyConnection(AvroSink.java:222)
> ~[flume-ng-core-1.2.0.jar:1.2.0]
> at org.apache.flume.sink.AvroSink.process(AvroSink.java:282)
> ~[flume-ng-core-1.2.0.jar:1.2.0]
> ... 3 common frames omitted
> Caused by: java.io.IOException: Error connecting to /127.0.0.1:11113
> at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
> ~[avro-ipc-1.6.3.jar:1.6.3]
> at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
> ~[avro-ipc-1.6.3.jar:1.6.3]
> at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:147)
> ~[avro-ipc-1.6.3.jar:1.6.3]
> at
> org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:106)
> ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> ... 9 common frames omitted
> Caused by: java.net.ConnectException: Connection refused: no further
> information
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.6.0_31]
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> ~[na:1.6.0_31]
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401)
> ~[netty-3.2.7.Final.jar:na]
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370)
> ~[netty-3.2.7.Final.jar:na]
> at
> org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292)
> ~[netty-3.2.7.Final.jar:na]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> ~[na:1.6.0_31]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> ~[na:1.6.0_31]
> ... 1 common frames omitted
> {quote}