[ https://issues.apache.org/jira/browse/FLUME-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458540#comment-13458540 ]
Rudolf Rákos edited comment on FLUME-1494 at 9/19/12 9:13 PM:
--------------------------------------------------------------
I think defaulting this threshold value is not the best idea, because there
could be many reasons why, and for how long, a Sink fails:
* In our case the Avro sink can fail because of an outage of the next-hop
Avro Source. (This is when we want an alert.)
* But it can also fail when we simply restart the node that runs the next-hop
Avro Source, which can take minutes too. (We don't want an alert for this, so
we would ignore the error.)
So it would be very hard to agree on a sensible default that works both for us
and for other users of Flume.
I like Hari's idea about SinkProcessors:
* I think the DefaultSinkProcessor could be extended to catch the exceptions
from Sinks and re-throw them only once the threshold has been met. Setting
the threshold to 0 or a negative value could mean "always re-throw", and that
could be the default behavior. Swallowed exceptions could be logged as
warnings, while re-thrown exceptions would still be logged as errors by the
SinkRunner.
* If I'm right, the other SinkProcessors (failover and load balancing) already
handle exceptions from Sinks by providing failover behavior.
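A minimal sketch of the threshold behavior proposed above, written as plain Java so it runs without Flume on the classpath. Two assumptions, since the comment leaves them open: the threshold counts consecutive failed delivery attempts (not elapsed time), and the counter resets after a successful delivery. `ThresholdSinkProcessor`, its `Sink` interface and its `EventDeliveryException` are hypothetical stand-ins, not Flume's real classes, and warnings go through `java.util.logging` rather than Flume's own logger.

```java
import java.util.logging.Logger;

// Hypothetical sketch of the proposed threshold behavior; not Flume's real API.
class ThresholdSinkProcessor {

    /** Stand-in for org.apache.flume.EventDeliveryException (hypothetical here). */
    static class EventDeliveryException extends Exception {
        EventDeliveryException(String message) {
            super(message);
        }
    }

    /** Stand-in for a Flume Sink: one delivery attempt per call (hypothetical). */
    interface Sink {
        void process() throws EventDeliveryException;
    }

    private static final Logger LOG =
            Logger.getLogger(ThresholdSinkProcessor.class.getName());

    private final Sink sink;
    private final int threshold;     // <= 0 means "always re-throw" (proposed default)
    private int consecutiveFailures; // reset to 0 after every successful delivery

    ThresholdSinkProcessor(Sink sink, int threshold) {
        this.sink = sink;
        this.threshold = threshold;
    }

    /**
     * Returns true if the sink delivered and false if a failure was swallowed.
     * Once consecutive failures reach the threshold the exception is re-thrown,
     * so the caller (the SinkRunner, in Flume) can log it as an error.
     */
    boolean process() throws EventDeliveryException {
        try {
            sink.process();
            consecutiveFailures = 0;
            return true;
        } catch (EventDeliveryException e) {
            consecutiveFailures++;
            if (threshold <= 0 || consecutiveFailures >= threshold) {
                throw e; // failing for too long: surface it as an error
            }
            // Below the threshold: log a warning instead of an error.
            LOG.warning("Sink failure " + consecutiveFailures + " of " + threshold
                    + " tolerated: " + e.getMessage());
            return false;
        }
    }

    int getConsecutiveFailures() {
        return consecutiveFailures;
    }
}
```

With a threshold of 3, a restart of the next-hop node that causes only one or two failed attempts yields warnings, while a longer outage still reaches the SinkRunner as an error. How much downtime a given count tolerates depends on how often the sink is retried, so a duration-based threshold is an equally plausible reading of the proposal. In real Flume code the method would more likely return a `Sink.Status` (for example `BACKOFF`) than a boolean.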
was (Author: rakosrudolf):
I think defaulting this threshold value is not the best idea, because there
could be many reasons why, and for how long, a Sink fails:
* In our case the Avro sink can fail because of an outage of the next-hop
Avro Source. (This is when we want an alert.)
* But it can also fail when we simply restart the node that runs the next-hop
Avro Source, which can take minutes too. (We don't want an alert for this, so
we would ignore the error.)
So it would be very hard to agree on a sensible default that works both for us
and for other users of Flume.
I like Hari's idea about SinkProcessors:
* I think the DefaultSinkProcessor could be extended to catch the exceptions
from Sinks and re-throw them only once the threshold has been met. Setting
the threshold to 0 or a negative value could mean "always re-throw", and that
could be the default behavior.
* If I'm right, the other SinkProcessors (failover and load balancing) already
handle exceptions from Sinks by providing failover behavior.
> Avro sink logs error messages on connection failure
> ---------------------------------------------------
>
> Key: FLUME-1494
> URL: https://issues.apache.org/jira/browse/FLUME-1494
> Project: Flume
> Issue Type: Wish
> Components: Sinks+Sources
> Affects Versions: v1.1.0, v1.2.0, v1.3.0
> Reporter: Rudolf Rákos
> Assignee: Ted Malaska
> Priority: Minor
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> The Avro sink emits error level log messages every few seconds when it loses
> connection to the next hop Avro source. When the Avro sink tries to
> (re)connect to the source and fails, it logs an error message.
> Using a file channel or any persistent channel before the Avro sink
> guarantees transactional behavior and 100% delivery, so the messages will
> be delivered once the next-hop Avro source comes back online.
> This error level log message is problematic for us for two reasons:
> * This is not really an error, because no message (event) is lost. The outage
> of the next-hop Avro source should not be reported or handled on the Avro
> sink's side.
> * We have an alerting system which works by examining log messages (it is a
> Logback appender), and alerts our support team when errors are logged. We
> could blacklist this error message, but I don't think that is a permanent
> solution to the problem.
> Currently there is no way to configure Flume to log only a warning-level
> message, or to log this error only once.
> The log message:
> {quote}
> 2012-08-16 14:33:26,192 [SinkRunner-PollingRunner-DefaultSinkProcessor] ERROR org.apache.flume.SinkRunner - Unable to deliver event. Exception follows.
> org.apache.flume.EventDeliveryException: Failed to send events
> at org.apache.flume.sink.AvroSink.process(AvroSink.java:325) ~[flume-ng-core-1.2.0.jar:1.2.0]
> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68) ~[flume-ng-core-1.2.0.jar:1.2.0]
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) ~[flume-ng-core-1.2.0.jar:1.2.0]
> at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
> Caused by: org.apache.flume.FlumeException: NettyAvroRpcClient \{ host: localhost, port: 11113 }: RPC connection error
> at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:117) ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:93) ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:507) ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88) ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> at org.apache.flume.sink.AvroSink.createConnection(AvroSink.java:182) ~[flume-ng-core-1.2.0.jar:1.2.0]
> at org.apache.flume.sink.AvroSink.verifyConnection(AvroSink.java:222) ~[flume-ng-core-1.2.0.jar:1.2.0]
> at org.apache.flume.sink.AvroSink.process(AvroSink.java:282) ~[flume-ng-core-1.2.0.jar:1.2.0]
> ... 3 common frames omitted
> Caused by: java.io.IOException: Error connecting to /127.0.0.1:11113
> at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249) ~[avro-ipc-1.6.3.jar:1.6.3]
> at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198) ~[avro-ipc-1.6.3.jar:1.6.3]
> at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:147) ~[avro-ipc-1.6.3.jar:1.6.3]
> at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:106) ~[flume-ng-sdk-1.2.0.jar:1.2.0]
> ... 9 common frames omitted
> Caused by: java.net.ConnectException: Connection refused: no further information
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.6.0_31]
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) ~[na:1.6.0_31]
> at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401) ~[netty-3.2.7.Final.jar:na]
> at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370) ~[netty-3.2.7.Final.jar:na]
> at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292) ~[netty-3.2.7.Final.jar:na]
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ~[na:1.6.0_31]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ~[na:1.6.0_31]
> ... 1 common frames omitted
> {quote}
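The reporter's other wish, logging this error only once, could be sketched as a small state-change logger: it writes one error when delivery starts failing, stays silent while the failures continue, and writes one info line when delivery recovers. `OnceOnStateChangeLogger` is a hypothetical helper (the description notes Flume has no such option), and it uses `java.util.logging` purely for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: log a delivery failure once per outage, not once per retry. */
class OnceOnStateChangeLogger {
    private static final Logger LOG =
            Logger.getLogger(OnceOnStateChangeLogger.class.getName());

    private boolean failing;  // true while we are inside an outage
    private int suppressed;   // failures swallowed since this outage was first reported

    /** Call after every failed delivery attempt; returns true if a log line was written. */
    boolean onFailure(Exception e) {
        if (failing) {
            suppressed++;
            return false; // this outage was already reported: stay quiet
        }
        failing = true;
        suppressed = 0;
        LOG.log(Level.SEVERE,
                "Unable to deliver event; further failures suppressed until recovery", e);
        return true;
    }

    /** Call after every successful delivery; returns true if a recovery line was written. */
    boolean onSuccess() {
        if (!failing) {
            return false; // steady state: nothing to report
        }
        failing = false;
        LOG.info("Delivery recovered after " + suppressed + " suppressed failure(s)");
        return true;
    }
}
```

An alerting appender like the Logback one described above would then see one error per outage instead of one every few seconds. Where the `onFailure` and `onSuccess` calls would live (the Avro sink or the SinkRunner) is left open here.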
--
This message is automatically generated by JIRA.