[jira] [Commented] (FLUME-1030) Retry logic for failover sink processor to handle downstream exceptions in a predictable manner.

Hudson (Commented) (JIRA) Fri, 23 Mar 2012 10:17:54 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236798#comment-13236798
 ]


Hudson commented on FLUME-1030:
-------------------------------

Integrated in flume-trunk #141 (See 
[https://builds.apache.org/job/flume-trunk/141/])
    FLUME-1030. Retry mechanism for failover sink processor.

(Juhani Connolly via Arvind Prabhakar) (Revision 1304474)

     Result = SUCCESS
arvind : http://svn.apache.org/viewvc/?view=rev&rev=1304474
Files : 
* 
/incubator/flume/trunk/flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java
* 
/incubator/flume/trunk/flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java

                
> Retry logic for failover sink processor to handle downstream exceptions in a 
> predictable manner.
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-1030
>                 URL: https://issues.apache.org/jira/browse/FLUME-1030
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>            Reporter: Juhani Connolly
>            Assignee: Juhani Connolly
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1030.2.patch, FLUME-1030.3.patch, 
> FLUME-1030.4.patch
>
>
> One may want to refer to FLUME-984 for some history of this.
> As it stands, a sink can have several outcomes:
> - OK - succesfully transferred some data
> - TRY_LATER - no data to transfer
> - throw EventDeliveryException - Give the sink a short breather to recover, 
> then try again
> - throw anything else - get logged and more or less ignored
> I don't think the last choice in particular is a good idea as it encourages 
> throwing Sink specific exceptions. Further, there is no distinction between 
> temporary disconnectivity(e.g. HBase timed out because of a compaction or 
> something), and more permanent problems(e.g. cannot write to a file).
> One solution to this is to add a second type of exception that delivery 
> mechanisms can throw, ConnectivityException/FatalException or something 
> similar. For the purposes of any failover/load balancing mechanism this would 
> signal that a component is out of order for a more significant amount of time 
> and thus constant polling should be stopped(perhaps retry it every 5 minutes 
> instead, or have an exponentially increasing retry time).
> If adding another exception is not deemed acceptable, there is always the 
> possibility of expecting SinkProcessors to figure out if a sink is dead... 
> E.g. counting sequential failures, though I do not think this is ideal. I 
> would prefer to see a clear contract defined by SinkRunner that well behaved 
> sinks could adhere to and get the benefits of graceful temporary/longterm 
> failure from.
> If someone has other suggestions for distinguishing between temporary and 
> longer term failure please let me know. As it stands, components that are 
> unresponsive can and do get called constantly, and some components trigger 
> retries and can actually block a SinkRunner thread for a fair while.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (FLUME-1030) Retry logic for failover sink processor to handle downstream exceptions in a predictable manner.

Reply via email to