[jira] [Commented] (FLUME-952) Modifying SinkRunner to be pluggable to allow for failover/replication.

Juhani Connolly (Commented) (JIRA) Fri, 10 Feb 2012 00:50:40 -0800

    [ 
https://issues.apache.org/jira/browse/FLUME-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205294#comment-13205294
 ]


Juhani Connolly commented on FLUME-952:
---------------------------------------

bq - I have an implicit way of doing so in the design. If 
SinkRunner.chooseSink(int try) is executed with parameter try > 1, it 
means that the previously returned sink has failed and we need another 
working sink. Consider the following example. The failover sink keeps track 
of the active sink. It will return this active sink for the call 
chooseSink(1). However, calling chooseSink(2) means that the active sink is 
not working and we need to move to another active sink. That said, I don't 
mind adding an explicit method for marking some sink as dead. It was just an 
idea. 

This is workable but feels unwieldy. There is no need for the extra state, 
imo. I think it is more transparent to give selector developers an explicit 
function in the interface that they need to implement.
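
To make the suggestion concrete, here is a minimal sketch of what such an 
interface could look like. All names below (SimpleSink, SinkSelector, 
markFailed, FailoverSelector) are hypothetical and only for illustration; 
none of them exist in Flume, and SimpleSink is a stand-in, not the real Sink 
interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a sink; not Flume's real Sink interface.
interface SimpleSink {
    String name();
}

// Hypothetical selector contract: failure is reported explicitly by the
// runner instead of being inferred from an increasing "try" counter.
interface SinkSelector {
    SimpleSink chooseSink();           // returns null when no live sink remains
    void markFailed(SimpleSink sink);  // called by the runner after a failure
}

// Failover policy: always prefer the earliest sink that is still live.
class FailoverSelector implements SinkSelector {
    private final List<SimpleSink> live;

    FailoverSelector(List<SimpleSink> sinksInPriorityOrder) {
        // Copy so that removing failed sinks does not touch the caller's list.
        this.live = new ArrayList<>(sinksInPriorityOrder);
    }

    @Override
    public SimpleSink chooseSink() {
        return live.isEmpty() ? null : live.get(0);
    }

    @Override
    public void markFailed(SimpleSink sink) {
        live.remove(sink);
    }
}
```

With this shape the selector author decides what "failed" means for their 
policy, and the runner never has to encode state in a call argument.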

bq - I also found out that most sinks do not return internal state, so you 
have to try them to find their actual state. That is actually the reason why 
I suggested putting a loop into SinkRunner.PollingRunner.run to keep 
executing SinkRunner.chooseSink(int try) with an increasing try parameter 
until it returns null. This way the selector can simply force the Runner to 
try some previously dead sink and verify that it is still dead.

I'm not sure we can just keep beating on dead sinks every time we want to 
see if they're back yet. Some of them block for a short while trying to send 
a message, and I don't think we can keep hammering them on every failed 
message. What I would have liked to do is keep a list of dead sinks and 
start up another thread that periodically polls them for recovery. One 
possibility is that failed sinks should change their lifecycle state to 
stopped, and that all sinks would be required to make some kind of liveness 
check when starting. Right now, even if a sink returns from start() without a 
problem, many of them can fail on the very first process() call.
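
A rough sketch of the dead-sink list with a background recovery poll, using 
only java.util.concurrent. ProbeableSink, isAlive() and DeadSinkMonitor are 
assumptions made up for this example; sinks do not expose such a check 
today, which is exactly the gap described above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical liveness probe; not part of any existing sink.
interface ProbeableSink {
    boolean isAlive();
}

// Keeps a set of dead sinks and re-checks them off the delivery path.
class DeadSinkMonitor {
    private final Set<ProbeableSink> dead = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void markDead(ProbeableSink sink) {
        dead.add(sink);
    }

    boolean isDead(ProbeableSink sink) {
        return dead.contains(sink);
    }

    // One recovery pass: sinks that respond again leave the dead set.
    // A probe that throws is treated as "still dead".
    void probeOnce() {
        dead.removeIf(sink -> {
            try {
                return sink.isAlive();
            } catch (RuntimeException e) {
                return false;
            }
        });
    }

    // Runs probeOnce() periodically on a separate thread.
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(
                this::probeOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The delivery path only consults isDead(), so a sink that blocks while 
probing slows down the monitor thread rather than every failed message.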

One thing I am sure of is that right now sink implementations are 
inconsistent with one another, and there is no unified way of knowing when 
they have died (some of them never throw EventDeliveryException) or when 
they are working properly. I think any implementation of the selector will 
have to make some assumptions about their behavior, and then that behavior 
will need to be enforced. For me right now those assumptions could be:
- EventDeliveryException getting thrown signals failure
- New status flag for sinks, or poll function, or return value on start
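
As an illustration of the first assumption only, here is a sketch of a 
failover attempt that treats a thrown exception as the sole failure signal. 
The EventDeliveryException class below is a local stand-in declared for this 
example, and DeliverySink and FailoverAttempt are hypothetical names, not 
existing Flume types.

```java
import java.util.List;

// Local stand-in for the exception named above, declared here so the
// example compiles on its own.
class EventDeliveryException extends Exception {
    EventDeliveryException(String message) {
        super(message);
    }
}

// Hypothetical minimal sink contract: throwing means "I failed".
interface DeliverySink {
    void process() throws EventDeliveryException;
}

class FailoverAttempt {
    // Tries sinks in priority order. Returns the index of the first sink
    // whose process() completed, or -1 if every sink threw.
    static int deliver(List<DeliverySink> sinksInPriorityOrder) {
        for (int i = 0; i < sinksInPriorityOrder.size(); i++) {
            try {
                sinksInPriorityOrder.get(i).process();
                return i;
            } catch (EventDeliveryException e) {
                // Assumed failure signal: fall through to the next sink.
            }
        }
        return -1;
    }
}
```

This only works if every sink actually throws on failure, which is why the 
behavior would need to be enforced across implementations.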
                
> Modifying SinkRunner to be pluggable to allow for failover/replication.
> -----------------------------------------------------------------------
>
>                 Key: FLUME-952
>                 URL: https://issues.apache.org/jira/browse/FLUME-952
>             Project: Flume
>          Issue Type: Brainstorming
>          Components: Sinks+Sources
>            Reporter: Juhani Connolly
>             Fix For: v1.1.0
>
>
> While implementing the failover sink runner, the following was suggested:
> 1. This needs to be implemented on top of FLUME-949 which deals with removing 
> the notion of a PollableSink altogether. As a result, the SinkRunner will 
> become a concrete implementation that can then allow different sink handling 
> policies - such as either a failover policy (needed for this issue), or load 
> balancing policy (not needed for this issue). Hence the policy part needs to 
> be pluggable rather than the sink runner itself. An example of such a 
> construct is the ChannelSelector and ChannelProcessor implementations.
> In FLUME-865 I have implemented FailoverSinkRunner as a separate runner, but 
> I am open to the idea of making it pluggable if it makes the code more 
> maintainable.
> As is, there are many differences between the requirements for Failover and a 
> normal Sink runner, including configuration, initialisation, shutdown, error 
> handling and event processing. If we were to make this pluggable, many hooks 
> would be needed and I don't think there is that much common behavior that 
> warrants using a pluggable system rather than just a solid base class.
> - Adding a new sink to a runner, with configuration variables (such as 
> priority or weight)
> - Policy for handling process: should this just return a list of sinks to 
> process like ChannelSelector and hand off the processing to Process? I think 
> that the specific failover policy for each type of runner will be different 
> so this feels awkward. I would personally prefer to just pass the process 
> call to the pluggable component and let it be responsible for calling process 
> on the correct sinks, as well as handling errors.
> Right now I am not convinced of the need to make SinkRunner pluggable, but I 
> would be interested to hear other people's opinions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
