Thanks Mike.

This is a really nice reply, based on a thorough understanding of my
proposal.

I agree that it is a potential design change, so I will evaluate it
carefully before submitting it to you guys for a decision.

Cheers,
Yongkun Wang

On 12/08/13 9:17, "Mike Percy" <[email protected]> wrote:

>Hi,
>Due to design decisions made very early on in Flume NG - specifically the
>fact that Sink only has a simple process() method - I don't see a good way
>to get multiple sinks pulling from the same channel in a way that is
>backwards-compatible with the current implementation.
>
>Probably the "right" way to support this would be to have an interface
>where the SinkRunner (or something outside of each Sink) is in control of
>the transaction, and then it can easily send events to each sink serially
>or in parallel within a single transaction. I think that is basically what
>you are describing. If you look at SourceRunner and SourceProcessor you
>will see similar ideas to what you are describing but they are only
>implemented at the Source->Channel level. The current SinkProcessor is not
>an analog of SourceProcessor, but if it was then I think that's where this
>functionality might fit. However what happens when you do that is you have
>to handle a ton of failure cases and threading models in a very general
>way, which might be tough to get right for all use cases. I'm not 100%
>sure, but I think that's why this was not pursued at the time.
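
For comparison, here is a minimal sketch of how the existing SinkProcessor is wired up through a sink group. It assumes the sinkgroups syntax from the Flume 1.x user guide and a release that ships the load-balancing processor. The names agent, hsa, hsb and mc are reused from this thread, and g1 is a hypothetical group name. With this processor, each event taken from the channel is handed to one sink in the group, not to all of them, so it is sink selection and not the fan-out discussed above.

```properties
# Sketch only: g1 is a hypothetical group name; hsa, hsb, mc come from the thread.
# Sink types and other required settings are omitted.
agent.sinks = hsa hsb
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = hsa hsb

# Both sinks read from the same channel ...
agent.sinks.hsa.channel = mc
agent.sinks.hsb.channel = mc

# ... but the processor picks ONE sink per process() call, so each event
# is delivered to hsa or to hsb, not to both.
agent.sinkgroups.g1.processor.type = load_balance
```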
>
>To me, this seems like a potential design change (it would have to be very
>carefully thought out) to consider for a future major Flume code line
>(maybe a Flume 2.x).
>
>By the way, if one is trying to get maximum throughput, then duplicating
>events onto multiple channels, and having different threads running the
>sinks (the current design) will be faster and more resilient in general
>than a single thread and a single channel writing to multiple
>sinks/destinations. The multiple-channel design pattern will allow periodic
>downtimes or delays on a single sink to not affect the others, assuming the
>channel sizes are large enough for buffering during downtime and assuming
>that each sink is fast enough to recover from temporary delays. Without a
>dedicated buffer per destination, one is at the mercy of the slowest sink
>at every stage in the transaction.
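
The multiple-channel pattern described here might be configured roughly as follows. This is a minimal sketch, assuming the property names from the Flume 1.x user guide. The source name src, the channel names mcA and mcB, and the capacity values are hypothetical, while agent, hsa and hsb are taken from this thread. Source and sink types are left out, so it is a fragment and not a complete agent configuration.

```properties
# Sketch only: src, mcA, mcB and the capacity values are hypothetical.
agent.sources = src
agent.channels = mcA mcB
agent.sinks = hsa hsb

# The replicating selector (Flume's default) copies every event
# to all channels listed for the source.
agent.sources.src.channels = mcA mcB
agent.sources.src.selector.type = replicating

# One buffer per destination, sized to absorb downtime of its sink.
agent.channels.mcA.type = memory
agent.channels.mcA.capacity = 100000
agent.channels.mcB.type = memory
agent.channels.mcB.capacity = 100000

# Each sink drains its own channel, so a slow hsb does not stall hsa.
agent.sinks.hsa.channel = mcA
agent.sinks.hsb.channel = mcB
```

The trade-off raised in the thread is visible here: every event is stored twice, once per channel, in exchange for isolating the two destinations from each other.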
>
>One last thing worth noting is that the current channels are all well
>ordered. This means that Flume currently provides a weak ordering guarantee
>(across a single hop). That is a helpful property in the context of testing
>and validation, and it is also what many people expect if they are storing
>logs on a single hop. I hope we don't backpedal on that weak ordering
>guarantee without a really good reason.
>
>Regards,
>Mike
>
>On Fri, Aug 10, 2012 at 9:30 PM, Wang, Yongkun | Yongkun | BDD <
>[email protected]> wrote:
>
>> Hi Juhani,
>>
>> Yes, we can use two (or several) channels to fan out data to different
>> sinks. Then we will have two channels with the same data, which may not be
>> an optimal solution. So I want to use just ONE channel, creating a
>> processor to pull the data once from the channel and then distribute it to
>> different sinks.
>>
>> Regards,
>> Yongkun Wang
>>
>> On 12/08/10 18:07, "Juhani Connolly" <[email protected]>
>> wrote:
>>
>> >Hi Yongkun,
>> >
>> >I'm curious why you need to pull the data twice from the sink? Do you
>> >need all sinks to have read the same amount of data? Normally for the
>> >case of splitting data into batch and analytics, we will send data from
>> >the source to two separate channels and have the sinks read from
>> >separate channels.
>> >
>> >On 08/10/2012 02:48 PM, Wang, Yongkun | Yongkun | BDD wrote:
>> >> Hi Denny,
>> >>
>> >> I am working on the patch now; it's not difficult. I have listed the
>> >> changes in that JIRA.
>> >> I think you misunderstand my design: I don't maintain the order of the
>> >> events. Instead, I make sure that each sink will get the same events (or
>> >> different events specified by a selector).
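
Routing different events to different destinations already has a source-side counterpart in the multiplexing channel selector. The sketch below assumes the selector syntax from the Flume 1.x user guide. The source name src, the channels mcA and mcB, the header name dest, and its values are all hypothetical. It illustrates selector-based routing in general and is not the sink-side design proposed in FLUME-1435.

```properties
# Sketch only: src, mcA, mcB, the header "dest" and its values are hypothetical.
agent.sources.src.channels = mcA mcB
agent.sources.src.selector.type = multiplexing

# Route on the value of the "dest" event header.
agent.sources.src.selector.header = dest
agent.sources.src.selector.mapping.batch = mcA
agent.sources.src.selector.mapping.realtime = mcB

# Events with no matching header value go to the default channel.
agent.sources.src.selector.default = mcA
```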
>> >>
>> >> Suppose Channel (mc) contains the following events: 4,3,2,1
>> >>
>> >> If we simply enable it by configuration, it may work like this:
>> >> Sink "hsa" may get 1,3;
>> >> Sink "hsb" may get 2,4;
>> >> So different sinks will get different data. Is this what the user wants?
>> >>
>> >>
>> >> In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a typical
>> >> case when a user wants to fan out the data into two places (e.g. one for
>> >> batch and another for real-time analysis).
>> >>
>> >> Regards,
>> >> Yongkun Wang
>> >>
>> >>
>> >> On 12/08/10 14:29, "Denny Ye" <[email protected]> wrote:
>> >>
>> >>> hi Yongkun,
>> >>>
>> >>>    JIRA can be accessed now.
>> >>>
>> >>>    I think it might be difficult to understand the order of events in
>> >>> your proposal. If we don't care about the order, we can discuss the value
>> >>> and feasibility. In my opinion, the data ingest flow is not order-aware,
>> >>> or at least order is not that important for us. You can try to verify your
>> >>> proposal and give us the result. There may be some difficulties in keeping
>> >>> a transaction across several Sinks.
>> >>>
>> >>> -Regards
>> >>> Denny Ye
>> >>>
>> >>>
>> >>> 2012/8/10 Wang, Yongkun | Yongkun | BDD <[email protected]>
>> >>>
>> >>>> JIRA is down again? I cannot connect to it and comment there.
>> >>>>
>> >>>> I have a proposal in "Transactional Multiplex (fan out) Sink":
>> >>>> https://issues.apache.org/jira/browse/FLUME-1435
>> >>>> It contains the design of one channel to multiple sinks.
>> >>>>
>> >>>> You can search the email since JIRA cannot be accessed.
>> >>>>
>> >>>> I think this is more than a configuration issue. If we simply enable
>> >>>> several sinks on the same channel, they will take events either in a
>> >>>> round-robin mode or in an unpredictable mode if the speeds of the sinks
>> >>>> are different.
>> >>>>
>> >>>> So it's better to have an even higher-level transaction control instead
>> >>>> of the transaction in the process() of each sink, as I describe in
>> >>>> FLUME-1435.
>> >>>>
>> >>>> Regards,
>> >>>> Yongkun Wang
>> >>>>
>> >>>>
>> >>>> On 12/08/10 12:30, "Denny Ye (JIRA)" <[email protected]> wrote:
>> >>>>
>> >>>>> Denny Ye created FLUME-1479:
>> >>>>> -------------------------------
>> >>>>>
>> >>>>>              Summary: Multiple Sinks can connect to single Channel
>> >>>>>                  Key: FLUME-1479
>> >>>>>                  URL: https://issues.apache.org/jira/browse/FLUME-1479
>> >>>>>              Project: Flume
>> >>>>>           Issue Type: Bug
>> >>>>>           Components: Configuration
>> >>>>>     Affects Versions: v1.2.0
>> >>>>>             Reporter: Denny Ye
>> >>>>>             Assignee: Denny Ye
>> >>>>>              Fix For: v1.3.0
>> >>>>>
>> >>>>>
>> >>>>> If we have one Channel (mc) and two Sinks (hsa, hsb), then they may be
>> >>>>> connected with each other using a configuration such as
>> >>>>> {quote}
>> >>>>> agent.sinks.hsa.channel = mc
>> >>>>> agent.sinks.hsb.channel = mc
>> >>>>> {quote}
>> >>>>> It means that multiple Sinks can connect to a single Channel.
>> >>>>> Normally, one Sink can only connect to unified Channel
>> >>>>>
>> >>>>> --
>> >>>>> This message is automatically generated by JIRA.
>> >>>>> If you think it was sent incorrectly, please contact your JIRA
>> >>>>> administrators:
>> >>>>>
>> >>>>>
>> >>>>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> >>>>> For more information on JIRA, see:
>> >>>>> http://www.atlassian.com/software/jira
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>
>> >>
>> >
>> >
>>
>>
>>

