Re: [jira] [Created] (FLUME-1479) Multiple Sinks can connect to single Channel

Mike Percy Tue, 14 Aug 2012 20:52:59 -0700

Yongkun Wang,
You're welcome! Very happy to hear your thoughts.

Regards,
Mike


On Tue, Aug 14, 2012 at 8:03 PM, Wang, Yongkun | Yongkun | BDD <
[email protected]> wrote:

> Thanks Mike.
>
> This is really a nice reply, based on a thorough understanding of my
> proposal.
>
> I agree that it might be a potential design change. So I will carefully
> evaluate it before submitting it to you guys to make the decision.
>
> Cheers,
> Yongkun Wang
>
> On 12/08/13 9:17, "Mike Percy" <[email protected]> wrote:
>
> >Hi,
> >Due to design decisions made very early on in Flume NG - specifically the
> >fact that Sink only has a simple process() method - I don't see a good way
> >to get multiple sinks pulling from the same channel in a way that is
> >backwards-compatible with the current implementation.
> >
> >Probably the "right" way to support this would be to have an interface
> >where the SinkRunner (or something outside of each Sink) is in control of
> >the transaction, and then it can easily send events to each sink serially
> >or in parallel within a single transaction. I think that is basically what
> >you are describing. If you look at SourceRunner and SourceProcessor you
> >will see similar ideas to what you are describing but they are only
> >implemented at the Source->Channel level. The current SinkProcessor is not
> >an analog of SourceProcessor, but if it was then I think that's where this
> >functionality might fit. However what happens when you do that is you have
> >to handle a ton of failure cases and threading models in a very general
> >way, which might be tough to get right for all use cases. I'm not 100%
> >sure, but I think that's why this was not pursued at the time.
> >
> >To me, this seems like a potential design change (it would have to be very
> >carefully thought out) to consider for a future major Flume code line
> >(maybe a Flume 2.x).
> >
> >By the way, if one is trying to get maximum throughput, then duplicating
> >events onto multiple channels, and having different threads running the
> >sinks (the current design) will be faster and more resilient in general
> >than a single thread and a single channel writing to multiple
> >sinks/destinations. The multiple-channel design pattern will allow
> >periodic downtimes or delays on a single sink to not affect the others,
> >assuming the channel sizes are large enough for buffering during downtime
> >and assuming that each sink is fast enough to recover from temporary
> >delays. Without a dedicated buffer per destination, one is at the mercy of
> >the slowest sink at every stage in the transaction.
> >
> >One last thing worth noting is that the current channels are all well
> >ordered. This means that Flume currently provides a weak ordering
> >guarantee (across a single hop). That is a helpful property for testing
> >and validation, and it is what many people expect if they are storing
> >logs on a single hop. I hope we don't backpedal on that weak ordering
> >guarantee without a really good reason.
> >
> >Regards,
> >Mike
> >
> >On Fri, Aug 10, 2012 at 9:30 PM, Wang, Yongkun | Yongkun | BDD <
> >[email protected]> wrote:
> >
> >> Hi Juhani,
> >>
> >> Yes, we can use two (or several) channels to fan out data to different
> >> sinks. Then we will have two channels with the same data, which may not
> >> be an optimal solution. So I want to use just ONE channel, creating a
> >> processor to pull the data once from the channel and then distribute it
> >> to different sinks.
> >>
> >> Regards,
> >> Yongkun Wang
> >>
> >> On 12/08/10 18:07, "Juhani Connolly" <[email protected]>
> >> wrote:
> >>
> >> >Hi Yongkun,
> >> >
> >> >I'm curious why you need to pull the data twice from the sink? Do you
> >> >need all sinks to have read the same amount of data? Normally for the
> >> >case of splitting data into batch and analytics, we will send data from
> >> >the source to two separate channels and have the sinks read from
> >> >separate channels.
> >> >
> >> >On 08/10/2012 02:48 PM, Wang, Yongkun | Yongkun | BDD wrote:
> >> >> Hi Denny,
> >> >>
> >> >> I am working on the patch now; it's not difficult. I have listed the
> >> >> changes in that JIRA.
> >> >> I think you misunderstand my design: I don't maintain the order of
> >> >> the events. Instead, I make sure that each sink gets the same events
> >> >> (or different events, as specified by a selector).
> >> >>
> >> >> Suppose Channel (mc) contains the following events: 4,3,2,1
> >> >>
> >> >> If we simply enable it by configuration, it may work like this:
> >> >> Sink "hsa" may get 1,3;
> >> >> Sink "hsb" may get 2,4;
> >> >> So different sinks will get different data. Is this what the user wants?
> >> >>
> >> >>
> >> >> In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a
> >> >> typical case when a user wants to fan out the data into two places
> >> >> (e.g. one for batch and another for real-time analysis).
> >> >>
> >> >> Regards,
> >> >> Yongkun Wang
> >> >>
> >> >>
> >> >> On 12/08/10 14:29, "Denny Ye" <[email protected]> wrote:
> >> >>
> >> >>> hi Yongkun,
> >> >>>
> >> >>>    JIRA can be accessed now.
> >> >>>
> >> >>>    I think it might be difficult to reason about the order of events
> >> >>> in your proposal. If we don't care about the order, we can discuss
> >> >>> its value and feasibility. In my opinion, the data ingest flow is
> >> >>> order-unaware, or at least ordering is not that important for us. You
> >> >>> can try to verify your proposal and give us the results. There may be
> >> >>> some difficulty in maintaining a transaction across several Sinks.
> >> >>>
> >> >>> -Regards
> >> >>> Denny Ye
> >> >>>
> >> >>>
> >> >>> 2012/8/10 Wang, Yongkun | Yongkun | BDD <[email protected]>
> >> >>>
> >> >>>> JIRA is down again? I cannot connect to it and comment there.
> >> >>>>
> >> >>>> I have a proposal, "Transactional Multiplex (fan out) Sink":
> >> >>>> https://issues.apache.org/jira/browse/FLUME-1435
> >> >>>> Which contains the design of one channel to multiple sinks.
> >> >>>>
> >> >>>> You can search the email since JIRA cannot be accessed.
> >> >>>>
> >> >>>> I think this is more than a configuration issue. If we simply enable
> >> >>>> several sinks on the same channel, they will take events either in
> >> >>>> round-robin mode or, if the sinks run at different speeds, in an
> >> >>>> unpredictable mode.
> >> >>>>
> >> >>>> So it's better to have an even higher-level transaction control
> >> >>>> instead of the transaction in the process() of each sink, as I
> >> >>>> describe in FLUME-1435.
> >> >>>>
> >> >>>> Regards,
> >> >>>> Yongkun Wang
> >> >>>>
> >> >>>>
> >> >>>> On 12/08/10 12:30, "Denny Ye (JIRA)" <[email protected]> wrote:
> >> >>>>
> >> >>>>> Denny Ye created FLUME-1479:
> >> >>>>> -------------------------------
> >> >>>>>
> >> >>>>>              Summary: Multiple Sinks can connect to single Channel
> >> >>>>>                  Key: FLUME-1479
> >> >>>>>                  URL: https://issues.apache.org/jira/browse/FLUME-1479
> >> >>>>>              Project: Flume
> >> >>>>>           Issue Type: Bug
> >> >>>>>           Components: Configuration
> >> >>>>>     Affects Versions: v1.2.0
> >> >>>>>             Reporter: Denny Ye
> >> >>>>>             Assignee: Denny Ye
> >> >>>>>              Fix For: v1.3.0
> >> >>>>>
> >> >>>>>
> >> >>>>> If we have one Channel (mc) and two Sinks (hsa, hsb), then they
> >> >>>>> may be connected with each other with this configuration example:
> >> >>>>> {quote}
> >> >>>>> agent.sinks.hsa.channel = mc
> >> >>>>> agent.sinks.hsb.channel = mc
> >> >>>>> {quote}
> >> >>>>> This means that multiple Sinks can connect to a single Channel.
> >> >>>>> Normally, one Sink can only connect to a unified Channel.
> >> >>>>>
> >> >>>>> --
> >> >>>>> This message is automatically generated by JIRA.
> >> >>>>> If you think it was sent incorrectly, please contact your JIRA
> >> >>>>> administrators:
> >> >>>>>
> >> >>>>>
> >> >>>>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> >> >>>>> For more information on JIRA, see:
> >> >>>>> http://www.atlassian.com/software/jira
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >>
>
>
>
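
For readers following the thread, here is a rough Java sketch of the idea
discussed above: something outside the sinks owns the transaction, takes a
batch from the channel once, and hands the same batch to every sink. It is
only a toy model under stated assumptions. ToyChannel, BatchSink and
processOnce are hypothetical names invented for this illustration; none of
them are Flume classes, and this is not the design proposed in FLUME-1435.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FanOutSketch {

    /** Hypothetical sink: accepts a whole batch or throws to signal failure. */
    interface BatchSink {
        void deliver(List<String> batch) throws Exception;
    }

    /** Toy in-memory channel with take/rollback; NOT Flume's Channel API. */
    static final class ToyChannel {
        private final Deque<String> queue = new ArrayDeque<>();

        void put(String event) {
            queue.addLast(event);
        }

        /** Removes up to max events from the head of the queue. */
        List<String> take(int max) {
            List<String> batch = new ArrayList<>();
            while (batch.size() < max && !queue.isEmpty()) {
                batch.add(queue.pollFirst());
            }
            return batch;
        }

        /** Puts an uncommitted batch back at the head, preserving order. */
        void rollback(List<String> batch) {
            for (int i = batch.size() - 1; i >= 0; i--) {
                queue.addFirst(batch.get(i));
            }
        }

        int size() {
            return queue.size();
        }
    }

    /**
     * Runner-owned transaction: take once, deliver the same batch to every
     * sink serially, and keep the events removed only if all sinks succeed.
     * Returns true on commit, false if the batch was rolled back.
     */
    static boolean processOnce(ToyChannel channel, List<BatchSink> sinks, int batchSize) {
        List<String> batch = channel.take(batchSize);
        if (batch.isEmpty()) {
            return true; // nothing to do
        }
        try {
            for (BatchSink sink : sinks) {
                sink.deliver(batch);
            }
            return true; // "commit": events stay removed from the channel
        } catch (Exception e) {
            channel.rollback(batch); // any failure returns the whole batch
            return false;
        }
    }

    public static void main(String[] args) {
        ToyChannel mc = new ToyChannel();
        for (String event : List.of("1", "2", "3", "4")) {
            mc.put(event);
        }
        List<String> hsa = new ArrayList<>();
        List<String> hsb = new ArrayList<>();
        List<BatchSink> sinks = new ArrayList<>();
        sinks.add(batch -> hsa.addAll(batch));
        sinks.add(batch -> hsb.addAll(batch));

        boolean committed = processOnce(mc, sinks, 4);
        System.out.println(committed + " hsa=" + hsa + " hsb=" + hsb);
    }
}
```

Even this toy version shows the failure cases mentioned in the thread: if
the first sink accepts the batch and the second one throws, the rollback
means the first sink will see the same events again on retry, and a slow
sink stalls every other sink sharing the transaction.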
