Yongkun Wang,

You're welcome! Very happy to hear your thoughts.

Regards,
Mike
On Tue, Aug 14, 2012 at 8:03 PM, Wang, Yongkun | Yongkun | BDD <[email protected]> wrote:
> Thanks Mike.
>
> This is really a nice reply, based on a thorough understanding of my proposal.
>
> I agree that it might be a potential design change, so I will carefully evaluate it before submitting it to you guys to make the decision.
>
> Cheers,
> Yongkun Wang
>
> On 12/08/13 9:17, "Mike Percy" <[email protected]> wrote:
>
> >Hi,
> >Due to design decisions made very early on in Flume NG (specifically, the fact that Sink only has a simple process() method), I don't see a good way to get multiple sinks pulling from the same channel in a way that is backwards-compatible with the current implementation.
> >
> >Probably the "right" way to support this would be to have an interface where the SinkRunner (or something outside of each Sink) is in control of the transaction, so that it can easily send events to each sink serially or in parallel within a single transaction. I think that is basically what you are describing. If you look at SourceRunner and SourceProcessor, you will see similar ideas to what you are describing, but they are only implemented at the Source->Channel level. The current SinkProcessor is not an analog of SourceProcessor, but if it were, then I think that's where this functionality might fit. However, when you do that, you have to handle a ton of failure cases and threading models in a very general way, which might be tough to get right for all use cases. I'm not 100% sure, but I think that's why this was not pursued at the time.
> >
> >To me, this seems like a potential design change (it would have to be very carefully thought out) to consider for a future major Flume code line (maybe a Flume 2.x).
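To make Mike's suggestion concrete, here is a minimal Java sketch of a runner that owns the channel transaction and hands the same batch to every sink serially, committing only if all of them succeed. The `InMemoryChannel`, `DeliverySink`, `Status` and `processOnce` names are hypothetical stand-ins invented for illustration. This is not Flume's `Channel`/`Transaction`/`SinkProcessor` API, and it ignores threading entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a runner that owns the transaction and fans one batch out to all sinks.
// These types are simplified stand-ins, NOT Flume's Channel/Transaction/SinkProcessor API.
public class TransactionalFanOut {

    enum Status { DELIVERED, EMPTY, ROLLED_BACK }

    /** In-memory FIFO channel with a single open transaction (hypothetical). */
    static final class InMemoryChannel {
        private final Deque<String> queue = new ArrayDeque<>();
        private final Deque<String> taken = new ArrayDeque<>(); // taken but not yet committed

        void put(String event) { queue.addLast(event); }

        /** Takes the oldest event, or returns null if the channel is empty. */
        String take() {
            String event = queue.pollFirst();
            if (event != null) taken.addLast(event);
            return event;
        }

        void commit() { taken.clear(); }

        /** Puts uncommitted events back at the head, preserving their original order. */
        void rollback() {
            while (!taken.isEmpty()) queue.addFirst(taken.pollLast());
        }

        int size() { return queue.size(); }
    }

    /** A destination that accepts a whole batch or throws (hypothetical). */
    interface DeliverySink {
        void deliver(List<String> batch) throws Exception;
    }

    /**
     * Takes up to batchSize events in one transaction and delivers the same batch
     * to each sink in turn. Commits only if every sink succeeds; otherwise rolls back
     * so the whole batch stays in the channel for a retry.
     */
    static Status processOnce(InMemoryChannel channel, List<DeliverySink> sinks, int batchSize) {
        List<String> batch = new ArrayList<>();
        String event;
        while (batch.size() < batchSize && (event = channel.take()) != null) {
            batch.add(event);
        }
        if (batch.isEmpty()) {
            channel.commit();
            return Status.EMPTY;
        }
        try {
            for (DeliverySink sink : sinks) {
                sink.deliver(batch); // serial delivery; one failure aborts the whole batch
            }
            channel.commit();
            return Status.DELIVERED;
        } catch (Exception e) {
            // Sinks that already accepted the batch will see it again on retry (at-least-once).
            channel.rollback();
            return Status.ROLLED_BACK;
        }
    }

    public static void main(String[] args) {
        InMemoryChannel mc = new InMemoryChannel();
        for (String e : List.of("1", "2", "3", "4")) mc.put(e);

        List<String> hsa = new ArrayList<>();
        List<String> hsb = new ArrayList<>();
        List<DeliverySink> sinks = List.of(batch -> hsa.addAll(batch), batch -> hsb.addAll(batch));

        Status status = processOnce(mc, sinks, 100);
        // Prints: DELIVERED hsa=[1, 2, 3, 4] hsb=[1, 2, 3, 4] remaining=0
        System.out.println(status + " hsa=" + hsa + " hsb=" + hsb + " remaining=" + mc.size());
    }
}
```

The sketch also exposes one of the failure cases Mike warns about: if `hsb` fails after `hsa` has already accepted the batch, the rollback means `hsa` receives the same events again on the next attempt. Handling that, and doing it across threads, in a general way is where the real complexity lies.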
> >
> >By the way, if one is trying to get maximum throughput, then duplicating events onto multiple channels and having different threads running the sinks (the current design) will be faster and more resilient in general than a single thread and a single channel writing to multiple sinks/destinations. The multiple-channel design pattern keeps periodic downtimes or delays on a single sink from affecting the others, assuming the channel sizes are large enough for buffering during downtime and assuming that each sink is fast enough to recover from temporary delays. Without a dedicated buffer per destination, one is at the mercy of the slowest sink at every stage in the transaction.
> >
> >One last thing worth noting is that the current channels are all well ordered. This means that Flume currently provides a weak ordering guarantee (across a single hop). That is a helpful property in the context of testing and validation, and it is also what many people expect if they are storing logs on a single hop. I hope we don't backpedal on that weak ordering guarantee without a really good reason.
> >
> >Regards,
> >Mike
> >
> >On Fri, Aug 10, 2012 at 9:30 PM, Wang, Yongkun | Yongkun | BDD <[email protected]> wrote:
> >
> >> Hi Juhani,
> >>
> >> Yes, we can use two (or several) channels to fan out data to different sinks. But then we will have two channels with the same data, which may not be an optimized solution. So I want to use just ONE channel, with a processor that pulls the data once from the channel and then distributes it to the different sinks.
> >>
> >> Regards,
> >> Yongkun Wang
> >>
> >> On 12/08/10 18:07, "Juhani Connolly" <[email protected]> wrote:
> >>
> >> >Hi Yongkun,
> >> >
> >> >I'm curious why you need to pull the data twice from the sink? Do you need all sinks to have read the same amount of data?
> >> >Normally, for the case of splitting data into batch and analytics, we send data from the source to two separate channels and have the sinks read from separate channels.
> >> >
> >> >On 08/10/2012 02:48 PM, Wang, Yongkun | Yongkun | BDD wrote:
> >> >> Hi Denny,
> >> >>
> >> >> I am working on the patch now; it's not difficult. I have listed the changes in that JIRA.
> >> >> I think you misunderstood my design: I don't maintain the order of the events. Instead, I make sure that each sink gets the same events (or different events, as specified by a selector).
> >> >>
> >> >> Suppose Channel (mc) contains the following events: 4,3,2,1
> >> >>
> >> >> If this is simply enabled by configuration, it may work like this:
> >> >> Sink "hsa" may get 1,3;
> >> >> Sink "hsb" may get 2,4;
> >> >> So different sinks will get different data. Is this what the user wants?
> >> >>
> >> >> In my design, "hsa" and "hsb" will both get "4,3,2,1". This is a typical case when a user wants to fan out the data to two places (e.g. one for batch and another for real-time analysis).
> >> >>
> >> >> Regards,
> >> >> Yongkun Wang
> >> >>
> >> >> On 12/08/10 14:29, "Denny Ye" <[email protected]> wrote:
> >> >>
> >> >>> hi Yongkun,
> >> >>>
> >> >>> JIRA can be accessed now.
> >> >>>
> >> >>> I think it might be difficult to understand the order of events in your approach. If we don't care about the order, we can discuss its value and feasibility. In my opinion, the data ingest flow is order-unaware, or at least ordering is not that important for us. You can try to verify your proposal and give us the results. There may be some difficulties in keeping a transaction across several Sinks.
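Yongkun's 4,3,2,1 example and Juhani's two-channel approach can be contrasted in a small Java toy model (not Flume code). `sharedChannel` assumes an idealised strict alternation between sinks competing for one FIFO channel; with real sinks running at different speeds the split would be unpredictable, as Yongkun notes. `channelPerSink` models source-side replication into one channel per sink. All names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Flume code): competing sinks on one channel vs. one channel per sink.
public class FanOutModes {

    /** Sinks share one FIFO channel and take turns; each event goes to exactly one sink. */
    static Map<String, List<Integer>> sharedChannel(List<Integer> events, List<String> sinkNames) {
        Deque<Integer> channel = new ArrayDeque<>(events); // head = oldest event
        Map<String, List<Integer>> received = new LinkedHashMap<>();
        for (String name : sinkNames) received.put(name, new ArrayList<>());
        int turn = 0;
        while (!channel.isEmpty()) {
            String sink = sinkNames.get(turn % sinkNames.size()); // idealised strict alternation
            received.get(sink).add(channel.pollFirst());
            turn++;
        }
        return received;
    }

    /** Source-side replication: each sink drains its own full copy of the events. */
    static Map<String, List<Integer>> channelPerSink(List<Integer> events, List<String> sinkNames) {
        Map<String, List<Integer>> received = new LinkedHashMap<>();
        for (String name : sinkNames) {
            Deque<Integer> channel = new ArrayDeque<>(events); // a full copy per sink
            List<Integer> out = new ArrayList<>();
            while (!channel.isEmpty()) out.add(channel.pollFirst());
            received.put(name, out);
        }
        return received;
    }

    public static void main(String[] args) {
        List<Integer> events = List.of(1, 2, 3, 4); // oldest first, i.e. the channel "4,3,2,1"
        List<String> sinks = List.of("hsa", "hsb");
        System.out.println("shared:   " + sharedChannel(events, sinks));
        System.out.println("per-sink: " + channelPerSink(events, sinks));
    }
}
```

Running `main` should print `shared:   {hsa=[1, 3], hsb=[2, 4]}` and `per-sink: {hsa=[1, 2, 3, 4], hsb=[1, 2, 3, 4]}`. The per-sink variant stores every event twice, which is the overhead Yongkun wants to avoid, but each sink still sees the events in FIFO order, which is the single-hop ordering property Mike mentions.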
> >> >>>
> >> >>> -Regards
> >> >>> Denny Ye
> >> >>>
> >> >>> 2012/8/10 Wang, Yongkun | Yongkun | BDD <[email protected]>
> >> >>>
> >> >>>> Is JIRA down again? I cannot connect to it to comment there.
> >> >>>>
> >> >>>> I have a proposal in "Transactional Multiplex (fan out) Sink" (https://issues.apache.org/jira/browse/FLUME-1435), which contains the design for one channel feeding multiple sinks.
> >> >>>>
> >> >>>> You can search for the email, since JIRA cannot be accessed.
> >> >>>>
> >> >>>> I think this is more than a configuration issue. If we simply enable several sinks on the same channel, they will take events either in a round-robin fashion or unpredictably if the sinks run at different speeds.
> >> >>>>
> >> >>>> So it's better to have an even higher-level transaction control instead of the transaction in the process() of each sink, as I describe in FLUME-1435.
> >> >>>>
> >> >>>> Regards,
> >> >>>> Yongkun Wang
> >> >>>>
> >> >>>> On 12/08/10 12:30, "Denny Ye (JIRA)" <[email protected]> wrote:
> >> >>>>
> >> >>>>> Denny Ye created FLUME-1479:
> >> >>>>> -------------------------------
> >> >>>>>
> >> >>>>> Summary: Multiple Sinks can connect to single Channel
> >> >>>>> Key: FLUME-1479
> >> >>>>> URL: https://issues.apache.org/jira/browse/FLUME-1479
> >> >>>>> Project: Flume
> >> >>>>> Issue Type: Bug
> >> >>>>> Components: Configuration
> >> >>>>> Affects Versions: v1.2.0
> >> >>>>> Reporter: Denny Ye
> >> >>>>> Assignee: Denny Ye
> >> >>>>> Fix For: v1.3.0
> >> >>>>>
> >> >>>>>
> >> >>>>> If we have one Channel (mc) and two Sinks (hsa, hsb), they may be connected to each other with a configuration such as
> >> >>>>> {quote}
> >> >>>>> agent.sinks.hsa.channel = mc
> >> >>>>> agent.sinks.hsb.channel = mc
> >> >>>>> {quote}
> >> >>>>> This means that multiple Sinks can connect to a single Channel.
> >> >>>>> Normally, only one Sink should connect to a given Channel.
> >> >>>>>
> >> >>>>> --
> >> >>>>> This message is automatically generated by JIRA.
> >> >>>>> If you think it was sent incorrectly, please contact your JIRA administrators:
> >> >>>>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> >> >>>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
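Mike's remark that, without a dedicated buffer per destination, one is at the mercy of the slowest sink can be illustrated with back-of-the-envelope arithmetic. The Java model below uses made-up per-event costs and a made-up outage on one sink. It assumes each dedicated channel is large enough to buffer through the outage and ignores batching and threading overhead, so it is an illustration rather than a benchmark; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Toy cost model with made-up numbers; not a Flume benchmark.
public class SlowestSinkModel {

    /** One transaction delivers every event to each sink in turn: all sinks finish together. */
    static long[] sharedSerial(long events, long[] costPerEvent, long outage) {
        long total = events * Arrays.stream(costPerEvent).sum() + outage; // an outage stalls everyone
        long[] finish = new long[costPerEvent.length];
        Arrays.fill(finish, total);
        return finish;
    }

    /** One transaction delivers to all sinks in parallel: each batch waits for the slowest sink. */
    static long[] sharedParallel(long events, long[] costPerEvent, long outage) {
        long total = events * Arrays.stream(costPerEvent).max().orElse(0) + outage;
        long[] finish = new long[costPerEvent.length];
        Arrays.fill(finish, total);
        return finish;
    }

    /** One channel per sink (assumed big enough to buffer): only the stalled sink pays for the outage. */
    static long[] dedicatedChannels(long events, long[] costPerEvent, int stalledSink, long outage) {
        long[] finish = new long[costPerEvent.length];
        for (int i = 0; i < finish.length; i++) {
            finish[i] = events * costPerEvent[i] + (i == stalledSink ? outage : 0);
        }
        return finish;
    }

    public static void main(String[] args) {
        long events = 1000;
        long[] cost = {1, 5}; // hypothetical: hsa = 1 time unit/event, hsb = 5 time units/event
        int stalled = 1;      // hsb is down...
        long outage = 2000;   // ...for 2000 time units
        System.out.println("shared, serial:     " + Arrays.toString(sharedSerial(events, cost, outage)));
        System.out.println("shared, parallel:   " + Arrays.toString(sharedParallel(events, cost, outage)));
        System.out.println("dedicated channels: " + Arrays.toString(dedicatedChannels(events, cost, stalled, outage)));
    }
}
```

With these numbers the shared-transaction variants finish both sinks at 8000 (serial) or 7000 (parallel) time units, while dedicated channels let `hsa` finish at 1000 and leave only `hsb` to absorb the outage, finishing at 7000.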
