Hi Kamal,

Flume is designed to scale horizontally: add more boxes and run more collector daemons. It should be able to handle 2000 messages per second. You can configure a failover chain to avoid loss of events.
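For what it's worth, a minimal sketch of such a chain, assuming Flume 0.9.x and its flume shell; the node name, log path, collector hosts, and the default collector port (35853) are all placeholders. agentE2EChain writes to the agent's WAL and sends to the first collector in the list, failing over to the next one while keeping end-to-end acks:

  exec config agent1 'tail("/var/log/app.log")' 'agentE2EChain("collector1:35853", "collector2:35853")'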
IMHO, the downside of the multi-threaded approach is lack of manageability.

On Mon, Oct 17, 2011 at 9:58 PM, Kamal Bahadur <[email protected]> wrote:

> Hi Dani,
>
> Thanks for the reply. I am using E2E reliability mode. If I spawn a new
> thread for each append call, I am not sure the acks will be handled
> properly. I might lose an event if the child thread ends up in an
> exception. Do you have any suggestion for my use case? With the current
> setup, I am able to write only 500 events per second. The expected event
> rate is over 2000 per second. I tried to increase the number of
> collectors and it seems to help. Is this my only option?
>
> Thanks,
> Kamal
>
>
> On Mon, Oct 17, 2011 at 4:42 PM, Dani Rayan <[email protected]> wrote:
>
>> Hey Kamal,
>>
>> You are correct. The append method would not spawn new threads by
>> itself. However, you can still override it.
>>
>>
>> On Mon, Oct 17, 2011 at 1:58 PM, Kamal Bahadur <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have written a sink for writing data into Cassandra using the Hector
>>> API. It looks like Hector does a great job of connection pooling and
>>> load balancing. As soon as I start the collector, I can see 16
>>> connections being established between the collector and Cassandra. I am
>>> not sure if Flume is taking advantage of those connections in the pool.
>>> I am assuming that the collector's append method is not multi-threaded
>>> and therefore only one connection is being used at any point of time.
>>> Can someone confirm this?
>>>
>>> Thanks,
>>> Kamal
>>>
>>
>>
>> --
>> -Dani Abel Rayan
>>
>

--
-Dani Abel Rayan
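To make the trade-off concrete, below is a rough sketch (not Kamal's actual sink) of what a multi-threaded append could look like, assuming the Flume 0.9.x EventSink.Base plugin interface and Hector's HFactory API; the cluster, keyspace, column family, row-key scheme, pool size, and in-flight limit are all placeholders. It submits each write to a small thread pool so several of Hector's pooled connections can be used concurrently, and it surfaces write failures on later append() calls or at close(). The gap this leaves is exactly the E2E-ack concern above: a write can still fail after its append() has returned, so the agent may have already been acked for that event.

import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

import com.cloudera.flume.core.Event;
import com.cloudera.flume.core.EventSink;

public class ThreadedCassandraSink extends EventSink.Base {

  private static final int POOL_SIZE = 16;     // matches the 16 pooled Hector connections
  private static final int MAX_IN_FLIGHT = 64; // bound memory used by pending writes

  private Cluster cluster;
  private Keyspace keyspace;
  private ExecutorService pool;
  private final List<Future<?>> pending = new LinkedList<Future<?>>();

  @Override
  public void open() throws IOException {
    // Placeholder cluster/keyspace names and host; Hector manages the connection pool.
    cluster = HFactory.getOrCreateCluster("TestCluster", "cassandra-host:9160");
    keyspace = HFactory.createKeyspace("flume_ks", cluster);
    pool = Executors.newFixedThreadPool(POOL_SIZE);
  }

  @Override
  public void append(final Event e) throws IOException {
    // Fail fast on any earlier write that has already completed with an error,
    // and block if too many writes are still in flight.
    reap(pending.size() >= MAX_IN_FLIGHT);

    pending.add(pool.submit(new Runnable() {
      public void run() {
        // One mutation per event; a real sink would batch and pick a meaningful row key.
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        m.insert(UUID.randomUUID().toString(), "events",
            HFactory.createStringColumn("body", new String(e.getBody())));
      }
    }));
  }

  /** Check finished writes for errors; when block is true, wait for all of them. */
  private void reap(boolean block) throws IOException {
    for (Iterator<Future<?>> it = pending.iterator(); it.hasNext();) {
      Future<?> f = it.next();
      if (block || f.isDone()) {
        try {
          f.get();
        } catch (Exception ex) {
          throw new IOException("Cassandra write failed", ex);
        }
        it.remove();
      }
    }
  }

  @Override
  public void close() throws IOException {
    // Drain outstanding writes so failures are not silently dropped, then shut down.
    reap(true);
    pool.shutdown();
    HFactory.shutdownCluster(cluster);
  }
}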
