Re: Flume scalability & performance

alo alt Fri, 20 Apr 2012 00:18:51 -0700

Hi,

Replies inline.


--
Alexander Lorenz
http://mapredit.blogspot.com

On Apr 20, 2012, at 6:22 AM, M. Karthikeyan wrote:

> Thanks Brock for your thoughts.
> A few related questions:
> 1) Is there an out-of-the-box flume source that can monitor a RDBMS and pick 
> new rows from there, similar to a tailf on a file?

No, here you have to write your own decorator. Here is a list of 
implementations I have found over time:
https://github.com/figarocms/flume-plugins
https://github.com/stampy88/flume-amqp-plugin
https://github.com/thobbs/flume-cassandra-plugin


> 2) For systems that do not want to persist data into secondary storage, does 
> flume provide an API for direct integration into the app generating the data? 
> I guess the answer should be yes and in that case, is the app considered a 
> flume agent or the app generates data in a form that can be consumed by 
> another flume agent? 
> 

Again, here you have to write your own sink, but yes. 

Having read all your requirements, I would really push you toward Flume NG. 
I did a short writeup on my blog:
http://mapredit.blogspot.de/2012/03/flumeng-evolution.html

- Alex


> Thanks & Regards,
> MK
> 
> KARTHIKEYAN M  
> 
> Ericsson India Global Services Pvt.Ltd.,
> EGI/R
> `Tamarai Tech Park', 4th Floor, South Block,
> Inner Ring Road, Guindy, Chennai - 600032, India
> Phone +91 44 4501 2055
> Fax +91 44 4501 2066
> Mobile +91 96770 68559
> m.karthike...@ericsson.com
> www.ericsson.com  
> 
> This Communication is Confidential. We only send and receive email on the 
> basis of the terms set out at www.ericsson.com/email_disclaimer  
> 
> -----Original Message-----
> From: Brock Noland [mailto:br...@cloudera.com] 
> Sent: Thursday, April 19, 2012 8:50 PM
> To: flume-user@incubator.apache.org
> Subject: Re: Flume scalability & performance
> 
> One mistake below of consequence.
> 
> On Thu, Apr 19, 2012 at 2:44 PM, Brock Noland <br...@cloudera.com> wrote:
>> Hi,
>> 
>> On Thu, Apr 19, 2012 at 10:04 AM, M. Karthikeyan 
>> <m.karthike...@ericsson.com> wrote:
>>> I'm trying to choose between Flume and JMS for a data collection 
>>> framework in our multi-node network.
>>> I have the following questions:
>>> 1) From a scalability point of view, how does Flume compare with JMS? 
>>> Are there any numbers that can be referred to?
>>> 2) My typical payload for a single message is 2 KB. I expect traffic 
>>> of approx. 50 million messages/day. The messages are usually one 
>>> sender one receiver type. I require a reasonable level of reliability 
>>> (at least the store-and-forward mode in Flume & durable/persistent 
>>> messages in JMS). With these considerations, which will give better 
>>> performance: Flume or JMS?
>> 
>> All of this is extremely dependent on the implementation of JMS you 
>> use. JMS is a specification, and there are many implementations. 
>> Looking at your numbers and assuming all the messages come in within 
>> 8 hours (representing peak load), that is about 4MB/second.
>> 
>> Both Flume and most JMS implementations should be able to handle this 
>> throughput. The advantage of Flume is really configuration. Purchasing 
>> and configuring a JMS server and then writing code to interact with 
>> the JMS Server is, IMHO, going to be less work than installing and 
>> configuring Flume.
> 
> I meant to say setting up all that JMS infrastructure is going to be
> *more* work than flume.
> 
> Brock
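
The peak-load estimate quoted above (50 million messages of 2 KB each, all 
arriving within an 8 hour window) can be checked with a quick calculation. 
This is only a rough sketch: the function name peak_rate is made up for 
illustration, and it assumes 1 KB = 1000 bytes and 1 MB = 1,000,000 bytes, 
which the thread does not specify.

```python
# Back-of-the-envelope check of the peak throughput estimate in the thread.
MESSAGES_PER_DAY = 50_000_000  # from the original question
PAYLOAD_BYTES = 2 * 1000       # 2 KB per message (assumes 1 KB = 1000 bytes)
PEAK_WINDOW_HOURS = 8          # assumption from the reply: all traffic in 8 hours


def peak_rate(messages, payload_bytes, window_hours):
    """Return (messages per second, megabytes per second) over the window."""
    seconds = window_hours * 3600
    msgs_per_sec = messages / seconds
    mb_per_sec = msgs_per_sec * payload_bytes / 1_000_000
    return msgs_per_sec, mb_per_sec


msgs, mb = peak_rate(MESSAGES_PER_DAY, PAYLOAD_BYTES, PEAK_WINDOW_HOURS)
print(f"{msgs:.0f} messages/s, {mb:.2f} MB/s")  # prints "1736 messages/s, 3.47 MB/s"
```

Under these assumptions the result is roughly 3.5 MB/second, which is in 
line with the "about 4MB/second" figure given in the reply.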
