Thanks Mike,
this is indeed very helpful!

Jarcec

On Mon, May 07, 2012 at 06:55:49PM -0700, Mike Percy wrote:
> Hi folks,
> Will McQueen and I have been doing some Flume NG stress and performance 
> testing, and we wanted to share some of our recent findings. The focus of the 
> most recent tests has been on the syslog TCP source, memory channel, and HDFS 
> sink.
> 
> I wrote some software to generate load in syslog format over TCP and to 
> automate some of the analysis. The first thing we wanted to verify is that no 
> data was lost during these tests (a.k.a. correctness), with a close second 
> priority being of course throughput (performance). I used Pig and AvroStorage 
> from piggybank in the data integrity analysis, and committed the compiled 
> (0.11 trunk) piggybank jar so the load analysis scripts would be relatively 
> easy to use. It seems to be compatible with Pig 0.8.1. I am a little wary of 
> having to maintain that type of thing at the Apache org level so for now I 
> have checked all the code in on GitHub under an ASL 2.0 license:
> 
> https://github.com/mpercy/flume-load-gen
> 
> I have created a Wiki page with the performance metrics we have come up with 
> so far. The executive summary is that at the time of this writing, we have 
> observed Flume NG on a single machine processing events at a throughput rate 
> of 70,000+ events/sec with no data loss.
> 
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements
> 
> I have put more details on the wiki page itself. Please let me know if you 
> want me to add more detail. I'll be looking into improving the performance of 
> these components going forward; however, we wanted to post these results to 
> set a public performance baseline of Flume NG.
> 
> If others have done performance testing, we would love to see your results if 
> you can post the details.
> 
> Regards,
> Mike
> 
