Thanks Mike, this is indeed very helpful!

Jarcec

On Mon, May 07, 2012 at 06:55:49PM -0700, Mike Percy wrote:
> Hi folks,
> Will McQueen and I have been doing some Flume NG stress and performance
> testing, and we wanted to share some of our recent findings. The focus of the
> most recent tests has been on the syslog TCP source, memory channel, and HDFS
> sink.
>
> I wrote some software to generate load in syslog format over TCP and to
> automate some of the analysis. The first thing we wanted to verify is that no
> data was lost during these tests (a.k.a. correctness), with a close second
> priority being of course throughput (performance). I used Pig and AvroStorage
> from piggybank in the data integrity analysis, and committed the compiled
> (0.11 trunk) piggybank jar so the load analysis scripts would be relatively
> easy to use. It seems to be compatible with Pig 0.8.1. I am a little wary of
> having to maintain that type of thing at the Apache org level so for now I
> have checked all the code in on Github under an ASL 2.0 license:
>
> https://github.com/mpercy/flume-load-gen
>
> I have created a Wiki page with the performance metrics we have come up with
> so far. The executive summary is that at the time of this writing, we have
> observed Flume NG on a single machine processing events at a throughput rate
> of 70,000+ events/sec with no data loss.
>
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements
>
> I have put more details on the wiki page itself. Please let me know if you
> want me to add more detail. I'll be looking into improving the performance of
> these components going forward, however we wanted to post these results to
> set a public performance baseline of Flume NG.
>
> If others have done performance testing, we would love to see your results if
> you can post the details.
>
> Regards,
> Mike