Great! Curious, did you test how the size of the payload affects performance? What if the log message were 30 KB?
On Mon, May 7, 2012 at 9:55 PM, Mike Percy <mpe...@cloudera.com> wrote:
> Hi folks,
> Will McQueen and I have been doing some Flume NG stress and performance
> testing, and we wanted to share some of our recent findings. The focus of
> the most recent tests has been on the syslog TCP source, memory channel,
> and HDFS sink.
>
> I wrote some software to generate load in syslog format over TCP and to
> automate some of the analysis. The first thing we wanted to verify is that
> no data was lost during these tests (a.k.a. correctness), with a close
> second priority being of course throughput (performance). I used Pig and
> AvroStorage from piggybank in the data integrity analysis, and committed
> the compiled (0.11 trunk) piggybank jar so the load analysis scripts would
> be relatively easy to use. It seems to be compatible with Pig 0.8.1. I am a
> little wary of having to maintain that type of thing at the Apache org
> level so for now I have checked all the code in on Github under an ASL 2.0
> license:
>
> https://github.com/mpercy/flume-load-gen
>
> I have created a Wiki page with the performance metrics we have come up
> with so far. The executive summary is that at the time of this writing, we
> have observed Flume NG on a single machine processing events at a
> throughput rate of 70,000+ events/sec with no data loss.
>
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements
>
> I have put more details on the wiki page itself. Please let me know if you
> want me to add more detail. I'll be looking into improving the performance
> of these components going forward, however we wanted to post these results
> to set a public performance baseline of Flume NG.
>
> If others have done performance testing, we would love to see your results
> if you can post the details.
>
> Regards,
> Mike