Great!

Curious, did you test how the size of the payload affects performance?
What if the log message were 30 KB?
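
To make the question concrete, here is a rough sketch of the kind of message I have in mind. It is not taken from flume-load-gen; the function names, the hostname, the tag, the fixed timestamp, and the host/port arguments are all made up for illustration, and the line only loosely follows the classic BSD syslog layout.

```python
import socket


def make_syslog_line(payload_bytes, host="loadgen", tag="test"):
    """Build one newline-terminated syslog-style line with a payload of the given size.

    The hostname, tag, and fixed timestamp are placeholders, not values from the tests.
    """
    pri = 1 * 8 + 5  # facility "user" (1), severity "notice" (5) -> <13>
    header = "<{}>May  7 21:55:00 {} {}: ".format(pri, host, tag).encode("ascii")
    payload = b"x" * payload_bytes  # filler body of exactly payload_bytes bytes
    return header + payload + b"\n"


def send_lines(line, count, address):
    """Send the same line `count` times over one TCP connection.

    `address` is a (host, port) tuple for whatever syslog TCP source is under test.
    """
    with socket.create_connection(address) as conn:
        for _ in range(count):
            conn.sendall(line)


small = make_syslog_line(300)        # a few hundred bytes
large = make_syslog_line(30 * 1024)  # the 30 KB case
```

Running the same event count with `small` and then `large` would show whether throughput is bounded by events/sec or by bytes/sec.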

On Mon, May 7, 2012 at 9:55 PM, Mike Percy <mpe...@cloudera.com> wrote:

> Hi folks,
> Will McQueen and I have been doing some Flume NG stress and performance
> testing, and we wanted to share some of our recent findings. The focus of
> the most recent tests has been on the syslog TCP source, memory channel,
> and HDFS sink.
>
> I wrote some software to generate load in syslog format over TCP and to
> automate some of the analysis. The first thing we wanted to verify is that
> no data was lost during these tests (a.k.a. correctness), with a close
> second priority being of course throughput (performance). I used Pig and
> AvroStorage from piggybank in the data integrity analysis, and committed
> the compiled (0.11 trunk) piggybank jar so the load analysis scripts would
> be relatively easy to use. It seems to be compatible with Pig 0.8.1. I am a
> little wary of having to maintain that type of thing at the Apache org
> level, so for now I have checked all the code in on GitHub under an ASL 2.0
> license:
>
> https://github.com/mpercy/flume-load-gen
>
> I have created a Wiki page with the performance metrics we have come up
> with so far. The executive summary is that at the time of this writing, we
> have observed Flume NG on a single machine processing events at a
> throughput rate of 70,000+ events/sec with no data loss.
>
>
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements
>
> I have put more details on the wiki page itself. Please let me know if you
> want me to add more detail. I'll be looking into improving the performance
> of these components going forward; however, we wanted to post these results
> to set a public performance baseline for Flume NG.
>
> If others have done performance testing, we would love to see your results
> if you can post the details.
>
> Regards,
> Mike
>
>
