Hi Ari,

Based on what I saw in my tests, the channel was not the bottleneck. Parsing of the syslog events in the source was the first bottleneck, followed by the HDFS sink. The performance of each of these components can easily be increased. The memory channel is quite fast (don't quote me on this, but I think I've seen it do 200K events/sec on a single channel with a sequence generator source and a null sink; it's easy to try that out).

I don't have specific plans or a schedule right now regarding testing FileChannel throughput, though that should certainly be done. Luckily, I have posted the code so you can easily take it and run some benchmarks with your favorite configuration! =)

Mike

On May 7, 2012, at 7:17 PM, Flinkster wrote:

> Hi Mike, thanks a lot for posting the performance metrics. Are you
> doing the same for file channel?
>
> Based on:
>
> Load: 58,582 events/sec aggregate == approx. 5,850 events/sec per flow
> on average x 10 flows.
> Event size: 300 bytes/event.
>
> 58,582 events/sec * 300 bytes/event = 17,574,600 bytes/sec = 17.5
> MB/sec = 140 Mbps
>
> Is my math right? I assume the file based channel would be slower, any
> idea how much slower?
>
> Thanks,
>
> -ari
>
>
> On Mon, May 7, 2012 at 6:55 PM, Mike Percy <[email protected]> wrote:
>> Hi folks,
>> Will McQueen and I have been doing some Flume NG stress and performance
>> testing, and we wanted to share some of our recent findings. The focus of
>> the most recent tests has been on the syslog TCP source, memory channel, and
>> HDFS sink.
>>
>> I wrote some software to generate load in syslog format over TCP and to
>> automate some of the analysis. The first thing we wanted to verify is that
>> no data was lost during these tests (a.k.a. correctness), with a close
>> second priority being of course throughput (performance). I used Pig and
>> AvroStorage from piggybank in the data integrity analysis, and committed the
>> compiled (0.11 trunk) piggybank jar so the load analysis scripts would be
>> relatively easy to use. It seems to be compatible with Pig 0.8.1. I am a
>> little wary of having to maintain that type of thing at the Apache org level
>> so for now I have checked all the code in on Github under an ASL 2.0 license:
>>
>> https://github.com/mpercy/flume-load-gen
>>
>> I have created a Wiki page with the performance metrics we have come up with
>> so far. The executive summary is that at the time of this writing, we have
>> observed Flume NG on a single machine processing events at a throughput rate
>> of 70,000+ events/sec with no data loss.
>>
>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements
>>
>> I have put more details on the wiki page itself. Please let me know if you
>> want me to add more detail. I'll be looking into improving the performance
>> of these components going forward, however we wanted to post these results
>> to set a public performance baseline of Flume NG.
>>
>> If others have done performance testing, we would love to see your results
>> if you can post the details.
>>
>> Regards,
>> Mike
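
For anyone wanting to re-check the bandwidth arithmetic quoted in this thread, here is a minimal sketch. The event rate, event size, and flow count are the figures from the thread; the function name and the unit conventions are assumptions of this sketch: decimal units (1 MB = 10^6 bytes, 1 Mbit = 10^6 bits) and payload bytes only, ignoring any syslog framing or TCP/IP overhead.

```python
# Figures quoted in the thread.
EVENTS_PER_SEC = 58_582   # aggregate load across all flows
BYTES_PER_EVENT = 300     # event size
FLOWS = 10                # number of concurrent flows


def payload_throughput(events_per_sec, bytes_per_event):
    """Return payload throughput in bytes/sec, MB/sec and Mbit/sec.

    Assumes decimal units (10**6) and counts event payload only;
    protocol overhead is not included.
    """
    bytes_per_sec = events_per_sec * bytes_per_event
    return {
        "bytes_per_sec": bytes_per_sec,
        "mb_per_sec": bytes_per_sec / 1_000_000,
        "mbit_per_sec": bytes_per_sec * 8 / 1_000_000,
    }


result = payload_throughput(EVENTS_PER_SEC, BYTES_PER_EVENT)
per_flow = EVENTS_PER_SEC / FLOWS

print(f"{result['bytes_per_sec']:,} bytes/sec")
print(f"{result['mb_per_sec']:.1f} MB/sec")
print(f"{result['mbit_per_sec']:.1f} Mbit/sec")
print(f"{per_flow:,.1f} events/sec per flow")
```

Under those assumptions the quoted figures hold up to rounding: roughly 17.6 MB/sec, or a little over 140 Mbit/sec, at about 5,858 events/sec per flow.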
