Much of that can be attributed to the fact that the File Channel writes to only one file per data directory and all writes are blocking. There are probably a lot more improvements to make, especially around how we handle transactions (fsync, etc.). I'd try working with bigger batch sizes (like 10K) and also use multi-threaded sources with several disks - if you put in more disks, perf is going to improve for sure. If you can take a look, it would be nice to see the hotspots within the File Channel, using a profiler etc.
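
For illustration only - the agent name, component names, and disk paths below are placeholders, not from the actual setup - spreading the channel's data directories across separate physical disks and raising the exec source batch size to 10K might look roughly like this:

    agent01.sources.src.type = exec
    agent01.sources.src.batchSize = 10000
    agent01.channels.fc.type = file
    agent01.channels.fc.checkpointDir = /disk0/flume/checkpoint
    agent01.channels.fc.dataDirs = /disk1/flume/data,/disk2/flume/data,/disk3/flume/data

Each entry in dataDirs should sit on its own physical disk so the channel can spread its log writes across spindles.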
Thanks,
Hari

On Friday, December 13, 2013 at 6:21 PM, Roshan Naik wrote:

> Some of the folks on this dev list may be aware that I am doing some Flume
> performance measurements.
>
> Here is some preliminary data:
>
> I initially started with Avro source + FC + 4 HDFS sinks. Measurements
> indicated the agent was able to reach only around 20k events per second. I
> tried with event sizes of 1 kB and 500 bytes.
>
> I replaced the HDFS sinks with null sinks just to narrow down the source of
> the bottleneck. For the same reason I replaced the source with an exec
> source which basically cats the same 1 GB input file in a loop many times.
>
> *SYSTEM STATS:*
> There is a single disk on the machine, but its utilization is very low, as
> can be seen from the *iostat* output below:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            2.37    0.00    0.44    0.04    0.00   97.16
>
> Device:            tps   Blk_read/s   Blk_wrtn/s    Blk_read     Blk_wrtn
> sda              95.98       655.31      6603.58  1348373762  13587517606
>
> Top output also shows CPU & memory are not the bottleneck:
>
> top - 17:21:57 up 23 days, 19:34,  2 users,  load average: 3.44, 3.17, 2.72
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> Cpu(s): *5.9%us,*  3.3%sy,  0.0%ni, 90.7%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  65937984k total, 22648200k used, 43289784k free,   198448k buffers
> Swap:  1048568k total,    14268k used,  1034300k free, 19619416k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>  6255 root      20   0 12.3g 1.4g 125m S *219.4*  2.2  19:57.64 java
>
> *FLUME MEASUREMENTS*
>
> Since there was spare CPU, memory & disk available, I ran a 2nd agent and
> noticed that it was able to independently deliver approx. 20k events/sec.
> With a third agent the same perf was observed.
> So the system does not seem to be the bottleneck.
>
> The channel size remains small and steady, so the ingestion rate is the
> bottleneck, not the drain rate.
>
> Varying the batch size on the exec source between 20, 100, 500 & 1000 yielded
> the following numbers for ingestion rate with an event size of 1024 bytes:
>
> FC + exec (batch size 20)   + 4 null sinks = 18k events/sec
> FC + exec (batch size 100)  + 4 null sinks = 24.2k eps
> FC + exec (batch size 500)  + 4 null sinks = 24k eps
> FC + exec (batch size 1000) + 4 null sinks = 23.2k eps
>
> Just for the heck of it, I replaced the FC with a memory channel:
>
> MemCh + exec (batch size 1000) + 4 null sinks = 123.4k eps
>
> A few runs with an event size of 500 bytes also gave me numbers in the same
> ballpark.
>
> Here is my FC config:
>
> nontx_agent01.channels.fc.checkpointDir = /flume/checkpoint/agent1
> nontx_agent01.channels.fc.dataDirs = /flume/data/agent1
> nontx_agent01.channels.fc.capacity = 140000000
> nontx_agent01.channels.fc.transactionCapacity = 240000
>
> In this setup, these numbers appear to indicate that the events/sec rate is
> the primary bottleneck in FC, and not so much the event size, batch size, or
> CPU/disk capacity.
>
> -Roshan
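
As a rough sketch for anyone wanting to reproduce the setup described above (component names, the loop command, and file paths are illustrative guesses, not Roshan's actual config; the FC values are copied from his mail), the exec source + File Channel + 4 null sink topology would look something like:

    agent1.sources = src
    agent1.channels = fc
    agent1.sinks = s1 s2 s3 s4

    agent1.sources.src.type = exec
    agent1.sources.src.shell = /bin/sh -c
    agent1.sources.src.command = while true; do cat /flume/input/1gb.log; done
    agent1.sources.src.batchSize = 1000
    agent1.sources.src.channels = fc

    agent1.channels.fc.type = file
    agent1.channels.fc.checkpointDir = /flume/checkpoint/agent1
    agent1.channels.fc.dataDirs = /flume/data/agent1
    agent1.channels.fc.capacity = 140000000
    agent1.channels.fc.transactionCapacity = 240000

    agent1.sinks.s1.type = null
    agent1.sinks.s1.channel = fc
    agent1.sinks.s2.type = null
    agent1.sinks.s2.channel = fc
    agent1.sinks.s3.type = null
    agent1.sinks.s3.channel = fc
    agent1.sinks.s4.type = null
    agent1.sinks.s4.channel = fc

The null sinks simply discard events, so only the source-plus-channel path is being measured; switching agent1.channels.fc.type from file to memory gives the memory channel comparison point.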
