Arvind, please do let me know once you have granted me access to the wiki. -roshan
From: Hari Shreedharan <[email protected]>
Date: Thursday, April 2, 2015 3:06 PM
To: Roshan Naik <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Flume performance measurements

Arvind - could you please grant Roshan access to the wiki?

Thanks,
Hari

On Thu, Apr 2, 2015 at 3:04 PM, Roshan Naik <[email protected]> wrote:

Could you grant me write access to the wiki?
username: roshannaik

On 4/2/15 2:53 PM, "Hari Shreedharan" <[email protected]> wrote:

>Roshan,
>
>Could you update the performance measurements page on our wiki with this
>info? That would be more useful to reference.
>
>Thanks,
>Hari
>
>On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <[email protected]> wrote:
>
>> Sample Flume v1.4 measurements for reference:
>> Here are some sample measurements taken with a single agent and
>> 500-byte events.
>>
>> Cluster config: 20-node Hadoop cluster (1 name node and 19 data nodes).
>> Machine config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.
>>
>> 1. File Channel with HDFS Sink (Sequence File):
>>    Source: 4 x Exec Source, 100k batchSize
>>    HDFS Sink batch size: 500,000
>>    Channel: File
>>    Number of data dirs: 1 to 10
>>
>>    Events/sec by sink count (and data dirs, where varied):
>>      1 sink   -> 14.3 k
>>      2 sinks  -> 21.9 k
>>      4 sinks  -> 35.8 k
>>      8 sinks  -> 24.8 k (1 dd), 43.8 k (2 dd), 72.5 k (4 dd),
>>                  77 k (6 dd), 78.6 k (8 dd), 76.6 k (10 dd)
>>     10 sinks  -> 58 k
>>     12 sinks  -> 49.3 k, 49 k
>>
>>    I was looking for the perf sweet spot, so I did not take measurements
>>    at all points on the grid - only the ones that made sense. For
>>    example, when perf dropped after adding more sinks, I did not take
>>    further measurements for those rows.
>>
>> 2. HDFS Sink with Memory Channel:
>>    Channel: Memory
>>
>>    Events/sec by number of HDFS sinks:
>>
>>    Sinks  Snappy (batch 1.2M)  Snappy (batch 1.4M)  SeqFile (batch 1.2M)
>>      1    34.3 k               33 k                 33 k
>>      2    71 k                 75 k                 69 k
>>      4    141 k                145 k                141 k
>>      8    271 k                273 k                251 k
>>     12    382 k                380 k                370 k
>>     16    478 k                538 k                486 k
>>
>> Some simple observations:
>>   * Increasing the number of dataDirs helps File Channel perf, even on
>>     single-disk systems.
>>   * Increasing the number of sinks helps throughput.
>>   * Max throughput observed was about 538 k events/sec for the HDFS
>>     sink, which at 500 bytes/event is roughly 270 MB/s.
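For readers wanting to reproduce a setup along these lines, a minimal single-agent configuration might look like the sketch below. The property names are standard Flume 1.x configuration keys; the agent name, file paths, and tail command are illustrative assumptions, while the batch sizes match the figures quoted in the thread. The original tests used 4 exec sources and up to 12 sinks; only one of each is shown here for brevity.

```properties
# Illustrative Flume 1.x agent: exec source -> file channel -> HDFS sink.
# Agent name, paths, and the tailed file are placeholders.
agent1.sources = src1
agent1.channels = fc1
agent1.sinks = sink1

# Exec source with the 100k batch size used in the tests
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/input.log
agent1.sources.src1.batchSize = 100000
agent1.sources.src1.channels = fc1

# File channel; dataDirs takes a comma-separated list, which is the
# knob varied in table 1 (1 to 10 directories)
agent1.channels.fc1.type = file
agent1.channels.fc1.checkpointDir = /flume/checkpoint
agent1.channels.fc1.dataDirs = /flume/data1,/flume/data2,/flume/data3,/flume/data4

# HDFS sink writing sequence files with the 500,000 batch size
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /flume/events/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType = SequenceFile
agent1.sinks.sink1.hdfs.batchSize = 500000
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.channel = fc1
```

Note that `hdfs.useLocalTimeStamp` is set because the exec source does not add a timestamp header, which the `%Y-%m-%d` path escapes otherwise require.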

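The byte-throughput figure in the last observation can be sanity-checked with a one-line conversion from event rate to MB/s (a small illustrative script, not part of the original thread):

```python
def throughput_mb_per_sec(events_per_sec: float, event_size_bytes: int) -> float:
    """Convert an event rate into decimal MB/s."""
    return events_per_sec * event_size_bytes / 1_000_000

# Peak observed rate: 538k events/sec at 500 bytes/event
peak = throughput_mb_per_sec(538_000, 500)
print(f"{peak:.0f} MB/s")  # -> 269 MB/s
```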