Arvind, please do let me know once you have granted me access to the wiki. -roshan
From: Hari Shreedharan <[email protected]>
Date: Thursday, April 2, 2015 3:06 PM
To: Roshan Naik <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Flume performance measurements

Arvind - could you please grant Roshan access to the wiki?

Thanks,
Hari

On Thu, Apr 2, 2015 at 3:04 PM, Roshan Naik <[email protected]> wrote:

Could you grant me write access to the wiki?
username: roshannaik

On 4/2/15 2:53 PM, "Hari Shreedharan" <[email protected]> wrote:

>Roshan,
>
>Could you update the performance measurements page on our wiki with this
>info? That would be more useful to reference.
>
>Thanks,
>Hari
>
>On Thu, Apr 2, 2015 at 2:34 PM, Roshan Naik <[email protected]> wrote:
>
>> Sample Flume v1.4 measurements for reference:
>> Here are some sample measurements taken with a single agent and
>> 500-byte events.
>>
>> Cluster config: 20-node Hadoop cluster (1 name node and 19 data nodes).
>> Machine config: 24 cores - Xeon E5-2640 v2 @ 2.00GHz, 164 GB RAM.
>>
>> 1. File Channel with HDFS Sink (Sequence File):
>>    Source: 4 x Exec Source, 100k batchSize
>>    HDFS Sink batch size: 500,000
>>    Channel: File
>>    Number of data dirs: 1 to 10
>>
>>    Events/sec by sink count (and data dirs, where varied):
>>      1 sink   -> 14.3 k
>>      2 sinks  -> 21.9 k
>>      4 sinks  -> 35.8 k
>>      8 sinks  -> 24.8 k (1 dd), 43.8 k (2 dd), 72.5 k (4 dd),
>>                  77 k (6 dd), 78.6 k (8 dd), 76.6 k (10 dd)
>>     10 sinks  -> 58 k
>>     12 sinks  -> 49.3 k, 49 k
>>
>>    I was looking for the perf sweet spot, so I did not take measurements
>>    at all points on the grid - only the ones that made sense. For
>>    example, when perf dropped after adding more sinks, I did not take
>>    further measurements for those rows.
>>
>> 2. HDFS Sink with Memory Channel:
>>    Channel: Memory
>>
>>    Events/sec by number of HDFS sinks:
>>
>>    Sinks  Snappy (batch 1.2M)  Snappy (batch 1.4M)  SeqFile (batch 1.2M)
>>      1    34.3 k               33 k                 33 k
>>      2    71 k                 75 k                 69 k
>>      4    141 k                145 k                141 k
>>      8    271 k                273 k                251 k
>>     12    382 k                380 k                370 k
>>     16    478 k                538 k                486 k
>>
>> Some simple observations:
>>   * Increasing the number of dataDirs helps File Channel perf, even on
>>     single-disk systems.
>>   * Increasing the number of sinks helps throughput.
>>   * Max throughput observed was about 538 k events/sec for the HDFS
>>     sink, which at 500 bytes/event is roughly 270 MB/s.
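For readers wanting to reproduce a setup along these lines, a minimal single-agent configuration might look like the sketch below. The property names are standard Flume 1.x configuration keys; the agent name, file paths, and tail command are illustrative assumptions, while the batch sizes match the figures quoted in the thread. The original tests used 4 exec sources and up to 12 sinks; only one of each is shown here for brevity.

```properties
# Illustrative Flume 1.x agent: exec source -> file channel -> HDFS sink.
# Agent name, paths, and the tailed file are placeholders.
agent1.sources = src1
agent1.channels = fc1
agent1.sinks = sink1

# Exec source with the 100k batch size used in the tests
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/input.log
agent1.sources.src1.batchSize = 100000
agent1.sources.src1.channels = fc1

# File channel; dataDirs takes a comma-separated list, which is the
# knob varied in table 1 (1 to 10 directories)
agent1.channels.fc1.type = file
agent1.channels.fc1.checkpointDir = /flume/checkpoint
agent1.channels.fc1.dataDirs = /flume/data1,/flume/data2,/flume/data3,/flume/data4

# HDFS sink writing sequence files with the 500,000 batch size
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /flume/events/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType = SequenceFile
agent1.sinks.sink1.hdfs.batchSize = 500000
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.channel = fc1
```

Note that `hdfs.useLocalTimeStamp` is set because the exec source does not add a timestamp header, which the `%Y-%m-%d` path escapes otherwise require.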

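The byte-throughput figure in the last observation can be sanity-checked with a one-line conversion from event rate to MB/s (a small illustrative script, not part of the original thread):

```python
def throughput_mb_per_sec(events_per_sec: float, event_size_bytes: int) -> float:
    """Convert an event rate into decimal MB/s."""
    return events_per_sec * event_size_bytes / 1_000_000

# Peak observed rate: 538k events/sec at 500 bytes/event
peak = throughput_mb_per_sec(538_000, 500)
print(f"{peak:.0f} MB/s")  # -> 269 MB/s
```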