Hello,
We are using Flume's HDFS sink to store log data in Amazon S3 and we are facing
throughput issues. Our Flume config has an Avro source, a file channel and the
HDFS sink. The file channel is backed by a provisioned-IOPS EBS volume and we
are running on an m1.large EC2 instance (Flume 1.4.0, Java 1.7.0).
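In case it helps, the relevant part of our agent config looks roughly like the
following (the bucket name, paths and port are placeholders rather than our
real values):

agent.sources = avro-src
agent.channels = s3-file-channel
agent.sinks = s3-sink

# Avro source receiving events from upstream agents
agent.sources.avro-src.type = avro
agent.sources.avro-src.bind = 0.0.0.0
agent.sources.avro-src.port = 4141
agent.sources.avro-src.channels = s3-file-channel

# File channel with checkpoint and data dirs on the PIOPS EBS volume
agent.channels.s3-file-channel.type = file
agent.channels.s3-file-channel.checkpointDir = /mnt/ebs/flume/checkpoint
agent.channels.s3-file-channel.dataDirs = /mnt/ebs/flume/data
agent.channels.s3-file-channel.capacity = 15000000

# HDFS sink writing to S3 via the s3n:// filesystem
agent.sinks.s3-sink.type = hdfs
agent.sinks.s3-sink.channel = s3-file-channel
agent.sinks.s3-sink.hdfs.path = s3n://our-bucket/logs/%Y-%m-%d
agent.sinks.s3-sink.hdfs.useLocalTimeStamp = true
agent.sinks.s3-sink.hdfs.fileType = DataStream
agent.sinks.s3-sink.hdfs.batchSize = 1000
agent.sinks.s3-sink.hdfs.rollInterval = 300
agent.sinks.s3-sink.hdfs.rollSize = 0
agent.sinks.s3-sink.hdfs.rollCount = 0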
Below you will find an example metrics dump from our s3-file-channel. The main
issue is that EventTakeSuccessCount can't keep up with EventPutSuccessCount
(roughly 635,000 takes against 938,000 puts in the dump below), so our
ChannelSize keeps growing over time.
We tried using multiple HDFS sinks draining the same channel (along the lines
of the snippet below), but it had no noticeable effect. Strangely, the problem
persists even when a memory channel is used instead of the file channel.
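For completeness, the multi-sink attempt looked roughly like this; each sink
got its own filePrefix so the output files wouldn't collide (again, names are
placeholders):

agent.sinks = s3-sink-1 s3-sink-2

# Both sinks drain the same file channel in parallel
agent.sinks.s3-sink-1.type = hdfs
agent.sinks.s3-sink-1.channel = s3-file-channel
agent.sinks.s3-sink-1.hdfs.path = s3n://our-bucket/logs/%Y-%m-%d
agent.sinks.s3-sink-1.hdfs.filePrefix = sink1

agent.sinks.s3-sink-2.type = hdfs
agent.sinks.s3-sink-2.channel = s3-file-channel
agent.sinks.s3-sink-2.hdfs.path = s3n://our-bucket/logs/%Y-%m-%d
agent.sinks.s3-sink-2.hdfs.filePrefix = sink2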
Another interesting fact is that we are also using an identical file channel
with the Elasticsearch sink, and under the same load it shows no throughput
issues at all.
We would appreciate any suggestions that could help us improve the performance
of the HDFS sink.
Regards,
Nick
"CHANNEL.s3-file-channel": {
"ChannelCapacity": "15000000",
"ChannelFillPercentage": "11.6603",
"ChannelSize": "1749045",
"EventPutAttemptCount": "938299",
"EventPutSuccessCount": "938181",
"EventTakeAttemptCount": "648801",
"EventTakeSuccessCount": "635000",
"StartTime": "1394038826288",
"StopTime": "0",
"Type": "CHANNEL"