Hello,
We are using Flume's HDFS sink to store log data in Amazon S3 and we are facing
throughput issues. Our Flume config has an Avro source, a file channel and the
HDFS sink. The file channel is backed by a provisioned-IOPS EBS volume and we
are running on an m1.large EC2 instance (Flume 1.4.0, Java 1.7.0).
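In case it helps, the relevant part of our agent config looks roughly like the
following (the bucket name, paths and port are placeholders rather than our
real values):

agent.sources = avro-src
agent.channels = s3-file-channel
agent.sinks = s3-sink

# Avro source receiving events from upstream agents
agent.sources.avro-src.type = avro
agent.sources.avro-src.bind = 0.0.0.0
agent.sources.avro-src.port = 4141
agent.sources.avro-src.channels = s3-file-channel

# File channel with checkpoint and data dirs on the PIOPS EBS volume
agent.channels.s3-file-channel.type = file
agent.channels.s3-file-channel.checkpointDir = /mnt/ebs/flume/checkpoint
agent.channels.s3-file-channel.dataDirs = /mnt/ebs/flume/data
agent.channels.s3-file-channel.capacity = 15000000

# HDFS sink writing to S3 via the s3n:// filesystem
agent.sinks.s3-sink.type = hdfs
agent.sinks.s3-sink.channel = s3-file-channel
agent.sinks.s3-sink.hdfs.path = s3n://our-bucket/logs/%Y-%m-%d
agent.sinks.s3-sink.hdfs.useLocalTimeStamp = true
agent.sinks.s3-sink.hdfs.fileType = DataStream
agent.sinks.s3-sink.hdfs.batchSize = 1000
agent.sinks.s3-sink.hdfs.rollInterval = 300
agent.sinks.s3-sink.hdfs.rollSize = 0
agent.sinks.s3-sink.hdfs.rollCount = 0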
Below you will find an example metrics dump from our s3-file-channel. The main
issue is that EventTakeSuccessCount can't keep up with EventPutSuccessCount
(roughly 635,000 takes against 938,000 puts in the dump below), so our
ChannelSize keeps growing over time.
We tried using multiple HDFS sinks draining the same channel (along the lines
of the snippet below), but it had no noticeable effect. Strangely, the problem
persists even when a memory channel is used instead of the file channel.
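For completeness, the multi-sink attempt looked roughly like this; each sink
got its own filePrefix so the output files wouldn't collide (again, names are
placeholders):

agent.sinks = s3-sink-1 s3-sink-2

# Both sinks drain the same file channel in parallel
agent.sinks.s3-sink-1.type = hdfs
agent.sinks.s3-sink-1.channel = s3-file-channel
agent.sinks.s3-sink-1.hdfs.path = s3n://our-bucket/logs/%Y-%m-%d
agent.sinks.s3-sink-1.hdfs.filePrefix = sink1

agent.sinks.s3-sink-2.type = hdfs
agent.sinks.s3-sink-2.channel = s3-file-channel
agent.sinks.s3-sink-2.hdfs.path = s3n://our-bucket/logs/%Y-%m-%d
agent.sinks.s3-sink-2.hdfs.filePrefix = sink2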
Another interesting fact is that we are also using an identical file channel
with the Elasticsearch sink, and under the same load it shows no throughput
issues at all.
We would appreciate any suggestions that could help us improve the performance
of the HDFS sink.
Regards,
Nick
"CHANNEL.s3-file-channel": {
"ChannelCapacity": "15000000",
"ChannelFillPercentage": "11.6603",
"ChannelSize": "1749045",
"EventPutAttemptCount": "938299",
"EventPutSuccessCount": "938181",
"EventTakeAttemptCount": "648801",
"EventTakeSuccessCount": "635000",
"StartTime": "1394038826288",
"StopTime": "0",
"Type": "CHANNEL"