Nikolaos Tsipas created FLUME-2222:
--------------------------------------
Summary: Duplicate entries in Elasticsearch when using Flume
elasticsearch-sink
Key: FLUME-2222
URL: https://issues.apache.org/jira/browse/FLUME-2222
Project: Flume
Issue Type: Bug
Components: Sinks+Sources
Affects Versions: v1.4.0
Environment: CentOS 6
Reporter: Nikolaos Tsipas
Hello,
I'm using the Flume elasticsearch-sink to ship logs from EC2 instances to
Elasticsearch, and I'm seeing duplicate entries for many documents.
I noticed the issue while sending a known number of log lines to
Elasticsearch through Flume and then counting them in Kibana to verify that
all of them had arrived. Most of the time, especially when multiple Flume
instances were running, I got duplicates: e.g. instead of the expected 10000
documents from an instance, I received 10060.
The level of duplication seems to be proportional to the number of instances
sending log data simultaneously; e.g. with 3 Flume instances I get 10060
documents, and with 50 I get 10300.
Is duplication something I should expect when using the Flume
elasticsearch-sink?
There is a {{doRollback()}} method that is called on transaction failure, but
as far as I can tell it only rolls back the local Flume channel and does not
remove documents that were already indexed in Elasticsearch.
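To illustrate what I mean, here is a minimal sketch of a sink's
{{process()}} batch loop using Flume's standard transaction API. This is not
the actual ElasticSearchSink source; the {{BulkIndexer}} interface is a
hypothetical stand-in for its bulk-index call:

{code:java}
import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.Transaction;

public class SinkLoopSketch {

    /** Hypothetical stand-in for the sink's Elasticsearch bulk-index call. */
    interface BulkIndexer {
        void index(Event event) throws Exception;
    }

    static void processBatch(Channel channel, BulkIndexer es, int batchSize) {
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            for (int i = 0; i < batchSize; i++) {
                Event event = channel.take();
                if (event == null) {
                    break; // channel drained
                }
                es.index(event); // document may already reach Elasticsearch here
            }
            tx.commit(); // channel forgets the events only on success
        } catch (Exception e) {
            // Rollback returns the taken events to the channel, but it cannot
            // un-index documents that were already sent, so the retried batch
            // delivers them a second time -> duplicates.
            tx.rollback();
        } finally {
            tx.close();
        }
    }
}
{code}

If that reading is right, duplicates on failure and retry would be the
expected at-least-once behaviour, and they would multiply with the number of
instances retrying concurrently, which matches what I'm seeing.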
Any info/suggestions would be appreciated.
Regards,
Nick