Nikolaos Tsipas created FLUME-2222:
--------------------------------------

             Summary: Duplicate entries in Elasticsearch when using Flume elasticsearch-sink
                 Key: FLUME-2222
                 URL: https://issues.apache.org/jira/browse/FLUME-2222
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: v1.4.0
         Environment: CentOS 6
            Reporter: Nikolaos Tsipas


Hello,

I'm using the Flume elasticsearch-sink to ship logs from EC2 instances to 
Elasticsearch, and I'm getting duplicate entries for numerous documents. 

I noticed this issue while sending a known number of log lines to 
Elasticsearch through Flume and then counting them in Kibana to verify that 
all of them arrived. Most of the time, especially when multiple Flume 
instances were used, I got duplicate entries, e.g. instead of receiving 10000 
documents from an instance I would receive 10060. 

The level of duplication seems to be proportional to the number of instances 
sending log data simultaneously, e.g. with 3 Flume instances I get 10060 
documents and with 50 Flume instances I get 10300.

Is duplication something that I should expect when using the Flume 
elasticsearch-sink?
There is a {{doRollback()}} method that is called on transaction failure, but 
I think it only rolls back the local Flume channel and does not undo documents 
that were already written to Elasticsearch.
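
For reference, here is a minimal sketch of the transaction loop a Flume sink 
follows (my reading of the general {{Sink}} contract, not the actual 
ElasticSearchSink source; {{bulkIndex()}} is a hypothetical stand-in for the 
sink's Elasticsearch bulk call). It illustrates why a rollback can leave 
duplicates behind: the channel re-delivers the whole batch, but documents 
indexed before the failure are never deleted, so the retry writes them again 
under fresh auto-generated IDs.

{code:java}
import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.sink.AbstractSink;

public class SketchElasticSearchSink extends AbstractSink {
    private static final int BATCH_SIZE = 100;

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
            for (int i = 0; i < BATCH_SIZE; i++) {
                Event event = channel.take();
                if (event == null) {
                    break; // channel drained for now
                }
                bulkIndex(event); // may already be written to Elasticsearch
            }
            txn.commit(); // only now are the events removed from the channel
            return Status.READY;
        } catch (Throwable t) {
            // Rollback path: the whole batch is returned to the channel and
            // will be re-delivered, but documents indexed before the failure
            // are NOT deleted from Elasticsearch. On retry they are indexed
            // again under new auto-generated IDs, hence the duplicates.
            txn.rollback();
            throw new EventDeliveryException("bulk indexing failed", t);
        } finally {
            txn.close();
        }
    }

    // Hypothetical stand-in for the sink's Elasticsearch bulk request.
    private void bulkIndex(Event event) {
        // e.g. add the event body to a bulk request and execute it
    }
}
{code}

If that reading is right, one possible workaround (an assumption on my part, 
not something the sink does out of the box) would be to index with 
deterministic document IDs derived from each event, so that a retried write 
overwrites the earlier copy instead of duplicating it.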

Any info/suggestions would be appreciated.

Regards,
Nick



--
This message was sent by Atlassian JIRA
(v6.1#6144)
