[ 
https://issues.apache.org/jira/browse/FLUME-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882635#comment-13882635
 ] 

Ashish Paliwal commented on FLUME-2222:
---------------------------------------

This is the expected behavior. If a failure occurs, the transaction is rolled 
back and the whole batch is sent again, so any entries that had already been 
indexed before the failure are indexed a second time. So your analysis is 
correct.
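
To make the retry effect concrete, here is a minimal sketch in plain Java. It 
does not use the Flume or Elasticsearch APIs: FakeIndex, sendBatch and run are 
hypothetical names, and the index is just an in-memory map. It assumes a batch 
of 10 events that fails after 3 have been written, followed by one retry of 
the full batch. It also shows one possible mitigation, offered as an 
assumption and not as something the sink does: if each event carries a unique 
key that is used as the document ID, the retry overwrites the earlier copies 
instead of adding new ones.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class RetryDuplicationSketch {

    /** Hypothetical stand-in for an index: document ID -> body. Not the Elasticsearch API. */
    static final class FakeIndex {
        final Map<String, String> docs = new HashMap<>();

        void put(String id, String body) {
            docs.put(id, body); // same ID overwrites, new ID adds a document
        }
    }

    /**
     * Writes events one by one and simulates a failure once failAfter events
     * have been written (a negative failAfter means the batch never fails).
     * Returns false on failure; events written before the failure stay in the index.
     */
    static boolean sendBatch(FakeIndex index, List<String> events, int failAfter, boolean stableIds) {
        for (int i = 0; i < events.size(); i++) {
            if (i == failAfter) {
                return false; // the channel transaction would be rolled back here
            }
            String event = events.get(i);
            // Assumption: the event text is unique, so it can serve as a stable ID.
            String id = stableIds ? event : UUID.randomUUID().toString();
            index.put(id, event);
        }
        return true;
    }

    /** Sends a 10-event batch that fails after 3 writes, retries it once, returns the document count. */
    static int run(boolean stableIds) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            events.add("log-line-" + i);
        }
        FakeIndex index = new FakeIndex();
        if (!sendBatch(index, events, 3, stableIds)) {
            // Rollback only restores the events to the channel; the 3 documents
            // already written to the index are not removed. Resend the whole batch.
            sendBatch(index, events, -1, stableIds);
        }
        return index.docs.size();
    }

    public static void main(String[] args) {
        System.out.println("generated IDs: " + run(false)); // 13 documents for 10 events
        System.out.println("stable IDs:    " + run(true));  // 10 documents for 10 events
    }
}
```

With generated IDs the 3 events written before the failure are stored twice, 
which matches the kind of surplus reported in the issue. Whether stable IDs 
are practical depends on the events having a usable unique key.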

> Duplicate entries in Elasticsearch when using Flume elasticsearch-sink
> ----------------------------------------------------------------------
>
>                 Key: FLUME-2222
>                 URL: https://issues.apache.org/jira/browse/FLUME-2222
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0
>         Environment: centos 6
>            Reporter: Nikolaos Tsipas
>            Assignee: Ashish Paliwal
>              Labels: elasticsearch, sink
>         Attachments: Screen Shot 2013-10-29 at 12.36.01.png
>
>
> Hello,
> I'm using flume elasticsearch-sink to transfer logs from ec2 instances to 
> elasticsearch and I get duplicate entries for numerous documents. 
> I noticed this issue when sending a specific number of log lines to 
> elasticsearch using flume and then counting them in kibana to verify that 
> all of them arrived. Most of the time, especially when multiple flume 
> instances were used, I got duplicate entries. For example, instead of 
> receiving 10000 documents from an instance, I received 10060. 
> The level of duplication seems to be proportional to the number of instances 
> sending log data simultaneously. For example, with 3 flume instances I get 
> 10060, and with 50 flume instances I get 10300.
> Is duplication something that I should expect when using flume 
> elasticsearch-sink?
> There is a {{doRollback()}} method that is called on transaction failure but 
> I think that it updates only the local flume channel and not elasticsearch.
> Any info/suggestions would be appreciated.
> Regards,
> Nick



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
