[
https://issues.apache.org/jira/browse/FLUME-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900521#comment-13900521
]
Nikolaos Tsipas commented on FLUME-2222:
----------------------------------------
Thanks for your message. I guess you can resolve this ticket.
> Duplicate entries in Elasticsearch when using Flume elasticsearch-sink
> ----------------------------------------------------------------------
>
> Key: FLUME-2222
> URL: https://issues.apache.org/jira/browse/FLUME-2222
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.4.0
> Environment: centos 6
> Reporter: Nikolaos Tsipas
> Assignee: Ashish Paliwal
> Labels: elasticsearch, sink
> Attachments: Screen Shot 2013-10-29 at 12.36.01.png
>
>
> Hello,
> I'm using flume elasticsearch-sink to transfer logs from ec2 instances to
> elasticsearch and I get duplicate entries for numerous documents.
> I noticed this issue when sending a fixed number of log lines to
> elasticsearch through flume and then counting them in kibana to verify
> that all of them arrived. Most of the time, especially when multiple flume
> instances were used, I was getting duplicate entries, e.g. instead of
> receiving 10000 documents from an instance, I was receiving 10060.
> The amount of duplication seems to grow with the number of instances sending
> log data simultaneously, e.g. with 3 flume instances I get 10060, with 50
> flume instances I get 10300.
> Is duplication something that I should expect when using flume
> elasticsearch-sink?
> There is a {{doRollback()}} method that is called on transaction failure, but
> as far as I can tell it only rolls back the local flume channel transaction;
> it does not undo writes that already reached elasticsearch.
> Any info/suggestions would be appreciated.
> Regards,
> Nick
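
The duplicates described above are consistent with Flume's at-least-once delivery: if a batch fails after some documents were already indexed, the channel transaction is rolled back and the whole batch is re-sent, and Elasticsearch assigns each retried document a fresh auto-generated {{_id}}. A common workaround (not part of the stock sink, so treat this as an illustrative assumption) is to derive a deterministic document id from the event itself, so a retried write overwrites the earlier copy instead of creating a new one. A minimal sketch of such an id function, using only the JDK ({{docId}} and its parameters are hypothetical names):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DeterministicDocId {

    // Derive a stable Elasticsearch _id from the event payload (and optionally
    // a timestamp header), so a re-delivered event maps to the same document.
    public static String docId(byte[] eventBody, String timestampHeader)
            throws NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        sha.update(eventBody);
        if (timestampHeader != null) {
            sha.update(timestampHeader.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] line = "10.0.0.1 GET /index.html 200".getBytes(StandardCharsets.UTF_8);
        // The same event always hashes to the same id, so an at-least-once
        // retry indexes to the existing _id and overwrites rather than duplicates.
        System.out.println(docId(line, "1383048961000").equals(docId(line, "1383048961000")));
    }
}
```

Note that two genuinely identical log lines with the same timestamp would also collapse to one document under this scheme; including a per-source sequence number in the hash avoids that if exact duplicates are legitimate in your data.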
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)