[ https://issues.apache.org/jira/browse/FLUME-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919568#comment-13919568 ]
Ashish Paliwal commented on FLUME-2341:
---------------------------------------
This looks like a question for the user mailing list. Could you please ask it
there? (http://flume.apache.org/mailinglists.html)
> Flume duplicate events after restarting flume server
> ----------------------------------------------------
>
> Key: FLUME-2341
> URL: https://issues.apache.org/jira/browse/FLUME-2341
> Project: Flume
> Issue Type: Question
> Environment: centos on ec2 instance
> Reporter: Sandeep Khurana
>
> We have Flume ingestion servers in a production environment that receive
> data from a Scribe source. These servers are behind a load balancer. We
> observe a large number of duplicates (7-8 times the original event count)
> when we:
> a) take a Flume server out of the load balancer,
> b) wait for the channel to drain to zero, i.e. wait for all data to be
> flushed out,
> c) change some configuration in Flume (e.g. on one occasion we changed the
> batch size),
> d) put the server back into the load balancer.
> As soon as the Flume server is put back into the load balancer, we see a
> sudden surge of data being processed. These are duplicate records (events).
> Our questions are:
> a) Why do we see duplicates amounting to 7-9 times the original events when
> we add this server back into the load balancer?
> b) What is the best way to make such changes on Flume production boxes so
> that we don't see this many duplicates?
> We can live with a few hundred or a couple of thousand duplicates. But if
> instead of getting 1,50,000 events we get 9,00,000 events (mostly duplicates),
> then our workflows will start having problems.
--
This message was sent by Atlassian JIRA
(v6.2#6252)