[ 
https://issues.apache.org/jira/browse/FLUME-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919568#comment-13919568
 ] 

Ashish Paliwal commented on FLUME-2341:
---------------------------------------

This looks like a question for the user mailing list. Could you please ask
it there (http://flume.apache.org/mailinglists.html)?

> Flume duplicate events after restarting flume server
> ----------------------------------------------------
>
>                 Key: FLUME-2341
>                 URL: https://issues.apache.org/jira/browse/FLUME-2341
>             Project: Flume
>          Issue Type: Question
>         Environment: centos on ec2 instance
>            Reporter: Sandeep Khurana
>
> We have Flume ingestion servers in a production environment that receive 
> data from a Scribe source. These servers sit behind a load balancer. We 
> see a large number of duplicates (7-8 times the original event count) when 
> we:
>  a) take a Flume server out of the load balancer,
>  b) wait for the channel to be empty, i.e. wait for all data to be flushed 
>     out,
>  c) change some configuration in Flume (e.g. one time we changed the batch 
>     size),
>  d) put the server back into the load balancer.
> As soon as the Flume server is back in the load balancer, we see a sudden 
> surge of data being processed. These are duplicate records (events). Our 
> questions are:
> a) Why do we see 7-9 times as many events, mostly duplicates, when we add 
> this server back into the load balancer?
> b) What is the best way to handle this type of change on Flume production 
> boxes so that we do not see this many duplicates?
> We can live with a few hundred or a couple of thousand duplicates. But if 
> instead of 1,50,000 events we get 9,00,000 events (mostly duplicates), 
> our workflows will start having problems. 
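
Step b) in the description above (waiting for the channel to empty) can be
checked rather than estimated. What follows is a minimal sketch, not a
recommended procedure. It assumes the agent was started with Flume's JSON
monitoring enabled (-Dflume.monitoring.type=http plus a
-Dflume.monitoring.port of your choosing) and that each channel reports a
"ChannelSize" counter under a "CHANNEL.<name>" key. The function names and
the URL in the usage note are illustrative only.

```python
import json
import urllib.request


def fetch_metrics(url):
    """Fetch and parse the agent's JSON metrics.

    The URL is an assumption; it depends on how the agent was started.
    """
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def undrained_channels(metrics):
    """Return {component_name: size} for every channel still holding events."""
    pending = {}
    for component, counters in metrics.items():
        # Only channel components are relevant; skip sources and sinks.
        if not component.startswith("CHANNEL."):
            continue
        # Counter values arrive as strings in the JSON report.
        size = int(float(counters.get("ChannelSize", 0)))
        if size > 0:
            pending[component] = size
    return pending
```

For example, undrained_channels(fetch_metrics("http://localhost:34545/metrics"))
returns an empty dict once every channel on that (hypothetical) host and port
is empty. An empty channel only shows that this agent holds no buffered
events; it does not by itself explain or prevent the duplicates described
above.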



--
This message was sent by Atlassian JIRA
(v6.2#6252)
