Sandeep Khurana created FLUME-2341:
--------------------------------------

             Summary: Flume duplicate events after restarting flume server
                 Key: FLUME-2341
                 URL: https://issues.apache.org/jira/browse/FLUME-2341
             Project: Flume
          Issue Type: Question
         Environment: CentOS on an EC2 instance
            Reporter: Sandeep Khurana


We have Flume ingestion servers in our production environment that receive 
data from a Scribe source. These servers sit behind a load balancer. We observed 
that we get a large number of duplicates (7-8 times the original event count) when we:
 a) take a Flume server out of the load balancer,
 b) wait for the channel to be empty, i.e. wait for all data to be flushed 
out,
 c) change some Flume configuration (e.g. one time we changed the batch size) 
and restart Flume,
 d) put the server back into the load balancer.

As soon as the Flume server is put back into the load balancer, we see a sudden 
surge of data being processed. These are duplicate records (events). Our 
questions are:

a) Why do we see 7-9 times the original number of events, mostly duplicates, 
when we add this server back into the load balancer?
b) What is the best way to make this kind of change on production Flume 
boxes so that we don't see this many duplicates?

We can live with a few hundred or a couple of thousand duplicates. But if instead 
of getting 150,000 events we get 900,000 events (mostly duplicates), then our 
workflows will start having problems.
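
To make the tolerance above concrete, here is a minimal sketch of a downstream 
guard that drops repeated events before they reach a workflow. It is not part 
of Flume: the names `EventDeduplicator`, `deduplicate` and `max_tracked` are 
hypothetical. It assumes two byte-identical event bodies are duplicates, which 
would also drop legitimately repeated events.

```python
import hashlib
from collections import OrderedDict


class EventDeduplicator:
    """Remembers a bounded number of event digests and flags repeats.

    Hypothetical helper: assumes identical bodies mean duplicate events.
    """

    def __init__(self, max_tracked=1_000_000):
        # Ordered so the oldest digest can be evicted once the bound is hit.
        self._seen = OrderedDict()
        self._max_tracked = max_tracked

    def is_new(self, body: bytes) -> bool:
        """Return True the first time a body is seen, False for a repeat."""
        digest = hashlib.sha256(body).digest()
        if digest in self._seen:
            # Refresh recency so frequently repeated events stay tracked.
            self._seen.move_to_end(digest)
            return False
        self._seen[digest] = None
        if len(self._seen) > self._max_tracked:
            # Evict the least recently seen digest to cap memory use.
            self._seen.popitem(last=False)
        return True


def deduplicate(events, dedup=None):
    """Return the events in order, keeping only the first copy of each."""
    if dedup is None:
        dedup = EventDeduplicator()
    return [event for event in events if dedup.is_new(event)]
```

For example, `deduplicate([b"a", b"b", b"a"])` keeps only the first `b"a"` and 
`b"b"`. The bounded window trades completeness for memory: a duplicate that 
arrives after its digest has been evicted is passed through again.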



--
This message was sent by Atlassian JIRA
(v6.2#6252)
