Hi Flume developers,

Our environment uses a multi-agent Flume setup. The central agent runs Flume 1.5.0 on Linux with an Avro source, a file channel, and an HDFS sink (Hadoop 2.5). Each client agent runs Flume 1.5.0 on Windows with a spooling directory source, a file channel, and an Avro sink. Several Windows machines send to the same central agent.
The client agent monitors the spool directory and sends any new files to the central agent, so we keep the client agent running unless the Windows machine is restarted. Sometimes the customer has to restart Windows for software maintenance. Their question is: how can they tell when it is safe to restart Windows (which also restarts the client Flume agent)?
I ran a test. With 500 files in the spool directory and the central agent running, I started the client agent, shut it down after several minutes, and then started it again. The client log showed "Pending takes 40 exist after the end of replay. Duplicate messages will exist in destination". In Hadoop we ended up with 539 files: the output contains several bz2 files, and after decompressing them we found 539 source files. In total, 39 files were duplicated.
Thanks,
Gary Xu
