Hi Flume developers,

Our environment is a multi-agent Flume setup. The central agent runs on Linux (Flume 1.5.0) with an Avro source, a file channel, and an HDFS sink (Hadoop 2.5). The client agents run on Windows (Flume 1.5.0) with a spooling directory source, a file channel, and an Avro sink. Several Windows machines send to the same central agent.

Each client agent monitors its spool directory and sends any files it finds to the central agent, so we keep the client agent running until Windows is restarted. Customers sometimes have to restart Windows for software maintenance, which also restarts Flume. Their question is: how can they know when it is safe to restart Windows?

I ran one test. With 500 files in the spool directory and the central agent running, I started the client agent, shut it down after several minutes, and then started it again. The client log showed:

"Pending takes 40 exist after the end of replay. Duplicate messages will exist in destination"

In Hadoop we ended up with 539 files (there were several bz2 files; we decompressed them and found 539 source files inside), so 39 files were duplicated in total.

Thanks,
Gary Xu
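
For reference, the topology described above can be sketched as the following pair of agent configurations. This is an illustrative sketch, not the exact files in use: the agent names (client, central), component names, directory paths, hostname, and port are all hypothetical, and the bzip2 settings on the HDFS sink are assumed only because the output files were bz2.

```properties
# Client agent (Windows): spooling directory source -> file channel -> Avro sink.
# All names, paths, the hostname and the port below are hypothetical.
client.sources = spool
client.channels = fc
client.sinks = toCentral

client.sources.spool.type = spooldir
client.sources.spool.spoolDir = C:/flume/spool
client.sources.spool.channels = fc

client.channels.fc.type = file
client.channels.fc.checkpointDir = C:/flume/checkpoint
client.channels.fc.dataDirs = C:/flume/data

client.sinks.toCentral.type = avro
client.sinks.toCentral.hostname = central.example.com
client.sinks.toCentral.port = 4545
client.sinks.toCentral.channel = fc

# Central agent (Linux): Avro source -> file channel -> HDFS sink.
central.sources = fromClients
central.channels = fc
central.sinks = toHdfs

central.sources.fromClients.type = avro
central.sources.fromClients.bind = 0.0.0.0
central.sources.fromClients.port = 4545
central.sources.fromClients.channels = fc

central.channels.fc.type = file
central.channels.fc.checkpointDir = /var/flume/checkpoint
central.channels.fc.dataDirs = /var/flume/data

central.sinks.toHdfs.type = hdfs
central.sinks.toHdfs.hdfs.path = hdfs://namenode.example.com/flume/events
# Assumed: compressed output, since the files found in HDFS were bz2.
central.sinks.toHdfs.hdfs.fileType = CompressedStream
central.sinks.toHdfs.hdfs.codeC = bzip2
central.sinks.toHdfs.channel = fc
```

In a layout like this, the client's file channel is where the "Pending takes" replay happens after a restart: events that the Avro sink had taken but not yet committed are replayed from the channel's data directories.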
