[ https://issues.apache.org/jira/browse/FLUME-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720927#comment-14720927 ]
Satoshi Iijima commented on FLUME-2777:
---------------------------------------

I recommend adjusting the file path regex so that it does not match the renamed file path.

When some files are truncated (deleted or archived into a compressed file) and a new file is created, the new file can occasionally have the same inode as the truncated file. It needs to be read as a new file from position 0.
I think it is difficult to completely distinguish a new file that has the same inode as a file being tailed from a renamed file.

> Tail Dir Source leads to duplicate events on rolling the tailed file
> --------------------------------------------------------------------
>
>                 Key: FLUME-2777
>                 URL: https://issues.apache.org/jira/browse/FLUME-2777
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.7
>            Reporter: Johny Rufus
>            Assignee: Johny Rufus
>         Attachments: FLUME-2777.patch
>
>
> I have a simple setup, where I write 200 events to logfile1. [TailSrc is on
> the lookout for logfile* ]
> Then I rename logfile1 to logfile2.
> I create a new logfile1 and write 100 events to it.
> Typically I should see 300 events in my channel. But I see 500 events.
> I was able to trace the duplicates to ReliableTaildirEventReader.java
> updateFiles(boolean) to the way renamed files are handled, by specifying
> starting position as 0. [This starting position should be obtained from
> tf.getPosition()]
> I am attaching a proposed fix, would be great if one of you guys
> [~iijima_satoshi] / [~hshreedharan] / [~roshan_naik] can take a look at the
> fix and validate the issue.
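
To make the scenario above concrete, here is a minimal sketch of the start-position decision being discussed. It is not the code from ReliableTaildirEventReader or from the attached patch: the class TailPositionSketch, the Tracked type, and the methods startPosition and eventsAfterRoll are hypothetical names, and positions are counted in events rather than bytes to keep the arithmetic readable. The "file is shorter than the saved position, so treat it as a new file that reused the inode" rule is only an assumed heuristic. As the comment notes, the two cases cannot be told apart completely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Flume's actual implementation.
class TailPositionSketch {

    /** Last known state of a tailed file; the caller keys these by inode. */
    static final class Tracked {
        final String path;
        final long position; // simplified: counted in events, not bytes
        Tracked(String path, long position) {
            this.path = path;
            this.position = position;
        }
    }

    /**
     * Decide where to start reading a file found during a directory scan.
     * resumeRenamed=false models the reported behaviour (renamed file restarts at 0);
     * resumeRenamed=true models resuming from the saved position.
     */
    static long startPosition(Map<Long, Tracked> tracked, long inode,
                              String path, long size, boolean resumeRenamed) {
        Tracked tf = tracked.get(inode);
        if (tf == null) {
            return 0L; // inode never seen: new file
        }
        if (size < tf.position) {
            return 0L; // assumed heuristic: shorter than what was read, so inode was reused
        }
        if (!tf.path.equals(path) && !resumeRenamed) {
            return 0L; // reported behaviour: renamed file is re-read from the start
        }
        return tf.position; // same file (possibly renamed): continue where we stopped
    }

    /** Replays the scenario from the issue and returns the total events delivered. */
    static long eventsAfterRoll(boolean resumeRenamed) {
        Map<Long, Tracked> tracked = new HashMap<>();
        long delivered = 200; // logfile1 (inode 1) has already been read completely
        tracked.put(1L, new Tracked("logfile1", 200));

        // After the roll: inode 1 is now logfile2 (200 events), inode 2 is the new logfile1 (100 events).
        long[] inodes = {1L, 2L};
        String[] paths = {"logfile2", "logfile1"};
        long[] sizes = {200, 100};
        for (int i = 0; i < inodes.length; i++) {
            long start = startPosition(tracked, inodes[i], paths[i], sizes[i], resumeRenamed);
            delivered += sizes[i] - start;
            tracked.put(inodes[i], new Tracked(paths[i], sizes[i]));
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println("restart renamed file at 0: " + eventsAfterRoll(false));
        System.out.println("resume renamed file:       " + eventsAfterRoll(true));
    }
}
```

With these simplified numbers, restarting the renamed file at 0 delivers 500 events and resuming from the saved position delivers 300, matching the counts in the issue. The size check only catches a reused inode when the new file is still shorter than the saved position, which is why narrowing the file path regex remains the safer workaround.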