[ https://issues.apache.org/jira/browse/FLUME-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956446#comment-14956446 ]
Jun Seok Hong commented on FLUME-2777:
--------------------------------------

On Linux, getting the creation time of a file is impossible: Files.getAttribute(Paths.get("xx"), "basic:creationTime") returns the last-modified time instead. [https://docs.oracle.com/javase/7/docs/api/java/nio/file/attribute/BasicFileAttributes.html]
So if the target file is modified after taildir starts, its reported "creation time" changes, and duplicate events will occur.

> Tail Dir Source leads to duplicate events on rolling the tailed file
> --------------------------------------------------------------------
>
>                 Key: FLUME-2777
>                 URL: https://issues.apache.org/jira/browse/FLUME-2777
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: notrack
>            Reporter: Johny Rufus
>            Assignee: Johny Rufus
>         Attachments: FLUME-2777-1.patch, FLUME-2777.patch
>
>
> I have a simple setup, where I write 200 events to logfile1. [TailSrc is on the lookout for logfile*]
> Then I rename logfile1 to logfile2.
> I create a new logfile1 and write 100 events to it.
> I should see 300 events in my channel, but I see 500 events.
> I was able to trace the duplicates to updateFiles(boolean) in ReliableTaildirEventReader.java, specifically to the way renamed files are handled: they are given a starting position of 0. [This starting position should be obtained from tf.getPosition()]
> I am attaching a proposed fix; it would be great if one of you [~iijima_satoshi] / [~hshreedharan] / [~roshan_naik] could take a look at the fix and validate the issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
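The creation-time claim can be checked on a given machine. The sketch below reads `basic:creationTime`, appends to the file, and reads the attribute again. The class and method names are hypothetical. If the value moves forward after the append, the platform is reporting the last-modified time rather than a real creation time. The BasicFileAttributes Javadoc permits this: when the file system cannot supply a creation time, `creationTime()` returns an implementation-specific default, typically the last-modified time or the epoch. The outcome may vary by JDK version and file system, so treat it as a per-platform observation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileTime;

// Hypothetical demo class; not part of Flume.
public class CreationTimeCheck {

    // Reads the same attribute the comment refers to.
    static FileTime creationTime(Path file) throws IOException {
        return (FileTime) Files.getAttribute(file, "basic:creationTime");
    }

    // Returns {creationTime before an append, creationTime after the append}.
    // If the two values differ, "creation time" is tracking modifications.
    static FileTime[] creationTimeAroundAppend() {
        Path file = null;
        try {
            file = Files.createTempFile("taildir-demo", ".log");
            Files.write(file, "event 1\n".getBytes(StandardCharsets.UTF_8));
            FileTime before = creationTime(file);
            Thread.sleep(1100); // outlast coarse (1 s) timestamp granularity
            Files.write(file, "event 2\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
            FileTime after = creationTime(file);
            return new FileTime[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temp file
                }
            }
        }
    }

    public static void main(String[] args) {
        FileTime[] t = creationTimeAroundAppend();
        System.out.println("creationTime before append: " + t[0]);
        System.out.println("creationTime after append:  " + t[1]);
        System.out.println(t[0].equals(t[1])
                ? "creation time is stable on this platform"
                : "creation time moved: it is tracking the last-modified time");
    }
}
```

If the last line reports that the creation time moved, any logic that uses creation time to decide whether a file is "the same file" will treat an appended file as new. That is the duplicate-event scenario the comment describes.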
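The numbers in the report follow from re-reading the renamed file from offset 0. That gives 200 + 200 + 100 = 500 events instead of 200 + 100 = 300. The model below is a simplified, hypothetical sketch and not Flume's ReliableTaildirEventReader. It assumes each file is identified by its inode and stores the read position per inode. The `resetOnRename` flag switches between the reported behaviour and the proposed fix, which reuses the stored position (the role `tf.getPosition()` plays in the report).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a taildir-style reader, for illustration only.
public class RenameOffsetDemo {

    // A tailed file, modelled by its inode and how many events it holds.
    static final class FakeFile {
        final long inode;
        final int events;
        FakeFile(long inode, int events) { this.inode = inode; this.events = events; }
    }

    private final Map<Long, Integer> positionByInode = new HashMap<>();
    private final Map<Long, String> pathByInode = new HashMap<>();
    private final boolean resetOnRename; // true reproduces the reported bug
    private int eventsEmitted = 0;

    RenameOffsetDemo(boolean resetOnRename) { this.resetOnRename = resetOnRename; }

    // Emits every not-yet-read event from every file matching the pattern.
    void poll(Map<String, FakeFile> visibleFiles) {
        for (Map.Entry<String, FakeFile> e : visibleFiles.entrySet()) {
            String path = e.getKey();
            FakeFile f = e.getValue();
            Integer pos = positionByInode.get(f.inode);
            if (pos == null) {
                pos = 0; // unseen inode: a genuinely new file, read from the start
            } else if (resetOnRename && !path.equals(pathByInode.get(f.inode))) {
                pos = 0; // buggy: a renamed file is re-read from offset 0
            }
            // otherwise (the fix): a renamed file keeps its stored position
            eventsEmitted += f.events - pos;
            positionByInode.put(f.inode, f.events);
            pathByInode.put(f.inode, path);
        }
    }

    // Replays the scenario from the issue; returns the number of events emitted.
    static int replay(boolean resetOnRename) {
        RenameOffsetDemo reader = new RenameOffsetDemo(resetOnRename);
        Map<String, FakeFile> dir = new HashMap<>();

        dir.put("logfile1", new FakeFile(1001L, 200)); // write 200 events
        reader.poll(dir);

        dir.put("logfile2", dir.remove("logfile1"));   // rename logfile1 -> logfile2
        dir.put("logfile1", new FakeFile(1002L, 100)); // new logfile1, 100 events
        reader.poll(dir);

        return reader.eventsEmitted;
    }

    public static void main(String[] args) {
        System.out.println("reset position on rename: " + replay(true));  // 500
        System.out.println("keep position on rename:  " + replay(false)); // 300
    }
}
```

In this model, the inode stays the same across a rename while the path changes, so keying the position on the inode lets it survive a rename. The model never consults a creation time, which is the attribute the comment reports as unreliable on Linux.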