[ 
https://issues.apache.org/jira/browse/FLUME-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958348#comment-14958348
 ] 

Jun Seok Hong commented on FLUME-2777:
--------------------------------------

[~jrufus] How about using the file size and the hashCode() of part of the file 
to detect inode reuse?

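public class TailFile {
  ...
  private long lastSize;   // file size recorded at the last check
  private int hashValue;   // hashCode() of the first line of the file
  private long hashSize;   // length of that first line
  ...
}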

When a TailFile object is created, the size of the file is stored in lastSize.
We read the first line of the file and store the hashCode() of that string in 
hashValue.
The length of that line is stored in hashSize.
These values must also be written to the JSON file.
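
A minimal sketch of how these three values could be captured when the file is first seen. FileFingerprint and capture() are hypothetical names used only for illustration, not existing Flume code, and the sketch assumes hashSize counts the bytes of the first line without its line terminator.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical helper for illustration; not an existing Flume class.
final class FileFingerprint {
    final long lastSize;  // size of the file when the fingerprint was taken
    final int hashValue;  // hashCode() of the first line
    final long hashSize;  // length of the first line in bytes (assumption)

    FileFingerprint(long lastSize, int hashValue, long hashSize) {
        this.lastSize = lastSize;
        this.hashValue = hashValue;
        this.hashSize = hashSize;
    }

    static FileFingerprint capture(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            // RandomAccessFile.readLine() maps each byte to one char, so the
            // string length equals the byte length of the line (terminator excluded).
            String firstLine = raf.readLine();
            if (firstLine == null) {
                firstLine = ""; // empty file: nothing to hash yet
            }
            return new FileFingerprint(size, firstLine.hashCode(), firstLine.length());
        }
    }
}
```

Calling FileFingerprint.capture(path) once per new file would yield the values to keep in TailFile and to persist alongside the existing position data.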

In updateTailFiles() in ReliableTaildirEventReader.java, instead of comparing 
the path, validate the inode with the following routine:
 1. If the file size is smaller than the lastSize in TailFile, we assume the 
inode has been reused.
 2. If the file size is greater than or equal to lastSize, read hashSize 
bytes from the beginning of the file and compare the hashCode() to check for 
inode reuse.
 3. If the inode has not been reused, we update lastSize.
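
The routine above could be sketched roughly as follows. InodeReuseCheck and isInodeReused() are hypothetical names, not existing Flume code. The sketch assumes the stored hashValue was computed over the first hashSize bytes decoded one byte per char (ISO-8859-1), and it adds a guard for files shorter than hashSize that is not part of the proposal.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper for illustration; not an existing Flume class.
final class InodeReuseCheck {
    static boolean isInodeReused(Path file, long lastSize, int hashValue, long hashSize)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long currentSize = raf.length();
            // Step 1: a file smaller than before is treated as a reused inode.
            // The hashSize comparison is an extra guard (assumption) so the
            // read below can never run past the end of the file.
            if (currentSize < lastSize || currentSize < hashSize) {
                return true;
            }
            // Step 2: re-hash the first hashSize bytes and compare.
            // Assumes the first line fits in an int-sized buffer.
            byte[] prefix = new byte[(int) hashSize];
            raf.readFully(prefix);
            int currentHash = new String(prefix, StandardCharsets.ISO_8859_1).hashCode();
            return currentHash != hashValue;
        }
    }
}
```

When isInodeReused() returns false, the caller would perform step 3 and store the current file size as the new lastSize.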

This validation happens only per file, so performance should not be 
impacted.

> Tail Dir Source leads to duplicate events on rolling the tailed file
> --------------------------------------------------------------------
>
>                 Key: FLUME-2777
>                 URL: https://issues.apache.org/jira/browse/FLUME-2777
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: notrack
>            Reporter: Johny Rufus
>            Assignee: Johny Rufus
>         Attachments: FLUME-2777-1.patch, FLUME-2777.patch
>
>
> I have a simple setup, where I write 200 events to logfile1. [TailSrc is on 
> the lookout for logfile* ]
> Then I rename logfile1 to logfile2.
> I create a new logfile1 and write 100 events to it.
> Typically I should see 300 events in my channel. But I see 500 events.
> I was able to trace the duplicates to ReliableTaildirEventReader.java 
> updateFiles(boolean) to the way renamed files are handled, by specifying 
> starting position as 0. [This starting position should be obtained from 
> tf.getPosition()]
> I am attaching a proposed fix, would be great if one of you guys 
> [~iijima_satoshi] / [~hshreedharan]/ [~roshan_naik] can take a look at the 
> fix and validate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
