[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184747#comment-14184747
 ] 

Juhani Connolly commented on FLUME-2498:
----------------------------------------

Since most of the implementation details should be the same as in an internal 
tool I wrote a while back, I should be able to answer a couple of the remaining 
queries.

> I can't tell if it's Yes or No to the "will lines be read again" 

Yes, duplicate reads are possible. Checkpoints are written periodically, so if 
the process is restarted, tailing resumes from the last checkpoint and any 
lines read after it are read again. We didn't see this as an issue, since Flume 
can already create duplicate lines in other places; the objective is to prevent 
log loss, not duplication.
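
A minimal sketch of why a restart can produce duplicates, under some loud 
assumptions: the class and method names are hypothetical, the position is 
stored as a plain number rather than the JSON position file the patch 
describes, and this is not the code from the attached patch. Lines read after 
the last checkpoint are read again by a fresh instance.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the FLUME-2498 patch.
public class CheckpointedTailer {
    private final Path logFile;
    private final Path positionFile;
    private long position; // offset of the next unread byte

    public CheckpointedTailer(Path logFile, Path positionFile) throws IOException {
        this.logFile = logFile;
        this.positionFile = positionFile;
        // Resume from the last checkpoint if one exists, otherwise start at 0.
        this.position = Files.exists(positionFile)
                ? Long.parseLong(new String(Files.readAllBytes(positionFile),
                        StandardCharsets.UTF_8).trim())
                : 0L;
    }

    // Read whatever has been appended since the current position.
    // Simplification: a trailing partial line is returned as-is, and
    // RandomAccessFile.readLine only decodes single-byte characters.
    public List<String> readNewLines() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(logFile.toFile(), "r")) {
            raf.seek(position);
            String line;
            while ((line = raf.readLine()) != null) {
                lines.add(line);
            }
            position = raf.getFilePointer();
        }
        return lines;
    }

    // Persist the current offset; a real tailer would do this on a timer.
    public void checkpoint() throws IOException {
        Files.write(positionFile,
                Long.toString(position).getBytes(StandardCharsets.UTF_8));
    }
}
```

If the process dies between readNewLines() and the next checkpoint(), a new 
instance starts from the older offset, so nothing is lost but some lines are 
delivered twice.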

> I think the person was asking whether this Taildir Source implementation 
> deletes a file when it's done reading it or not. I think the answer is that 
> it does NOT delete the file and that file deletion is somebody else's 
> responsibility. Correct?

This is correct. We use Flume and the source as an "invisible" entity. We have 
it running on many internal services that do not need to worry about its 
existence, as it works behind the scenes. We never had a need for it to delete 
the files, and for something tailing in real time, I suspect such a thing would 
be awkward. When would you delete a file that's actively being appended to? 
Once you're "done" reading, it may still get more appends. We close the file 
handles if there are no appends for a while, just to avoid hogging the file 
handles and so that log rotations and such are not obstructed, again with the 
objective that Flume and the tailer be as unobtrusive as possible.
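
A minimal sketch of the idle-close idea, assuming a simple polling loop. The 
class name, the timeout parameter, and passing the clock in as an argument are 
all hypothetical choices for illustration; this is not the code from the patch. 
The handle is opened only when there is new data and closed once no appends 
have been seen for the timeout, while the read offset is remembered so reading 
can pick up where it left off.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration only; rotation and truncation are not handled.
public class IdleClosingReader {
    private final File file;
    private final long idleTimeoutMillis;
    private RandomAccessFile handle; // null while the file is closed
    private long position;           // offset of the next unread byte
    private long lastDataMillis;     // time of the last successful read

    public IdleClosingReader(File file, long idleTimeoutMillis, long nowMillis) {
        this.file = file;
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.lastDataMillis = nowMillis;
    }

    // Poll once. Returns the number of new bytes consumed (0 if none).
    public int poll(long nowMillis) throws IOException {
        long available = file.length() - position;
        if (available > 0) {
            if (handle == null) {
                // Reopen lazily and continue from the remembered offset.
                handle = new RandomAccessFile(file, "r");
                handle.seek(position);
            }
            byte[] buf = new byte[(int) Math.min(8192, available)];
            int n = handle.read(buf);
            if (n > 0) {
                position += n;
                lastDataMillis = nowMillis;
                return n;
            }
        }
        // No new data: release the handle once the file has been idle long enough.
        if (handle != null && nowMillis - lastDataMillis >= idleTimeoutMillis) {
            handle.close();
            handle = null;
        }
        return 0;
    }

    public boolean isOpen() {
        return handle != null;
    }
}
```

Because only the offset is kept while the handle is closed, whatever rotates 
or removes the file in the meantime is not blocked by the tailer.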

> Implement Taildir Source
> ------------------------
>
>                 Key: FLUME-2498
>                 URL: https://issues.apache.org/jira/browse/FLUME-2498
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>            Reporter: Satoshi Iijima
>         Attachments: FLUME-2498.patch
>
>
> This is the proposal of implementing a new tailing source.
> This source watches the specified files, and tails them in nearly real-time 
> once appends are detected to these files.
> * This source is reliable and will not miss data even when the tailing files 
> rotate.
> * It periodically writes the last read position of each file in a position 
> file using the JSON format.
> * If Flume is stopped or down for some reason, it can restart tailing from 
> the position written on the existing position file.
> * It can add event headers to each tailing file group. 
> The attached patch includes config documentation for this.
> This source requires a Unix-style file system and Java 1.7 or later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
