[
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174871#comment-14174871
]
Satoshi Iijima commented on FLUME-2498:
---------------------------------------
Below are the answers to the questions posted to the mailing lists.
> Btw., because the position in the file is checkpointed periodically, does
> that mean that it is possible that, after a restart, some number of lines
> that have already been tailed, will be read again?
No, they will not be read again.
On restart, this source resumes reading from the last read position recorded
in the position file.
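A minimal sketch of resuming from a saved offset follows. It assumes the position file stores a byte offset per file and that events are newline-delimited lines. `ResumeFromPosition`, `readFrom` and `ReadResult` are hypothetical names for illustration, not Flume's API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Flume's implementation: resume tailing from a
// byte offset recovered from the position file.
class ResumeFromPosition {

    /** Complete lines read in one pass plus the offset to save afterwards. */
    static final class ReadResult {
        final List<String> lines;
        final long newPosition;

        ReadResult(List<String> lines, long newPosition) {
            this.lines = lines;
            this.newPosition = newPosition;
        }
    }

    static ReadResult readFrom(String path, long savedPosition) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            // Assumption: a saved offset past EOF means the file was truncated,
            // so start again from the beginning.
            long start = savedPosition <= length ? savedPosition : 0L;
            // Sketch only: reads the whole unread remainder into memory.
            byte[] buf = new byte[(int) (length - start)];
            raf.seek(start);
            raf.readFully(buf);

            List<String> lines = new ArrayList<>();
            int lineStart = 0;
            for (int i = 0; i < buf.length; i++) {
                if (buf[i] == '\n') {
                    lines.add(new String(buf, lineStart, i - lineStart, StandardCharsets.UTF_8));
                    lineStart = i + 1;
                }
            }
            // Bytes after the last newline form an incomplete line; they are not
            // consumed, so the next pass re-reads them once the line is complete.
            return new ReadResult(lines, start + lineStart);
        }
    }
}
```

Because only complete lines advance the offset, a half-written line at EOF is picked up on a later pass instead of being split into two events.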
> - How does it know when to stop tailing the current file and switch to or
> start tailing another file
> - When there is a backlog of many files being built up... how does it order
> the files for consumption
This source does not guarantee any ordering, because it is essentially designed
to tail lines appended to files in near real time.
If there is a backlog of many files on start-up, one file is selected in random
order and read to EOF, then the next file is selected in the same way.
With the 'skipToEnd' property, it can instead start tailing from the EOF of the
current files.
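The start-up choice described above can be sketched as picking a start offset per file. This assumes a position already saved in the position file takes precedence over 'skipToEnd', which only affects files with no saved position; `StartPositions` and its methods are hypothetical names, and file sizes are passed in as a map so the logic can be shown without touching the file system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Flume's code: choose where to start reading each file.
class StartPositions {

    static Map<String, Long> startPositions(Map<String, Long> savedPositions,
                                            Map<String, Long> fileLengths,
                                            boolean skipToEnd) {
        // HashMap imposes no ordering, mirroring the unordered file selection.
        Map<String, Long> start = new HashMap<>();
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            Long saved = savedPositions.get(e.getKey());
            if (saved != null) {
                start.put(e.getKey(), saved);        // resume from the position file
            } else if (skipToEnd) {
                start.put(e.getKey(), e.getValue()); // skip content already in the file
            } else {
                start.put(e.getKey(), 0L);           // read the whole backlog
            }
        }
        return start;
    }

    /** Total bytes still to be read before every file has reached EOF. */
    static long backlogBytes(Map<String, Long> start, Map<String, Long> fileLengths) {
        long total = 0;
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            total += e.getValue() - start.get(e.getKey());
        }
        return total;
    }
}
```

With 'skipToEnd' enabled, a newly discovered file contributes no backlog; only bytes appended after start-up are tailed.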
> - Sounds like there is some C/C++ native code + JNI to work with inodes ?
> what api are you using.
This source uses java.nio.file.Files.getAttribute() from the Java 7 API to
identify the inode of a file, so no native code or JNI is needed.
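A minimal sketch of the inode lookup, assuming the "unix:ino" attribute. That attribute view is supported by OpenJDK on Linux and macOS but is not one of the documented standard views, so on other platforms `Files.getAttribute` throws `UnsupportedOperationException`. `InodeLookup` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: read a file's inode number through the JDK's
// "unix" attribute view (no JNI in user code).
class InodeLookup {

    static long inode(Path file) throws IOException {
        // The "ino" attribute is returned boxed as a Long.
        return (Long) Files.getAttribute(file, "unix:ino");
    }
}
```

Renaming a file, as most log rotation does, keeps its inode, while a fresh file created at the old path gets a new one. Keying positions by inode rather than by path is what lets a tailer keep following the rotated file.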
> - does it auto delete the consumed files ?
No. This source does not require the consumed files to be deleted. The files to
be tailed and the position in each file are recorded in the position file.
For example, an application's log file such as /var/log/app/access.log can be
specified directly in flume.conf.
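A sketch of writing the JSON position file follows. The field names "inode", "pos" and "file" are an assumption for illustration, as is the write-to-temp-then-rename step; `PositionFileWriter` and `Entry` are hypothetical names. Escaping handles only quotes and backslashes, assuming paths contain no control characters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch of a position file writer (not Flume's code).
class PositionFileWriter {

    /** One tailed file: its inode, last read byte offset, and path (assumed fields). */
    static final class Entry {
        final long inode;
        final long pos;
        final String file;

        Entry(long inode, long pos, String file) {
            this.inode = inode;
            this.pos = pos;
            this.file = file;
        }
    }

    static String toJson(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"inode\":").append(e.inode)
              .append(",\"pos\":").append(e.pos)
              .append(",\"file\":\"").append(escape(e.file)).append("\"}");
        }
        return sb.append(']').toString();
    }

    // Minimal JSON escaping: backslashes first, then double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Write to a sibling temp file, then atomically rename it over the old
    // position file so a crash mid-write leaves the previous version intact.
    static void write(Path positionFile, List<Entry> entries) throws IOException {
        Path tmp = positionFile.resolveSibling(positionFile.getFileName() + ".tmp");
        Files.write(tmp, toJson(entries).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, positionFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

For a single entry this produces `[{"inode":2496272,"pos":12,"file":"/var/log/app/access.log"}]`.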
> Implement Taildir Source
> ------------------------
>
> Key: FLUME-2498
> URL: https://issues.apache.org/jira/browse/FLUME-2498
> Project: Flume
> Issue Type: New Feature
> Components: Sinks+Sources
> Reporter: Satoshi Iijima
> Attachments: FLUME-2498.patch
>
>
> This is a proposal to implement a new tailing source.
> This source watches the specified files and tails them in near real time
> once appends to these files are detected.
> * This source is reliable and will not miss data even when the tailed files
> rotate.
> * It periodically writes the last read position of each file to a position
> file in JSON format.
> * If Flume is stopped or goes down for any reason, it can resume tailing from
> the positions recorded in the existing position file.
> * It can add event headers to each group of tailed files.
> The attached patch includes configuration documentation for this source.
> This source requires a Unix-style file system and Java 1.7 or later.
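The proposal does not say how file groups are defined, so the sketch below assumes each group is a glob pattern paired with a fixed set of headers, and that a file takes the headers of the first group it matches. `FileGroupHeaders`, `addGroup` and `headersFor` are hypothetical names.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: attach per-group headers to events based on the file they came from.
class FileGroupHeaders {

    // Insertion order is kept so the first registered matching group wins.
    private final Map<PathMatcher, Map<String, String>> groups = new LinkedHashMap<>();

    /** Register a file group by glob pattern with the headers its events should carry. */
    void addGroup(String glob, Map<String, String> headers) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        groups.put(matcher, Collections.unmodifiableMap(headers));
    }

    /** Headers for an event read from the given file; empty if no group matches. */
    Map<String, String> headersFor(Path file) {
        for (Map.Entry<PathMatcher, Map<String, String>> g : groups.entrySet()) {
            if (g.getKey().matches(file)) {
                return g.getValue();
            }
        }
        return Collections.emptyMap();
    }
}
```

For example, registering "/var/log/app/*.log" with a header topic=app would tag every event read from /var/log/app/access.log with that header.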
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)