[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659613#comment-14659613
 ] 

Satoshi Iijima commented on FLUME-2498:
---------------------------------------


Thank you for reviewing and updating the patch, Roshan.

bq. 1. Was not able to verify if it handles subdirectories also ? can you 
confirm whether or not it handles it ?

Currently it cannot handle subdirectories. It would be better if it could 
also track files in subdirectories.

bq. 2. Wasn't clear how often it commits to the position.json file ? 
Intuitively i would say for every batch committed into the channel the json 
file should get updated.

Updating position.json for every committed batch would have a small impact 
on performance.
On the other hand, even if position.json is updated only at a regular 
interval, data loss does not occur when Flume restarts for some reason, 
because tailing resumes from the last position written to the file.
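
To illustrate the trade-off, here is a minimal sketch that is not taken from 
the patch. The class name PositionFileSketch, its methods, and the JSON layout 
are all hypothetical. It assumes positions are kept in memory per committed 
batch and written to disk only when a timer calls flush().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; names and JSON layout are assumptions.
class PositionFileSketch {
    // inode -> last read offset, updated in memory after every committed batch
    private final Map<Long, Long> positions = new ConcurrentHashMap<>();
    private final Path positionFile;

    PositionFileSketch(Path positionFile) {
        this.positionFile = positionFile;
    }

    // Cheap: intended to be called once per batch committed to the channel.
    void recordCommitted(long inode, long offset) {
        positions.put(inode, offset);
    }

    // More expensive: intended to be called from a timer at a regular interval.
    void flush() throws IOException {
        // Sort by inode so the output is stable between runs.
        Map<Long, Long> sorted = new TreeMap<>(positions);
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (Map.Entry<Long, Long> e : sorted.entrySet()) {
            json.add("{\"inode\":" + e.getKey() + ",\"pos\":" + e.getValue() + "}");
        }
        // Write to a temporary sibling first, then move it over the previous file.
        Path tmp = positionFile.resolveSibling(positionFile.getFileName() + ".tmp");
        Files.write(tmp, json.toString().getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, positionFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With this split, the per-batch cost is a map update, and a restart resumes 
from whatever offsets the last flush() wrote.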

bq. 3. can a regex be applied to the directory also and not just file name ?

Currently this source cannot do that, but the feature sounds good.
It would be good to implement these features (from questions 1 and 3) after 
this patch is merged to trunk.
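
As a rough idea of what the features from questions 1 and 3 could look like, 
the sketch below walks subdirectories and applies one regex to the directory 
(relative to the root) and another to the file name. It is not part of the 
patch: RecursiveMatchSketch and findFiles are hypothetical names, and it 
assumes Java 8 for Files.walk (on Java 1.7, Files.walkFileTree would be the 
alternative).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not code from the patch.
class RecursiveMatchSketch {
    // Returns regular files under root whose parent directory (relative to root)
    // matches dirPattern and whose file name matches filePattern.
    static List<Path> findFiles(Path root, Pattern dirPattern, Pattern filePattern)
            throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(Files::isRegularFile)
                .filter(p -> filePattern.matcher(p.getFileName().toString()).matches())
                .filter(p -> dirPattern
                    .matcher(root.relativize(p.getParent()).toString())
                    .matches())
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

For example, a directory pattern of `app\d+` with a file pattern of `.*\.log` 
would pick up `app1/a.log` under the root but skip `other/b.log` and files 
directly in the root.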

bq. 4. Windows : What areas in this implementation do you feel may break on 
Windows ?

This source uses the inode to identify each file uniquely. On Windows it would 
need to use the file ID instead of the inode.
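
To show where the platform dependency sits, here is a small sketch that is not 
the patch's code. FileIdentitySketch and fileIdentity are hypothetical names. 
It reads the inode through the JDK's "unix:ino" attribute and, where that 
attribute view is unavailable, falls back to BasicFileAttributes.fileKey(), 
which may return null on platforms that offer no such key.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper for illustration only; not code from the patch.
class FileIdentitySketch {
    // Returns an object identifying the file independent of its name,
    // or null if the platform exposes nothing suitable.
    static Object fileIdentity(Path file) throws IOException {
        try {
            // Available only where the JDK supports the "unix" attribute view.
            return Files.getAttribute(file, "unix:ino");
        } catch (UnsupportedOperationException | IllegalArgumentException e) {
            // Fallback: a generic file key, which may be null on some platforms.
            return Files.readAttributes(file, BasicFileAttributes.class).fileKey();
        }
    }
}
```

On a Unix-style file system the returned value stays the same when a file is 
renamed during rotation, which is what lets a tailing source keep following it.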

bq. 5. Is there some limit on how many files it will track ?

Although I have not confirmed the limit with a test, there are many hosts in 
my production environment where this source tracks several hundred files.


> Implement Taildir Source
> ------------------------
>
>                 Key: FLUME-2498
>                 URL: https://issues.apache.org/jira/browse/FLUME-2498
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>            Reporter: Satoshi Iijima
>             Fix For: v1.7.0
>
>         Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch
>
>
> This is a proposal to implement a new tailing source.
> This source watches the specified files and tails them in near real time 
> once appends to these files are detected.
> * This source is reliable and will not miss data even when the tailing files 
> rotate.
> * It periodically writes the last read position of each file in a position 
> file using the JSON format.
> * If Flume is stopped or down for some reason, it can restart tailing from 
> the position written on the existing position file.
> * It can add event headers to each tailing file group. 
> An attached patch includes configuration documentation for this source.
> This source requires a Unix-style file system and Java 1.7 or later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
