[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655182#comment-14655182
 ] 

Roshan Naik commented on FLUME-2498:
------------------------------------

*General Comments*
 # This patch seems relatively mature in its implementation. After making the 
above fixes, I tested it on my Mac, tried to cover some potential corner 
cases, and it handled them pretty well.
 # Like the filegroup feature.
 # Like the fact that it can track many files at once.
 # Handles the case where the event/line is not yet completely written.
 # Seems able to pick up appends to files that were previously closed due to 
timeout. That's very nice!
 # Is tolerant of deletion of a file and recreation of a new file with the same 
name (treats them as different files). Again, very nice!
 # Ran code coverage on the unit tests. Coverage is pretty good (80% line 
coverage).

*Questions:*
 # Was not able to verify whether it also handles subdirectories. Can you 
confirm whether or not it does?
 # It wasn't clear how often it commits to the position.json file. Intuitively, 
I would say the JSON file should be updated for every batch committed into the 
channel.
 # Can a regex be applied to the directory as well, and not just the file name?
 # Windows: What areas of this implementation do you feel may break on Windows?
 # Is there some limit on how many files it will track?
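
To make question 2 concrete, here is a minimal sketch of a per-batch position 
update. It is not taken from the patch: the class name, the method names and 
the JSON field names ("file", "pos") are assumptions for illustration only. The 
idea is to call writePositions() only after the channel transaction for a batch 
has committed, so the file never points past data that was actually delivered.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

// Hypothetical helper, not part of the FLUME-2498 patch.
class PositionFileSketch {

    /** Renders {path -> byte offset} as a JSON array (field names are assumed). */
    static String toJson(Map<String, Long> positions) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<String, Long> e : positions.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            // Escape backslashes and quotes so the path stays a valid JSON string.
            String file = e.getKey().replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append("{\"file\":\"").append(file)
              .append("\",\"pos\":").append(e.getValue()).append('}');
        }
        return sb.append(']').toString();
    }

    /** Intended to be called once per batch, after the channel commit succeeds. */
    static void writePositions(Path positionFile, Map<String, Long> positions)
            throws IOException {
        // Write to a sibling temp file first, then move it over the old file,
        // so a crash mid-write does not leave a truncated position file behind.
        Path tmp = positionFile.resolveSibling(positionFile.getFileName() + ".tmp");
        Files.write(tmp, toJson(positions).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, positionFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

If writing on every batch turns out to be too expensive, a timer-based update 
is a reasonable trade-off, at the cost of re-reading some events after a crash.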


*Suggestions*
# major - need to document that it will not delete or rename files, and that 
this is expected to be done externally (unlike spooldir).
# major - it definitely needs deserializer support. readevent() can forward to 
the configured deserializer.
# major - Does not have a max event size setting (i.e. line length for text 
files). It would be good to default to a large number (8k?). Deserializer 
support will automatically give this.
# major - files to consume should be selected in order of creation time by 
default.
# major - I think readline() has a bug: it treats a \r that is not immediately 
followed by a \n as a new line.
   The patch in FLUME-2508 might be useful for this.
# minor - If a file is being overwritten (instead of appended to), could it log 
an error and exclude that file?



> Implement Taildir Source
> ------------------------
>
>                 Key: FLUME-2498
>                 URL: https://issues.apache.org/jira/browse/FLUME-2498
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>            Reporter: Satoshi Iijima
>             Fix For: v1.7.0
>
>         Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch
>
>
> This is a proposal to implement a new tailing source.
> This source watches the specified files, and tails them in nearly real-time 
> once appends to these files are detected.
> * This source is reliable and will not miss data even when the tailing files 
> rotate.
> * It periodically writes the last read position of each file in a position 
> file using the JSON format.
> * If Flume is stopped or down for some reason, it can restart tailing from 
> the position written on the existing position file.
> * It can add event headers to each tailing file group. 
> The attached patch includes config documentation for this.
> This source requires a Unix-style file system and Java 1.7 or later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
