[
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655182#comment-14655182
]
Roshan Naik commented on FLUME-2498:
------------------------------------
*General Comments*
# This patch seems relatively mature in its implementation. After making the
above fixes, I tested it on my Mac, tried to cover some potential corner cases,
and it handled them pretty well.
# Like the filegroup feature.
# Like the fact that it can track many files at once.
# Handles the case where an event/line has not yet been completely written.
# Seems able to pick up appends to files that were previously closed due to
timeout. That's very nice!
# Is tolerant of a file being deleted and a new file being created with the
same name (treats them as different files). Again, very nice!
# Ran code coverage on the unit tests. Coverage is pretty good (80% line
coverage).
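One way a tailer can tell a recreated file apart from the original is to key its state on the inode rather than on the file name. The following is only a minimal sketch of that idea, assuming a Unix-style file system; I have not confirmed that the patch does it this way, and the names `FileIdentity` and `inodeOf` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileIdentity {
    /**
     * Returns the inode number of the given file.
     * Assumption: the file system supports the "unix" attribute view
     * (Linux, macOS). This will fail on Windows.
     */
    public static long inodeOf(Path file) throws IOException {
        return (Long) Files.getAttribute(file, "unix:ino");
    }
}
```

An append or a rename keeps the inode, so tracked state survives rotation by rename. Note that a file system may reuse the inode number of a deleted file, so an inode alone is not a guaranteed-unique identity over time.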
*Questions:*
# Was not able to verify whether it also handles subdirectories. Can you confirm
whether or not it does?
# It wasn't clear how often it commits to the position.json file. Intuitively, I
would say the JSON file should be updated for every batch committed into the
channel.
# Can a regex be applied to the directory as well, and not just the file name?
# Windows: What areas in this implementation do you feel may break on Windows?
# Is there some limit on how many files it will track?
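To illustrate the position-file question, here is a rough sketch of what "update the JSON file after every committed batch" could look like. It is not taken from the patch: the class `PositionFile`, its methods, and the JSON layout (`file`/`pos` keys) are all assumptions made for illustration. The idea is that `flush()` would be called only after the batch has been successfully handed to the channel.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class PositionFile {
    private final Path target;
    // Last committed read offset per tailed file (hypothetical layout).
    private final Map<String, Long> offsets = new TreeMap<>();

    public PositionFile(Path target) {
        this.target = target;
    }

    /** Remembers the offset reached by the batch that was just committed. */
    public void record(String file, long offset) {
        offsets.put(file, offset);
    }

    /** Writes all offsets to a temp file, then renames it over the target. */
    public void flush() throws IOException {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, Long> e : offsets.entrySet()) {
            // Escape backslashes and quotes so the path stays valid JSON.
            String name = e.getKey().replace("\\", "\\\\").replace("\"", "\\\"");
            json.add("{\"file\":\"" + name + "\",\"pos\":" + e.getValue() + "}");
        }
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, json.toString().getBytes(StandardCharsets.UTF_8));
        // Rename so a crash mid-write cannot leave a truncated position file.
        Files.move(tmp, target,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Writing once per batch costs one small file write and rename per commit. If that turns out to be too expensive, a periodic flush is the alternative, at the price of replaying some events after a restart.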
*Suggestions*
# major - need to document that it will not delete or rename files, and that
this is expected to be done externally (unlike spooldir).
# major - it definitely needs deserializer support. readevent() can forward to
the configured deserializer.
# major - does not have a max event size setting (i.e. line length for text
files). Good to default to a large number (8k?). Deserializer support will
automatically give this.
# major - files to consume should be selected in order of creation time by
default.
# major - I think readline() has a bug. It treats a \r that is not immediately
followed by a \n as a new line. The patch in FLUME-2508 might be useful for
this.
# minor - if a file is being overwritten (instead of appended to), could it log
an error and exclude that file?
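To make the readline() point concrete, here is a minimal sketch of the behaviour I would expect: only \n (optionally preceded by \r) ends a line, a lone \r is kept as data, and a line with no terminator yet is not returned. This is not the code from the patch or from FLUME-2508; `TailLineReader` is a hypothetical name, and the byte-at-a-time read is unbuffered purely for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class TailLineReader {
    /**
     * Reads the next complete line starting at the file's current position.
     * Returns null and restores the position if no '\n' has been written yet,
     * so a half-written line is picked up again on the next attempt.
     */
    public static String readLine(RandomAccessFile file) throws IOException {
        long start = file.getFilePointer();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = file.read()) != -1) {
            if (b == '\n') {
                byte[] bytes = buf.toByteArray();
                int len = bytes.length;
                // Drop a '\r' only when it directly precedes the '\n'.
                if (len > 0 && bytes[len - 1] == '\r') {
                    len--;
                }
                return new String(bytes, 0, len, StandardCharsets.UTF_8);
            }
            // Any other byte, including a lone '\r', is part of the line.
            buf.write(b);
        }
        // Reached end of file without a terminator: rewind and report nothing.
        file.seek(start);
        return null;
    }
}
```

With this approach a file containing `a\rb\r\n` yields the single line `a\rb` rather than two lines. A max event size check could be added in the same loop by bounding the buffer length.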
> Implement Taildir Source
> ------------------------
>
> Key: FLUME-2498
> URL: https://issues.apache.org/jira/browse/FLUME-2498
> Project: Flume
> Issue Type: New Feature
> Components: Sinks+Sources
> Reporter: Satoshi Iijima
> Fix For: v1.7.0
>
> Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch
>
>
> This is the proposal of implementing a new tailing source.
> This source watches the specified files, and tails them in nearly real-time
> once appends are detected to these files.
> * This source is reliable and will not miss data even when the tailing files
> rotate.
> * It periodically writes the last read position of each file in a position
> file using the JSON format.
> * If Flume is stopped or down for some reason, it can restart tailing from
> the position written on the existing position file.
> * It can add event headers to each tailing file group.
> The attached patch includes config documentation for this.
> This source requires a Unix-style file system and Java 1.7 or later.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)