[
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14663119#comment-14663119
]
Roshan Naik commented on FLUME-2498:
------------------------------------
- Consuming in sorted order: This does seem important in order to avoid data
loss and exceedingly long delays in data delivery. Basically, if the source
does not consume a file soon enough, there is a greater danger that the other
process that deletes the log files will remove it before the taildir source has
actually consumed it. Also, end users can easily become concerned if they see
newer data before older data, leading to suspicion of data loss. This seemed
like a small change. For now there is no need for multiple types of ordering
(as might have been previously discussed); it suffices to change the default
scheme.
- Deserializer support: I am OK with not supporting a deserializer for now. I
feel it should be possible to add it later and still remain backward
compatible, as the default deserializer is LINE. I proposed this because it
seemed a small change that would also automatically address the max event size
and newline issues.
- The newline issue: that is definitely worth fixing to avoid bad data.
- Max event size: good to have, but may not be a blocker.
- Updating position.json on every commit can lead to excessive duplication in
failure scenarios. It is not clear whether it has a significant perf impact,
but it is not a blocker IMO.
- The rest, I guess, are trivial doc changes.
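
To make the sorted-order point concrete, here is a minimal sketch of
oldest-first consumption. It is not code from the patch: the class and method
names (OldestFirstOrdering, TailedFile, oldestFirst) are hypothetical, and
using last-modified time as the sort key is my assumption about what "sorted
order" would mean here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class OldestFirstOrdering {

    /** Hypothetical stand-in for a tailed file: its path and last-modified time. */
    static final class TailedFile {
        final String path;
        final long lastModifiedMillis;

        TailedFile(String path, long lastModifiedMillis) {
            this.path = path;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /**
     * Returns a new list ordered oldest first, leaving the input untouched.
     * Ties on modification time are broken by path so the order is repeatable.
     */
    static List<TailedFile> oldestFirst(List<TailedFile> files) {
        List<TailedFile> sorted = new ArrayList<>(files);
        Collections.sort(sorted, new Comparator<TailedFile>() {
            @Override
            public int compare(TailedFile a, TailedFile b) {
                int byTime = Long.compare(a.lastModifiedMillis, b.lastModifiedMillis);
                return byTime != 0 ? byTime : a.path.compareTo(b.path);
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative values only: a live log plus two rotated files.
        List<TailedFile> files = new ArrayList<>();
        files.add(new TailedFile("/var/log/app.log", 3000L));
        files.add(new TailedFile("/var/log/app.log.2", 1000L));
        files.add(new TailedFile("/var/log/app.log.1", 2000L));

        // The oldest file is visited first, so it is read before any
        // external cleanup process is likely to delete it.
        for (TailedFile f : oldestFirst(files)) {
            System.out.println(f.path);
        }
    }
}
```

With this ordering the rotated files are drained before the live file, which is
the behaviour an end user would expect to see downstream.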
I have been careful to ensure that my review suggestions were relatively small
in terms of the changes needed or the time required. I don't think it was
unreasonable to check how the author felt about addressing them in this round
and to proceed accordingly. If he cannot, others or I might chip in and take
care of them.
Anyway, it was important to make those observations as part of the review. To
fast-track things, I need clarity from Satoshi on which of them he can address
quickly so that I can proceed accordingly. So please let me know.
> Implement Taildir Source
> ------------------------
>
> Key: FLUME-2498
> URL: https://issues.apache.org/jira/browse/FLUME-2498
> Project: Flume
> Issue Type: New Feature
> Components: Sinks+Sources
> Reporter: Satoshi Iijima
> Fix For: v1.7.0
>
> Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch
>
>
> This is the proposal of implementing a new tailing source.
> This source watches the specified files, and tails them in nearly real-time
> once appends are detected to these files.
> * This source is reliable and will not miss data even when the tailing files
> rotate.
> * It periodically writes the last read position of each file in a position
> file using the JSON format.
> * If Flume is stopped or down for some reason, it can restart tailing from
> the position written on the existing position file.
> * It can add event headers to each tailing file group.
> The attached patch includes config documentation for this.
> This source requires a Unix-style file system and Java 1.7 or later.