[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14663119#comment-14663119
 ] 

Roshan Naik commented on FLUME-2498:
------------------------------------

- Consuming in sorted order: This does seem important in order to avoid data 
loss and exceedingly long delays in data delivery. If the source does not 
consume a file soon enough, there is a greater danger that the other process 
which deletes the log files will remove it before the taildir source has 
consumed it. Also, end users can easily get concerned if they see newer data 
arrive before older data, leading to suspicion of data loss. This seemed like 
a small change. For now there is no need for multiple types of ordering (as 
might have been previously discussed); it suffices to change the default 
scheme.

- Deserializer support: I am OK with not supporting a deserializer for now. I 
feel it might be possible to add it later and still remain backward 
compatible, as the default deserializer is LINE. I proposed this because it 
seemed a small change that would also automatically address the max event 
size and newline issues.

- The newline issue: that is definitely worth fixing to avoid bad data.

- Max event size - good to have but may not be a blocker.

- Updating position.json on every commit can lead to excessive duplication in 
failure scenarios. It is not clear whether it would have a significant perf 
impact... but it is not a blocker IMO.

- The rest, I guess, are trivial doc changes.
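
To illustrate the sorted-order point above, here is a minimal sketch of 
oldest-first consumption. It assumes last-modified time is the ordering key, 
which may not be what the patch ends up using; the class and method names 
(OldestFirstOrdering, oldestFirst) are hypothetical and are not taken from the 
patch. Timestamps are read once up front so that a file being appended to 
during the sort cannot make the comparator inconsistent.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the FLUME-2498 patch.
final class OldestFirstOrdering {

    private OldestFirstOrdering() {}

    /** Returns a new list ordered by last-modified time, oldest first. */
    static List<File> oldestFirst(List<File> files) {
        // Snapshot each timestamp once; files being tailed can change mid-sort.
        final Map<File, Long> mtimes = new HashMap<File, Long>();
        for (File f : files) {
            mtimes.put(f, f.lastModified());
        }
        List<File> sorted = new ArrayList<File>(files);
        Collections.sort(sorted, new Comparator<File>() {
            @Override
            public int compare(File a, File b) {
                int byTime = Long.compare(mtimes.get(a), mtimes.get(b));
                // Break ties by path so the order is deterministic.
                return byTime != 0 ? byTime : a.getPath().compareTo(b.getPath());
            }
        });
        return sorted;
    }
}
```

With this, the source would iterate over oldestFirst(matchedFiles) instead of 
the raw listing, so a file that is about to be deleted by the rotating process 
is read before newer ones.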

I have been careful to ensure that my review 'suggestions' were relatively 
small in terms of changes needed or time required. I don't think it was 
unreasonable to check how the author felt about addressing them in this round 
and proceed accordingly... if not, others or I might chip in and take care of 
them.

Anyway, it was important to make those observations as part of the review. 
And to fast-track things, I need clarity from Satoshi on which of them he can 
address quickly so that I can proceed accordingly. So please let me know.

> Implement Taildir Source
> ------------------------
>
>                 Key: FLUME-2498
>                 URL: https://issues.apache.org/jira/browse/FLUME-2498
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>            Reporter: Satoshi Iijima
>             Fix For: v1.7.0
>
>         Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch
>
>
> This is the proposal of implementing a new tailing source.
> This source watches the specified files, and tails them in nearly real-time 
> once appends are detected to these files.
> * This source is reliable and will not miss data even when the tailing files 
> rotate.
> * It periodically writes the last read position of each file in a position 
> file using the JSON format.
> * If Flume is stopped or down for some reason, it can restart tailing from 
> the position written on the existing position file.
> * It can add event headers to each tailing file group. 
> An attached patch includes config documentation for this.
> This source requires Unix-style file system and Java 1.7 or later.



