Hi,

On Mon, Mar 10, 2014 at 3:35 AM, gortiz <[email protected]> wrote:

> Hi,
>
> About the tailing, I was checking the code of Linux's tail, and
> there's some chance of losing data when the file rotates.
>

In case of Linux's own tail?


> Plus, if Flume is stopped, there's no chance to recover the data written
> while it wasn't running. I have implemented a checkpoint mechanism to
> recover as much data as possible if this happens.
>

Right.  Flume will miss any data that was logged while it was down because
Flume simply uses tail -F with ExecSource.

Does your implementation remember the last file (inode?) it tailed and the
position in that file?

What happens when multiple log files are rotated while the Flume agent was
down?  Does your implementation know how to:
1) read the last tailed file from where it stopped all the way to the end
2) read all files that were completely missed from beginning to the end
3) start tailing the "active" log file
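
To make 1) and 2) concrete, here is a rough Java sketch of the catch-up part,
assuming the checkpoint is a file key (the inode-based fileKey() on POSIX file
systems) plus a byte offset. The names (TailCatchUp, fileKeyOf, readNewLines,
catchUp) are made up for illustration; this is neither your code nor anything
in Flume, and persisting the checkpoint and the live tailing of 3) are left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class TailCatchUp {

    /** Identity that survives a rename. May be "null" where the file system has no file key. */
    public static String fileKeyOf(Path file) throws IOException {
        Object key = Files.readAttributes(file, BasicFileAttributes.class).fileKey();
        return String.valueOf(key);
    }

    /** Appends every complete line found at or after offset to out; returns the new offset. */
    public static long readNewLines(Path file, long offset, List<String> out) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (offset > length) {
                offset = 0; // file is shorter than the checkpoint: assume it was truncated
            }
            // Sketch only: assumes the unread backlog fits in memory.
            byte[] buf = new byte[(int) (length - offset)];
            raf.seek(offset);
            raf.readFully(buf);
            int lastNewline = -1;
            for (int i = buf.length - 1; i >= 0; i--) {
                if (buf[i] == '\n') {
                    lastNewline = i;
                    break;
                }
            }
            if (lastNewline < 0) {
                return offset; // no complete line yet; keep the old checkpoint
            }
            String text = new String(buf, 0, lastNewline, StandardCharsets.UTF_8);
            for (String line : text.split("\n", -1)) {
                out.add(line);
            }
            return offset + lastNewline + 1;
        }
    }

    /**
     * Step 1: resume the checkpointed file at savedPos.
     * Step 2: read every later file (oldest first, active file last) from the start.
     * If the checkpointed file is gone, everything is read from the beginning.
     */
    public static List<String> catchUp(String savedKey, long savedPos,
                                       List<Path> filesOldestFirst) throws IOException {
        List<String> lines = new ArrayList<>();
        boolean found = false;
        for (Path f : filesOldestFirst) {
            if (!found && fileKeyOf(f).equals(savedKey)) {
                found = true;
                readNewLines(f, savedPos, lines);
            } else if (found) {
                readNewLines(f, 0, lines);
            }
        }
        if (!found) {
            for (Path f : filesOldestFirst) {
                readNewLines(f, 0, lines);
            }
        }
        return lines;
    }
}
```

A partial last line (no trailing newline yet) is deliberately left unread, so
it is picked up whole on the next pass instead of being emitted twice.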

Assuming yes, yes, and yes, can one configure:
A) if 3) should start happening right away (while 1) and 2) are happening
"in the background")
B) or whether 1), 2), and 3) should happen sequentially

The A) use case is very handy when the most recent data is much more
valuable than old data (e.g. performance metrics) and thus you'd rather
start sending new data first and backfill old data later (or in parallel).
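
The difference between A) and B) comes down to whether the backfill runs on
its own thread. A tiny Java sketch of that choice, with hypothetical names
(CatchUpOrdering, start, and the backfillInBackground flag are mine, not
existing Flume configuration):

```java
public class CatchUpOrdering {

    /**
     * backfill   = steps 1) and 2): read what was missed while the agent was down
     * tailActive = step 3): follow the active log file
     *
     * Returns the background thread in mode A), or null in mode B).
     */
    public static Thread start(boolean backfillInBackground,
                               Runnable backfill, Runnable tailActive) {
        if (backfillInBackground) {
            // A) new data first: backfill runs in parallel on a daemon thread
            Thread t = new Thread(backfill, "backfill");
            t.setDaemon(true);
            t.start();
            tailActive.run();
            return t;
        }
        // B) strictly sequential: old data is delivered before any new data
        backfill.run();
        tailActive.run();
        return null;
    }
}
```

With A) events can reach the sink out of order, so whatever consumes them has
to rely on timestamps rather than arrival order.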

Have you compared your approach+impl with
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/Tailer.html?

http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java

Thanks,
Otis



>
> I think that tailing in Flume is good enough if you're not worried about
> losing any data, but I needed to improve this feature a little bit.
>
> If you have more questions, let me know.
>
> Guillermo Ortiz.
>
> On 07/03/14 21:47, Otis Gospodnetic wrote:
>
>> Hi Guillermo,
>>
>> I don't have the need for FLUME-2321, but maybe one of the devs can have a
>> look.
>>
>> I am curious about that new tail source you mentioned, though.  Can you
>> tell us more about what you are working on, how it is going to work, and
>> how it will be better than the tailer from Apache Commons and ExecSource
>> with tail -F ?
>>
>> Thanks,
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Sun, Mar 2, 2014 at 4:01 PM, Guillermo Ortiz <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I implemented a new feature for Flume
>>> (FLUME-2321 <https://issues.apache.org/jira/browse/FLUME-2321>).
>>> I'd like to know what people think about it and what the process is for
>>> getting a new feature accepted.
>>> It's the first time I've contributed to an Apache project and I don't
>>> really know how it works. Or maybe it's just that nobody is interested in
>>> it, hehe.
>>>
>>> On the other hand, I'm coding a new "tail" source, and I don't want to make
>>> the same mistakes in the future.
>>>
>>> Thank you,
>>>
>>> Guillermo Ortiz.
>>>
>>>
>
> --
> *Guillermo Ortiz*
> /Big Data Developer/
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> _http://www.bidoop.es_
>
>
