[ https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977680#comment-14977680 ]

Andre commented on NIFI-994:
----------------------------

[~markap14]

Flume 1.7 (currently a snapshot) adds a [taildir source|https://issues.apache.org/jira/browse/FLUME-2498].

It currently keeps track of files using a JSON position sidecar file that records,
for each tailed file, its path, its inode, and the position the tail has reached:

{code}
[{"inode":13209775,"pos":13771668368,"file":"/mnt/logs/logfilename.log"}]
{code}
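For illustration, a minimal Java sketch that produces one entry in the same shape as the example above. This is not Flume's code and the class name {{PositionEntry}} is hypothetical. Reading the inode through the {{unix:ino}} file attribute assumes a Unix-like OS and an OpenJDK-style JDK that exposes the {{unix}} attribute view (an implementation detail rather than a documented, portable API).

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: one entry of a Flume-style position file,
// {"inode":...,"pos":...,"file":"..."}. Not Flume's actual implementation.
final class PositionEntry {
    final long inode;
    final long pos;
    final String file;

    PositionEntry(long inode, long pos, String file) {
        this.inode = inode;
        this.pos = pos;
        this.file = file;
    }

    // Assumption: the "unix" attribute view is available (Linux/macOS on OpenJDK).
    static PositionEntry of(Path path, long pos) throws IOException {
        long inode = (Long) Files.getAttribute(path, "unix:ino");
        return new PositionEntry(inode, pos, path.toAbsolutePath().toString());
    }

    // Minimal JSON serialization: only backslashes and quotes in the path are escaped.
    String toJson() {
        String escaped = file.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"inode\":" + inode + ",\"pos\":" + pos + ",\"file\":\"" + escaped + "\"}";
    }
}
{code}

The sidecar file itself would then be a JSON array of these entries, rewritten each time the tail position is committed.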

It is not foolproof: the process can fail to detect changes to a file that leave
its size exactly the same.

For example, suppose the tail last read a file with the following contents:
{code}
$ cat log.log
AAAA
{code}

Overwriting it with different content of the same length:
{code}
$ echo BBBB > log.log 
{code}

would not trigger a new tail, because the inode and the file size are both unchanged.
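To make the failure concrete, here is a rough Java sketch that replays the two commands above against a hypothetical check that, like the behaviour described here, only compares file identity and size with the saved state. File identity comes from {{BasicFileAttributes.fileKey()}}, which on Unix-like systems wraps the device and inode (it may be null elsewhere). {{SizeOnlyCheck}} is an illustrative name, not Flume's code.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Objects;

// Hypothetical tailer check that looks only at file identity and size.
final class SizeOnlyCheck {

    // (device, inode) on Unix-like systems; may be null on other platforms.
    static Object fileKey(Path file) throws IOException {
        return Files.readAttributes(file, BasicFileAttributes.class).fileKey();
    }

    // True if the file was replaced or its size differs from the saved position.
    static boolean looksChanged(Path file, Object savedKey, long savedPos) throws IOException {
        return !Objects.equals(fileKey(file), savedKey) || Files.size(file) != savedPos;
    }

    // Replays the example: the tail reads "AAAA\n", then the file is overwritten with "BBBB\n".
    static boolean replayExample(Path file) throws IOException {
        Files.write(file, "AAAA\n".getBytes(StandardCharsets.US_ASCII));
        Object savedKey = fileKey(file);
        long savedPos = Files.size(file); // the tail has consumed all 5 bytes

        // Like `echo BBBB > log.log`: truncates and rewrites the same inode in place.
        Files.write(file, "BBBB\n".getBytes(StandardCharsets.US_ASCII));
        return looksChanged(file, savedKey, savedPos);
    }
}
{code}

On Linux {{replayExample}} reports no change: identity and size both match the saved state even though every byte is different.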

A more robust alternative would be to use checksums, as suggested by [~jskora],
but instead of checksumming the processed content, checksum a fixed number of
bytes immediately preceding the saved seek position. If those bytes no longer
match, the file was rewritten even if its size did not change.

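Something like this (apologies for the rough pseudo-code):
{code}
IF SEEK_POSITION >= 8 BYTES AND FILESIZE >= SEEK_POSITION
   lf = OPEN logfile
   SEEK lf TO SEEK_POSITION - 8 BYTES
   CURRENT_HASH = SHA256(READ 8 BYTES FROM lf)
   IF CURRENT_HASH != SAVED_HASH (stored alongside SEEK_POSITION)
      TREAT logfile AS REWRITTEN
{code}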
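A minimal Java sketch of that check, assuming the hash is saved in the position file next to the seek position each time it is committed. The 8-byte window follows the pseudo-code; {{TailChecksum}}, {{hashBefore}} and {{unchangedSince}} are hypothetical names, not part of NiFi or Flume.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: hash the WINDOW bytes just before the saved position and compare
// with the hash stored when that position was recorded.
final class TailChecksum {
    static final int WINDOW = 8; // window size taken from the pseudo-code

    // SHA-256 of the WINDOW bytes before pos, or null if they are not available.
    static byte[] hashBefore(Path file, long pos) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            if (pos < WINDOW || ch.size() < pos) {
                return null; // too little data, or the file was truncated below pos
            }
            ByteBuffer buf = ByteBuffer.allocate(WINDOW);
            long offset = pos - WINDOW;
            while (buf.hasRemaining()) {
                // Positional read: does not move the channel's own position.
                int n = ch.read(buf, offset + buf.position());
                if (n < 0) {
                    return null; // file shrank while we were reading
                }
            }
            return sha256(buf.array());
        }
    }

    // True if the bytes before pos still hash to the value saved with pos.
    static boolean unchangedSince(Path file, long pos, byte[] savedHash) throws IOException {
        byte[] current = hashBefore(file, pos);
        return current != null && savedHash != null && MessageDigest.isEqual(current, savedHash);
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }
}
{code}

Two caveats with this design: the 5-byte {{log.log}} from the example is shorter than the 8-byte window, so this guard (like the pseudo-code's) would skip the check for it, and a rewrite that leaves the last 8 bytes before the position identical would still go unnoticed. A larger window trades extra I/O for better coverage.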

What do you think?

> Processor to tail files
> -----------------------
>
>                 Key: NIFI-994
>                 URL: https://issues.apache.org/jira/browse/NIFI-994
>             Project: Apache NiFi
>          Issue Type: New Feature
>    Affects Versions: 0.4.0
>            Reporter: Joseph Percivall
>            Assignee: Mark Payne
>             Fix For: 0.4.0
>
>         Attachments: 0001-NIFI-994-Initial-import-of-TailFile.patch
>
>
> It's a very common data ingest situation to want to input text into the 
> system by "tailing" a file, most commonly log files. Currently we don't have 
> an easy way to do this. 
> A simple processor to tail a file would benefit many users. There would need 
> to be an option to not just tail a file but pick up where the processor left 
> off if it is interrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)