[ https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977680#comment-14977680 ]
Andre commented on NIFI-994:
----------------------------
[~markap14]
Since version 1.7 (snapshot), Flume has had a [taildir
source|https://issues.apache.org/jira/browse/FLUME-2498].
It currently keeps track of files using a JSON position sidecar file that
records, for each tailed file, the path, the inode and the tail position:
{code}
[{"inode":13209775,"pos":13771668368,"file":"/mnt/logs/logfilename.log"}]
{code}
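For illustration only, here is a minimal Python sketch of reading such a position file, assuming the layout shown above. The helper name `load_positions` is hypothetical and this is not Flume's code.

```python
import json

def load_positions(text):
    """Map each tailed file path to its (inode, pos) pair."""
    entries = json.loads(text)
    return {e["file"]: (e["inode"], e["pos"]) for e in entries}

# Sample taken from the position file content quoted above.
sample = '[{"inode":13209775,"pos":13771668368,"file":"/mnt/logs/logfilename.log"}]'
positions = load_positions(sample)
```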
It is not foolproof, as the process tends to miss changes that leave a file
at exactly the same size.
For example, suppose the tail last queried a file in the following state:
{code}
$ cat log.log
AAAA
{code}
Overwriting it with different content of the same length
{code}
$ echo BBBB > log.log
{code}
would not trigger a new tail.
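A rough Python sketch of why this slips through, assuming a tracker that compares only inode and size. That is my reading of the failure, not Flume's actual logic, and `snapshot` and `looks_unchanged` are hypothetical names.

```python
import os
import tempfile

def snapshot(path):
    """Record the inode and size of a file, as a naive tracker might."""
    st = os.stat(path)
    return (st.st_ino, st.st_size)

def looks_unchanged(path, saved):
    """True when inode and size both match the saved snapshot."""
    return snapshot(path) == saved

with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "log.log")
    with open(log, "w") as f:
        f.write("AAAA\n")
    saved = snapshot(log)
    # Same effect as `echo BBBB > log.log`: truncate in place, then write.
    with open(log, "w") as f:
        f.write("BBBB\n")
    # The content differs, but inode and size are identical.
    missed = looks_unchanged(log, saved)
```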
A more robust alternative would be to use checksums, as suggested by [~jskora],
but instead of checksumming the processed content, one would checksum a fixed
number of bytes immediately preceding the saved seek position.
Something like this (apologies for my weird pseudo-code):
{code}
IF SEEK_POSITION >= 8 BYTES AND FILESIZE >= 8 BYTES
lf = OPEN logfile
SEEK lf AT SEEK_POSITION - 8 BYTES
SHA256(READ 8 BYTES FROM lf)
COMPARE WITH THE CHECKSUM SAVED AT THE LAST TAIL
{code}
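The same idea as a runnable Python sketch, under a few assumptions: the 8-byte window comes from the pseudo-code, `tail_fingerprint` is a hypothetical helper, and the sample lines are 8 characters long so that the window fits (the 5-byte AAAA example would be skipped by the size guard).

```python
import hashlib
import os
import tempfile

WINDOW = 8  # bytes before the saved position, as in the pseudo-code

def tail_fingerprint(path, pos, window=WINDOW):
    """SHA-256 of the `window` bytes ending at `pos`, or None if unavailable."""
    if pos < window or os.path.getsize(path) < pos:
        return None
    with open(path, "rb") as lf:
        lf.seek(pos - window)
        return hashlib.sha256(lf.read(window)).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "log.log")
    with open(log, "wb") as f:
        f.write(b"AAAAAAAA\n")
    pos = os.path.getsize(log)          # position after the last tail
    saved = tail_fingerprint(log, pos)  # stored next to inode and pos
    # Overwrite with different content of exactly the same size.
    with open(log, "wb") as f:
        f.write(b"BBBBBBBB\n")
    # Size and position still match, but the fingerprint does not.
    rewritten = tail_fingerprint(log, pos) != saved
```

A mismatch would tell the tailer to start again from the beginning of the file rather than resume from the saved position.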
What do you think?
> Processor to tail files
> -----------------------
>
> Key: NIFI-994
> URL: https://issues.apache.org/jira/browse/NIFI-994
> Project: Apache NiFi
> Issue Type: New Feature
> Affects Versions: 0.4.0
> Reporter: Joseph Percivall
> Assignee: Mark Payne
> Fix For: 0.4.0
>
> Attachments: 0001-NIFI-994-Initial-import-of-TailFile.patch
>
>
> It's a very common data ingest situation to want to input text into the
> system by "tailing" a file, most commonly log files. Currently we don't have
> an easy way to do this.
> A simple processor to tail a file would benefit many users. There would need
> to be an option to not just tail a file but pick up where the processor left
> off if it is interrupted.