[
https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938244#comment-14938244
]
Joe Skora commented on NIFI-994:
--------------------------------
I think we are on the same page, but I left out some details. The key is that
the processor always starts at the beginning when it finds a file but discards
content it thinks was previously committed downstream.
One approach could be storing a checksum of processed content with the other
state when content is committed downstream. Files are always handled from the
start, but those that exist when the processor starts are checked against the
stored state. If the checksum of the file's content up to the stored offset
matches the stored checksum, the content up to that offset is discarded and the
file is processed from there on. If the checksum differs, all the content is
processed.
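To make the idea concrete, here is a minimal sketch of that check in Python. It
is only an illustration and not NiFi code: TailState, sha256_hex, and
read_new_content are hypothetical names, SHA-256 is an assumed choice of
checksum, and the whole file is read into memory for simplicity.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TailState:
    """Hypothetical persisted state: bytes committed so far and their checksum."""
    offset: int
    checksum: str


def sha256_hex(data: bytes) -> str:
    # SHA-256 is an assumption; any stable checksum would do for this sketch.
    return hashlib.sha256(data).hexdigest()


def read_new_content(path: str, state: Optional[TailState]) -> Tuple[bytes, TailState]:
    """Always read from the start, then discard the prefix that was already committed."""
    with open(path, "rb") as f:
        content = f.read()

    start = 0
    if (
        state is not None
        and len(content) >= state.offset
        and sha256_hex(content[: state.offset]) == state.checksum
    ):
        # Same checksum at the same offset: treat the prefix as already processed.
        start = state.offset
    # Otherwise the file is new, rolled, or rewritten, so process all of it.

    new_state = TailState(offset=len(content), checksum=sha256_hex(content))
    return content[start:], new_state
```

The returned state would be stored only once the returned content has been
committed downstream, so a crash in between replays content rather than
dropping it.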
Any content that ages off while the processor is stopped will be lost, but I
don't see a way around that. That said, it might be possible to recognize some
log rolling scenarios and finish processing rolled-out files that were
previously in process while the regular behavior picks up the new file.
> Processor to tail files
> -----------------------
>
> Key: NIFI-994
> URL: https://issues.apache.org/jira/browse/NIFI-994
> Project: Apache NiFi
> Issue Type: New Feature
> Affects Versions: 0.4.0
> Reporter: Joseph Percivall
> Assignee: Joseph Percivall
>
> It's a very common data ingest situation to want to input text into the
> system by "tailing" a file, most commonly log files. Currently we don't have
> an easy way to do this.
> A simple processor to tail a file would benefit many users. There would need
> to be an option to not just tail a file but pick up where the processor left
> off if it is interrupted.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)