[
https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980928#comment-14980928
]
Joe Skora commented on NIFI-994:
--------------------------------
[~markap14] Logging is a very likely use case for this processor, which creates
the possibility of the log rolling over before the processor reaches the end,
losing the unprocessed portion if it isn't copied before processing. That
being the case, I'm inclined to favor performance and simplicity over accuracy.
Calculating the checksum while reading a file won't be bad, but re-reading the
whole file on subsequent triggers could get expensive. For example, processing
a 6MB log file in 6 parts could mean reading 21MB of data (1+2+3+4+5+6), and
the total grows quadratically with the number of parts: n(n+1)/2 parts' worth
of data for n parts.
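To put rough numbers on the re-read cost, here is a small Python sketch. It
assumes the file grows in equal-sized parts and that each trigger re-reads
everything written so far in order to recompute the checksum. The function
name and parameters are hypothetical, for illustration only, and are not taken
from the TailFile patch.

```python
def total_mb_read(file_size_mb: int, parts: int, reread: bool) -> int:
    """Total MB read while consuming a file that grows to file_size_mb
    in `parts` equal increments (one processor trigger per increment)."""
    chunk = file_size_mb // parts
    total = 0
    for trigger in range(1, parts + 1):
        if reread:
            # Re-read the whole file so far to recompute its checksum.
            total += trigger * chunk
        else:
            # Read only the newly appended chunk.
            total += chunk
    return total


print(total_mb_read(6, 6, reread=True))    # 21
print(total_mb_read(6, 6, reread=False))   # 6
print(total_mb_read(12, 12, reread=True))  # 78
```

Doubling the number of parts from 6 to 12 takes the re-read total from 21MB
to 78MB, while reading only the new data stays equal to the file size.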
It will be important to make sure people know they may not be getting all the
data if only the open log file can be processed. The only sure way I see to
get 100% coverage of a log file is to only process files that have rotated out
and are no longer active.
My 2 cents. YMMV.
> Processor to tail files
> -----------------------
>
> Key: NIFI-994
> URL: https://issues.apache.org/jira/browse/NIFI-994
> Project: Apache NiFi
> Issue Type: New Feature
> Affects Versions: 0.4.0
> Reporter: Joseph Percivall
> Assignee: Mark Payne
> Fix For: 0.4.0
>
> Attachments: 0001-NIFI-994-Initial-import-of-TailFile.patch,
> 0002-NIFI-994-Ensure-that-processor-is-not-valid-due-to-t.patch
>
>
> It's a very common data ingest situation to want to input text into the
> system by "tailing" a file, most commonly log files. Currently we don't have
> an easy way to do this.
> A simple processor to tail a file would benefit many users. There would need
> to be an option to not just tail a file but pick up where the processor left
> off if it is interrupted.