For a NiFi processor, I think "tail -F" semantics make more sense.  Unlike
the default behavior, which follows an already-open file descriptor, "tail -F"
follows the filename (or pattern), so it always tracks the current instance
of the file and can handle new files appearing during the run, log rotation, etc.
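
To make the follow-by-name idea concrete, here is a rough Python sketch (not
NiFi code). The NameFollower class and its poll() method are hypothetical
names made up for illustration. It assumes a POSIX filesystem where a
rotated file gets a new inode, and it does not handle in-place truncation.

```python
import os


class NameFollower:
    """Hypothetical helper: follows a path by name, like "tail -F"."""

    def __init__(self, path):
        self.path = path
        self.handle = None
        self.file_id = None  # (device, inode) of the file currently open

    def poll(self):
        """Return all bytes that have appeared since the previous poll."""
        data = b""
        # Drain whatever is left in the file we already have open, so that
        # nothing written just before a rotation is lost.
        if self.handle is not None:
            data += self.handle.read()
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return data  # mid-rotation: the name is gone for now, retry later
        current_id = (st.st_dev, st.st_ino)
        if current_id != self.file_id:
            # The name now points at a different file: switch over to it.
            if self.handle is not None:
                self.handle.close()
            self.handle = open(self.path, "rb")
            self.file_id = current_id
            data += self.handle.read()
        return data
```

Calling poll() on a schedule would return the tail of the old file plus the
start of the new one when a rotation happens between two calls.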

I definitely agree that it should take a regex or a fixed filename.

I think the biggest question is granularity.  Although tail is normally a
line-oriented operation, in NiFi it should probably be "chunk"-oriented,
with each pass creating a new flow file containing whatever new complete
lines are available.
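
As a sketch of what "chunk"-oriented could mean, the Python function below
splits newly read bytes into a chunk of complete lines and a leftover
partial line to carry into the next pass. The name split_complete_lines is
hypothetical, and treating "\n" as the only line terminator is an assumption.

```python
def split_complete_lines(leftover, new_data):
    """Return (chunk, remainder) for one pass.

    chunk holds every complete line seen so far and would become the
    content of one flow file; remainder is the trailing partial line,
    to be passed back in as leftover on the next pass.
    """
    data = leftover + new_data
    # rfind returns -1 when there is no newline, so cut becomes 0 and the
    # whole buffer is held back until a full line arrives.
    cut = data.rfind(b"\n") + 1
    return data[:cut], data[cut:]
```

A pass that produces an empty chunk would simply not create a flow file.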

On Wed, Sep 30, 2015 at 10:23 AM, Mark Payne (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936888#comment-14936888
> ]
>
> Mark Payne commented on NIFI-994:
> ---------------------------------
>
> Agreed. I'd recommend we allow the filename to tail to contain a * so that
> as things roll over we can still process the data. We could sort on last
> modified time to know the ordering of the files, and if we keep an offset
> into a file plus the timestamp when we pulled that file, that should help
> us to know which file it came from (the one with the smallest Last Modified
> timestamp >= our timestamp) and then we know which offset we left off at.
>
> If the data rolls off then you're right - there's nothing we can do about
> that. Would recommend we mention in the @CapabilityDescription that we
> expect logs to be kept around long enough to recover from outages.
>
>
> > Processor to tail files
> > -----------------------
> >
> >                 Key: NIFI-994
> >                 URL: https://issues.apache.org/jira/browse/NIFI-994
> >             Project: Apache NiFi
> >          Issue Type: New Feature
> >    Affects Versions: 0.4.0
> >            Reporter: Joseph Percivall
> >            Assignee: Joseph Percivall
> >
> > It's a very common data ingest situation to want to input text into the
> > system by "tailing" a file, most commonly log files. Currently we don't
> > have an easy way to do this.
> > A simple processor to tail a file would benefit many users. There would
> > need to be an option to not just tail a file but pick up where the
> > processor left off if it is interrupted.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
