For a NiFi processor, I think "tail -F" makes more sense. Unlike the default behavior, which follows an existing file descriptor, "tail -F" follows by filename (or pattern), so it tracks the current instance of a file and can handle new files appearing during the run, log rotations, etc.
I definitely agree that it should take a regex or a fixed filename. I think the biggest question is granularity. Though tail is normally a line oriented operation, in NiFi it should probably be "chunk" oriented with each pass creating a new flow file with whatever new full lines are available.

On Wed, Sep 30, 2015 at 10:23 AM, Mark Payne (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936888#comment-14936888
> ]
>
> Mark Payne commented on NIFI-994:
> ---------------------------------
>
> Agreed. I'd recommend we allow the filename to tail to contain a * so that
> as things roll over we can still process the data. We could sort on last
> modified time to know the ordering of the files, and if we keep an offset
> into a file plus the timestamp when we pulled that file, that should help
> us to know which file it came from (the one with the smallest Last Modified
> timestamp >= our timestamp) and then we know which offset we left off at.
>
> If the data rolls off then you're right - there's nothing we can do about
> that. Would recommend we mention in the @CapabilityDescription that we
> expect logs to be kept around long enough to recover from outages.
>
>
> > Processor to tail files
> > -----------------------
> >
> >                 Key: NIFI-994
> >                 URL: https://issues.apache.org/jira/browse/NIFI-994
> >             Project: Apache NiFi
> >          Issue Type: New Feature
> >    Affects Versions: 0.4.0
> >            Reporter: Joseph Percivall
> >            Assignee: Joseph Percivall
> >
> > It's a very common data ingest situation to want to input text into the
> > system by "tailing" a file, most commonly log files. Currently we don't
> > have an easy way to do this.
> > A simple processor to tail a file would benefit many users. There would
> > need to be an option to not just tail a file but pick up where the
> > processor left off if it is interrupted.
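To make the "chunk" idea a bit more concrete, here is a minimal sketch in Python (not NiFi's Java, and not using any NiFi API). The function name `read_new_lines` and the "file is shorter than our offset" rotation check are assumptions made for illustration only. The check would miss a rotation where the new file has already grown past the saved offset, which is where the timestamp bookkeeping Mark describes would be needed.

```python
import os


def read_new_lines(path, offset):
    """Return (chunk, new_offset): all complete lines added to `path` since `offset`.

    Hypothetical helper, for illustration only. The file is reopened by name
    on every pass, so a rotated file is picked up the way "tail -F" would.
    """
    try:
        with open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            if size < offset:
                # Assumption: a file shorter than our offset was rotated
                # or truncated, so start again from the beginning.
                offset = 0
            f.seek(offset)
            data = f.read()
    except FileNotFoundError:
        # The file may be mid-rotation; keep the offset and retry next pass.
        return b"", offset

    end = data.rfind(b"\n")
    if end == -1:
        # No complete line yet; leave the partial line for a later pass.
        return b"", offset

    chunk = data[: end + 1]
    return chunk, offset + len(chunk)
```

Each non-empty chunk would become the content of one flow file, and the returned offset is the state that would have to be persisted so the processor can pick up where it left off after an interruption.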
