[
https://issues.apache.org/jira/browse/NIFI-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984137#comment-14984137
]
Joe Skora commented on NIFI-631:
--------------------------------
[~mpetronic],
<facepalm>
Wow, thank you for catching this. I apologize, that was an incomplete version
of the processor.
<facepalm/>
When [~markap14] pointed out the new AbstractListProcessor was available, I
created a new smoke test of ListFile extending ALP. I was excited that it
worked so cleanly, and must have then lost my mind and committed without
re-integrating the rest of the goodness.
Your comments make sense. I'll update the processor with the changes below,
unless something else pops out while I'm working on it.
1. RecurseSubdirs - this is required.
2. Suppress directories in output - they're called flow FILES for a reason.
3. MinimumTimeStamp - (seed in your notes) I'll make it a static primary date
filter for file selection and the dynamic LastModified high water mark will
still track against those files actually processed. This way the interaction
between the static and dynamic cutoffs is clear and there's no need to reset
the MinimumTimeStamp.
4. File LastModified in attributes - seems useful and easy to add, it's already
captured during scanning.
5. Annotations and documentation - I'll put them back and fix the description.
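To make the intended interaction between the static and dynamic cutoffs concrete, here is a rough sketch, not the actual ListFile code. The class ListingSketch, its Entry type, and the method names are all hypothetical, and the scan results are passed in as a plain list instead of being read from disk. It assumes timestamps are epoch milliseconds and that a file is "new" only if its LastModified is strictly greater than the high water mark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and structure are assumptions, not the ListFile processor.
public class ListingSketch {

    /** Hypothetical stand-in for one entry found while scanning a directory. */
    public static final class Entry {
        public final String path;
        public final long lastModified;   // epoch millis (assumed)
        public final boolean directory;

        public Entry(String path, long lastModified, boolean directory) {
            this.path = path;
            this.lastModified = lastModified;
            this.directory = directory;
        }
    }

    private final long minimumTimestamp;          // static cutoff, never reset
    private long highWaterMark = Long.MIN_VALUE;  // dynamic cutoff, advances per listing

    public ListingSketch(long minimumTimestamp) {
        this.minimumTimestamp = minimumTimestamp;
    }

    /** Returns the entries that pass both cutoffs and advances the high water mark. */
    public List<Entry> listNew(List<Entry> scanned) {
        List<Entry> selected = new ArrayList<>();
        long newest = highWaterMark;
        for (Entry e : scanned) {
            if (e.directory) {
                continue; // suppress directories in the output
            }
            if (e.lastModified < minimumTimestamp) {
                continue; // static primary date filter
            }
            if (e.lastModified <= highWaterMark) {
                continue; // already covered by an earlier listing
            }
            selected.add(e);
            newest = Math.max(newest, e.lastModified);
        }
        // Only files actually selected move the mark forward.
        highWaterMark = newest;
        return selected;
    }

    public long getHighWaterMark() {
        return highWaterMark;
    }
}
```

One caveat with this simplification: a file that shows up later with a LastModified equal to the current high water mark would be skipped, which is why the issue description also mentions remembering the files already pulled at that timestamp.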
So sorry, I'll try to turn this around ASAP.
> Create ListFile and FetchFile processors
> ----------------------------------------
>
> Key: NIFI-631
> URL: https://issues.apache.org/jira/browse/NIFI-631
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Mark Payne
> Assignee: Joe Skora
> Attachments:
> 0001-NIFI-631-Initial-implementation-of-FetchFile-process.patch
>
>
> This pair of Processors will provide several benefits over the existing
> GetFile processor:
> 1. Currently, GetFile will continually pull the same files if the "Keep
> Source File" property is set to true. There is no way to pull the file and
> leave it in the directory without continually pulling the same file. We could
> implement state here, but it would either be a huge amount of state to
> remember everything pulled or it would have to always pull the oldest file
> first so that we can maintain just the Last Modified Date of the last file
> pulled plus all files with the same Last Modified Date that have already been
> pulled.
> 2. If pulling from a network attached storage such as NFS, this would allow a
> single processor to run ListFile and then distribute those FlowFiles to the
> cluster so that the cluster can share the work of pulling the data.
> 3. There are use cases when we may want to pull a specific file (for example,
> in conjunction with ProcessHttpRequest/ProcessHttpResponse) rather than just
> pull all files in a directory. GetFile does not support this.