[
https://issues.apache.org/jira/browse/NIFI-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080613#comment-17080613
]
Frederick Pletz commented on NIFI-7352:
---------------------------------------
"CRITICAL" priority justification: the processor's current behavior both
causes data loss in organizations that misinterpret "IGNORE" and provides no
reasonable way to catch and fix duplicate-filename issues. For example, if I
routed failure to an UpdateAttribute processor that appends "_(#)" to the end
of the filename, incrementing the number until no duplicate is found, I would
be opening myself up to infinite unhandled loops: if the error were due to
write permissions or a missing folder, I would have no way to detect that
state. Otherwise, I would have to retry a fixed number of times before
assuming that either too many copies of the file exist or the problem is not
filename-related.
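
To illustrate the retry problem, here is a minimal sketch in plain Python
rather than an actual NiFi flow. The names put_file_fail_mode and
write_with_suffix_retry are hypothetical and not part of NiFi; the first only
approximates a put that reports one generic failure, and max_attempts stands
in for the arbitrary retry cap.

```python
import os


def put_file_fail_mode(directory, filename, data):
    """Hypothetical stand-in for a put that reports one generic failure."""
    path = os.path.join(directory, filename)
    try:
        # Mode "x" refuses to overwrite an existing file.
        with open(path, "xb") as handle:
            handle.write(data)
        return "success"
    except OSError:
        # Duplicate name, missing folder, and permission errors all land
        # here, so the caller cannot tell them apart.
        return "failure"


def write_with_suffix_retry(directory, filename, data, max_attempts=5):
    """Retry with "_(n)" suffixes; the cap is the only guard against looping."""
    stem, ext = os.path.splitext(filename)
    candidate = filename
    for attempt in range(max_attempts):
        if put_file_fail_mode(directory, candidate, data) == "success":
            return candidate, attempt + 1
        candidate = f"{stem}_({attempt + 1}){ext}"
    # Either too many copies exist or the failure was never about the name.
    return None, max_attempts
```

When the target folder is missing, this loop uses up every attempt and still
cannot say why it gave up, which is the situation described above.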
> Improve PutFile State Handling
> ------------------------------
>
> Key: NIFI-7352
> URL: https://issues.apache.org/jira/browse/NIFI-7352
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Frederick Pletz
> Priority: Critical
> Labels: Processor, PutFile
>
> Currently PutFile has three conflict resolution states: REPLACE, IGNORE,
> FAIL. REPLACE writes the new file to disk over the old file and transfers
> the file to SUCCESS. FAIL does not replace the file on disk and transfers
> the file to FAIL. IGNORE does not replace the file on disk and transfers the
> file to SUCCESS. This breakdown is worse than unhelpful: it actively invites
> misunderstanding and misuse. It is very easy to assume that IGNORE would
> instead write to disk but keep both the original and the new file by
> appending a distinguishing suffix to the end of the filename, similar to how
> filename conflicts are handled in other programs. I have personal
> experience with this misinterpretation causing a project to drop data for
> an extended period of time without realizing it. Additionally, the FAIL
> state is less useful than it could be, because it is indistinguishable from
> other failure states, such as a missing folder or a lack of write
> permissions.
>
> Desired result: there should be a way to key off a greater degree of detail
> from a PutFile processor. The easiest option from a user's perspective would
> be to change the output relationships to include a "FAIL_DUPLICATE" output,
> as opposed to a single generic "FAIL" output. This would remove the need for
> "IGNORE", since that function could be performed by using "FAIL_DUPLICATE"
> in the desired way, most likely by auto-terminating that relationship.
> Barring that, an attribute added to the flow file on output could give a
> better indication of what happened: was the file ignored? Written to disk?
> If it failed, what was the failure: duplicate filename, write permission,
> or a missing folder?
>
> A note on backwards compatibility: I think the NiFi team is more likely to
> take the attribute route, since it avoids breaking backwards compatibility.
> However, I would caution that this also means teams that are using "IGNORE"
> with an incorrect understanding of what that option means will continue to
> be unaware that they are dropping data.
>
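
As a rough illustration of the distinction requested in the description, the
sketch below classifies a write attempt by cause. It is plain Python, not
NiFi code. Only "FAIL_DUPLICATE" comes from the description; the function
name put_file_detailed and the other outcome labels are assumptions made for
the example.

```python
import os


def put_file_detailed(directory, filename, data):
    """Hypothetical put that reports why a write did not happen."""
    path = os.path.join(directory, filename)
    try:
        # Mode "x" refuses to overwrite an existing file.
        with open(path, "xb") as handle:
            handle.write(data)
        return "SUCCESS"
    except FileExistsError:
        # Label taken from the description.
        return "FAIL_DUPLICATE"
    except FileNotFoundError:
        # Assumed label: the target folder does not exist.
        return "FAIL_MISSING_FOLDER"
    except PermissionError:
        # Assumed label: no write permission on the target.
        return "FAIL_PERMISSION"
    except OSError:
        # Assumed catch-all for any other I/O error.
        return "FAIL_OTHER"
```

With outcomes separated like this, a caller could rename and retry only on
"FAIL_DUPLICATE" and handle every other failure on a separate path, or simply
discard duplicates to get the current IGNORE behavior explicitly.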