Frederick Pletz created NIFI-7352:
-------------------------------------
Summary: Improve PutFile State Handling
Key: NIFI-7352
URL: https://issues.apache.org/jira/browse/NIFI-7352
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Frederick Pletz
Currently PutFile has three conflict resolution strategies: REPLACE, IGNORE,
and FAIL. REPLACE overwrites the existing file on disk and transfers the flow
file to SUCCESS. FAIL leaves the existing file in place and transfers the flow
file to FAIL. IGNORE also leaves the existing file in place, but transfers the
flow file to SUCCESS. This breakdown is worse than unhelpful: it actively
invites misunderstanding and misuse. It is very easy to assume that IGNORE
instead writes the new file to disk and keeps both the original and the new
file, by appending some notation to the end of the new filename, similar to
how filename conflicts are handled in other programs. I have personal
experience with this misinterpretation causing a project to drop data for an
extended period of time without realizing it. Additionally, the FAIL option is
of limited use as it stands, because a duplicate-filename failure is
indistinguishable from other failures, such as a destination folder that does
not exist or a lack of write permissions.
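To make the routing described above concrete, here is a simplified Python model of the current behavior as this issue describes it. It is an illustration, not NiFi's PutFile source. The `ConflictStrategy` enum, the `current_put_file` function, and the assumption that missing folders are not created automatically are all hypothetical. The model shows that IGNORE reports SUCCESS without writing anything. It also shows that a duplicate filename, a missing folder, and a permission error all land on the same FAIL relationship.

```python
import tempfile
from enum import Enum
from pathlib import Path


class ConflictStrategy(Enum):
    REPLACE = "replace"
    IGNORE = "ignore"
    FAIL = "fail"


def current_put_file(target: Path, content: bytes, strategy: ConflictStrategy) -> str:
    """Return the relationship a flow file lands on (illustrative model, not NiFi code).

    Assumes missing folders are NOT created automatically, so that case can fail.
    """
    if not target.parent.is_dir():
        return "FAIL"  # missing folder: same relationship as a duplicate
    if target.exists():
        if strategy is ConflictStrategy.FAIL:
            return "FAIL"  # duplicate filename: indistinguishable downstream
        if strategy is ConflictStrategy.IGNORE:
            return "SUCCESS"  # nothing written, yet reported as SUCCESS
    try:
        target.write_bytes(content)  # new file, or REPLACE over the old one
    except PermissionError:
        return "FAIL"  # no write permission: again the same relationship
    return "SUCCESS"


if __name__ == "__main__":
    folder = Path(tempfile.mkdtemp())
    target = folder / "data.csv"
    current_put_file(target, b"v1", ConflictStrategy.REPLACE)
    # IGNORE reports SUCCESS, but the new content never reaches disk.
    print(current_put_file(target, b"v2", ConflictStrategy.IGNORE), target.read_bytes())
    # prints: SUCCESS b'v1'
```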
Desired result: there should be a way to route on more detailed outcomes from
a PutFile processor. The easiest option from a user's perspective would be to
change the outgoing relationships to include a "FAIL_DUPLICATE" relationship,
as opposed to a single generic "FAIL" relationship. This would remove the need
for "IGNORE", since that behavior could be achieved by handling
"FAIL_DUPLICATE" as desired, most likely by auto-terminating that
relationship. Barring that, an attribute added to the outgoing flow file could
give a better indication of what happened when the processor succeeded or
failed: Was the file ignored? Written to disk? If it failed, what was the
cause: a duplicate filename, missing write permission, or a folder that didn't
exist?
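A rough sketch of the FAIL_DUPLICATE proposal, combined with the attribute idea, is modeled below in Python. This is also a hypothetical illustration, not NiFi code. The `Outcome` values, the `proposed_put_file` function, and the `putfile.outcome` attribute name are invented for this example. Only the "FAIL_DUPLICATE" relationship name comes from the proposal itself. IGNORE is gone, and a `replace` flag stands in for choosing between REPLACE and FAIL.

```python
import tempfile
from enum import Enum
from pathlib import Path
from typing import Dict, Tuple


class Outcome(Enum):
    """Hypothetical outcome values, one per case listed in the issue."""
    WRITTEN = "written"
    REPLACED = "replaced"
    DUPLICATE = "duplicate_filename"
    NO_FOLDER = "folder_missing"
    NO_PERMISSION = "write_permission_denied"


# Only duplicates get a dedicated relationship; other failures share FAIL.
RELATIONSHIP = {
    Outcome.WRITTEN: "SUCCESS",
    Outcome.REPLACED: "SUCCESS",
    Outcome.DUPLICATE: "FAIL_DUPLICATE",
    Outcome.NO_FOLDER: "FAIL",
    Outcome.NO_PERMISSION: "FAIL",
}


def proposed_put_file(target: Path, content: bytes, replace: bool) -> Tuple[str, Dict[str, str]]:
    """Return (relationship, attributes to add) for one flow file (illustrative model)."""
    if not target.parent.is_dir():
        outcome = Outcome.NO_FOLDER
    elif target.exists() and not replace:
        # No IGNORE: an unreplaced duplicate is always surfaced explicitly.
        outcome = Outcome.DUPLICATE
    else:
        existed = target.exists()
        try:
            target.write_bytes(content)
            outcome = Outcome.REPLACED if existed else Outcome.WRITTEN
        except PermissionError:
            outcome = Outcome.NO_PERMISSION
    # "putfile.outcome" is a hypothetical attribute name, not an existing NiFi attribute.
    return RELATIONSHIP[outcome], {"putfile.outcome": outcome.value}


if __name__ == "__main__":
    folder = Path(tempfile.mkdtemp())
    target = folder / "data.csv"
    proposed_put_file(target, b"v1", replace=False)
    print(proposed_put_file(target, b"v2", replace=False))
    # prints: ('FAIL_DUPLICATE', {'putfile.outcome': 'duplicate_filename'})
```

In this sketch, only duplicates get a dedicated relationship, so the relationship count stays small. The attribute still distinguishes the remaining failure causes on the generic FAIL relationship. Dropping duplicates becomes an explicit choice: auto-terminating FAIL_DUPLICATE.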
A note on backward compatibility: I expect the NiFi team is more likely to
take the attribute route, since it avoids breaking backward compatibility.
However, I would caution that this also means teams that use "IGNORE" with an
incorrect understanding of what that option does will continue to be unaware
that they are dropping data.