[ 
https://issues.apache.org/jira/browse/NIFI-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080613#comment-17080613
 ] 

Frederick Pletz commented on NIFI-7352:
---------------------------------------

"CRITICAL" priority justification: the way that the processor currently works 
is both causing data to be dropped in organizations that misinterpret 
"IGNORE" and failing to provide a reasonable way to catch and fix duplicate 
filename issues.  For example, if I were to route failure to an 
UpdateAttribute processor that appends "_(#)" to the end of the filename, 
incrementing the number until no duplicate filename is found, I would be 
opening myself up to infinite unhandled loops: if the error were actually due 
to write permissions or a missing folder, I would have no way to detect that 
state.  Otherwise, I would have to simply retry a fixed number of times before 
assuming that either too many copies of the file exist or the failure is not 
related to the filename.
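
To make the workaround above concrete, here is a minimal sketch of the 
bounded retry in plain Java.  It is not NiFi code: the class name, the 
maxSuffix cap, and the tryWrite callback (a stand-in for a write step that 
reports only success or failure) are all hypothetical.

```java
import java.util.Optional;
import java.util.function.Predicate;

public final class BoundedRenameRetry {

    /**
     * Tries "name", then "name_(1)" ... "name_(maxSuffix)".
     * tryWrite is a hypothetical stand-in for a write step that reports only
     * true (written) or false (failed, reason unknown).
     */
    public static Optional<String> writeWithSuffix(
            String filename, int maxSuffix, Predicate<String> tryWrite) {
        for (int n = 0; n <= maxSuffix; n++) {
            String candidate = (n == 0) ? filename : filename + "_(" + n + ")";
            if (tryWrite.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        // Cap reached: the caller cannot tell "too many duplicates" apart from
        // "permission denied" or "missing folder".
        return Optional.empty();
    }
}
```

The empty result is the problem case: because the failure carries no reason, 
a permissions error burns through every attempt and looks exactly like a 
long run of duplicates.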

> Improve PutFile State Handling
> ------------------------------
>
>                 Key: NIFI-7352
>                 URL: https://issues.apache.org/jira/browse/NIFI-7352
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Frederick Pletz
>            Priority: Critical
>              Labels: Processor, PutFile
>
> Currently PutFile has three conflict resolution states: REPLACE, IGNORE, 
> FAIL.  REPLACE writes the new file to disk over the old file and transfers 
> the file to SUCCESS.  FAIL does not replace the file on disk and transfers 
> the file to FAIL.  IGNORE does not replace the file on disk and transfers the 
> file to SUCCESS.  This breakdown is worse than unhelpful: it actively invites 
> misunderstanding and misuse.  It is very easy to assume that IGNORE would 
> instead write to disk but keep both the original and the new file by 
> appending a distinguishing suffix to the filename - similar to how filename 
> conflicts are handled in other programs.  I have personal experience with 
> this misinterpretation causing a project to drop data for an extended period 
> of time without realizing it.  Additionally, the FAIL state is less useful 
> than it could be because it is indistinguishable from other failure states, 
> such as the folder not existing or a lack of write permissions.
>  
> Desired result: there should be a way to key off a greater degree of detail 
> from a PutFile processor.  The easiest option from a user perspective would 
> be to change the output queues to include a "FAIL_DUPLICATE" output, as 
> opposed to a single generic "FAIL" output.  This would remove the need for 
> "IGNORE", since that function could be performed by using "FAIL_DUPLICATE" in 
> the desired way - most likely by auto-terminating that relationship.  Barring 
> that, an attribute added to the flow file on output could give a better 
> indication of what happened - was the file ignored?  Written to disk?  If it 
> failed, what was the failure: duplicate filename, write permission, or a 
> folder that didn't exist?
>  
> A note on backwards compatibility: I think the more likely result from 
> the NiFi team is the attribute route, since it avoids breaking backwards 
> compatibility.  However, I would caution that this also means teams which 
> are using "IGNORE" with an incorrect understanding of what that option means 
> will continue to be unaware that they are dropping data.
>  
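
The finer-grained outcome asked for in the description could look roughly 
like the sketch below.  It uses only java.nio.file and is not PutFile's 
actual implementation: the class and the Outcome names other than 
FAIL_DUPLICATE are assumptions made for illustration.  The specific 
exception types are ones the default file system providers commonly throw, 
but the JDK documents them as optional, so treat the mapping as best effort.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class PutOutcomeSketch {

    /** Hypothetical outcomes; only FAIL_DUPLICATE is named in the issue. */
    public enum Outcome {
        WRITTEN, FAIL_DUPLICATE, FAIL_MISSING_DIRECTORY, FAIL_PERMISSION, FAIL_OTHER
    }

    public static Outcome write(Path target, byte[] content) {
        try {
            // CREATE_NEW refuses to overwrite an existing file.
            Files.write(target, content,
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return Outcome.WRITTEN;
        } catch (FileAlreadyExistsException e) {
            return Outcome.FAIL_DUPLICATE;
        } catch (NoSuchFileException e) {
            return Outcome.FAIL_MISSING_DIRECTORY;
        } catch (AccessDeniedException e) {
            return Outcome.FAIL_PERMISSION;
        } catch (IOException e) {
            return Outcome.FAIL_OTHER;
        }
    }
}
```

Either proposal in the description maps onto this: each outcome could become 
its own output, or its name could be written to a flow file attribute.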



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
