exceptionfactory commented on PR #8914:
URL: https://github.com/apache/nifi/pull/8914#issuecomment-2215330557

   > Thanks for looking at this PR, @exceptionfactory
   > 
   > > @mosermw can you provide some additional context for the reasons behind 
this proposed change in behavior?
   > 
   > As a use case, let's say I have a server running software that has an 
anxiety attack if a certain file doesn't exist, and it checks for that file 
every 5 seconds. I have a requirement to update that file periodically. I use 
the power of NiFi to generate the file contents and then PutSFTP the file into 
place. Currently, my server software sometimes panics because it checks for the 
file while I am updating it. This is because PutSFTP deletes the file first, 
transfers the file as ".filename" then renames to "filename". As my file gets 
larger, it can take more than 5 seconds to transfer.
   > 
   > After the change in this PR, the file will be missing only during the 
short interval between the SFTP delete and the rename.
   > 
   > I asked the question on Slack a while back and got positive feedback that 
this change would be useful. I should have put the link into the Jira ticket. 
https://apachenifi.slack.com/archives/C0L9S92JY/p1713799005100369
   > 
   > > Reviewing the code, this change introduces an additional `mlist()` 
command for FTP and `stat()` command for SFTP, for each file transferred 
through the corresponding Processors. Those commands require both file access 
and network communication, which could have an impact on high volume flows. Is 
there a particular reason for adding those calls prior to calling `delete()`? 
It seems like that should not be necessary.
   > 
   > You're absolutely right and I can do better. It doesn't hurt to call 
delete whether the destination file exists or not. I will modify the PR to 
remove those additional commands and test again.
   
   Thanks for the reply and additional background @mosermw, that is helpful.
   
   At the core, I agree the amount of time between uploading and renaming 
should be minimal. If you are able to rework the approach and avoid introducing 
the additional status calls, it seems like a viable improvement.
   
   It is worth noting, however, that having some other independent process 
checking for the existence of a file is still bound to result in a race 
condition. This change may reduce the chances, but it does not sound like a 
complete solution.
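
To make the size of that window concrete, here is a minimal local-filesystem sketch of the two orderings discussed above. It is not NiFi code and does not use SFTP: the `transfer` and `demo` helpers are hypothetical names introduced for illustration, and local `os` calls stand in for the remote delete, upload, and rename commands. It records the steps after which the destination file is absent.

```python
import os
import tempfile


def transfer(directory, name, data, upload_first):
    """Simulate one transfer and return the steps after which the destination is missing.

    upload_first=False models the ordering described for current behavior
    (delete, upload as ".name", rename); upload_first=True models the ordering
    proposed in the PR (upload as ".name", delete, rename).
    """
    dest = os.path.join(directory, name)
    temp = os.path.join(directory, "." + name)
    missing_after = []

    def check(step):
        # An independent process polling for the file would fail at these points.
        if not os.path.exists(dest):
            missing_after.append(step)

    def upload():
        with open(temp, "wb") as handle:
            handle.write(data)
        check("upload")

    def delete():
        if os.path.exists(dest):
            os.remove(dest)
        check("delete")

    steps = [upload, delete] if upload_first else [delete, upload]
    for step in steps:
        step()

    os.rename(temp, dest)
    check("rename")
    return missing_after


def demo():
    with tempfile.TemporaryDirectory() as directory:
        # Start with an existing destination file, as in the use case above.
        with open(os.path.join(directory, "filename"), "wb") as handle:
            handle.write(b"old")
        current = transfer(directory, "filename", b"new", upload_first=False)
        proposed = transfer(directory, "filename", b"newer", upload_first=True)
    return current, proposed
```

Under these assumptions, the current ordering leaves the destination missing for the entire upload, while the proposed ordering leaves it missing only between the delete and the rename. That gap is shorter, but a poller can still land in it.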
   
   So with that said, I will take another look when you have posted some 
updates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.