Jens

For such a setup the very specific details matter, and here there are a lot of details. It isn't easy for me to sort through all of this, so I'll keep it high level based on my experience in very similar situations/setups:
1. I'd generally trust SFTP to be awesome and damn near failure-proof in itself. I'd focus on other things.
2. I'd generally trust that network transfer is bulletproof against data packet corruption and not consider that a problem, especially since SFTP and the various protocols employed here (including NiFi's) offer certain guarantees themselves.
3. I'd be suspicious of the one-way transfer/guard devices creating issues. I'd remove them and try to reproduce the problem.
4. In Linux a cp/mv is not atomic, as I understand it, if the data spans file systems, so you could potentially have partially written data scenarios here.
5. I'd be careful to avoid multiple-file scenarios such as the original content plus the sha256. Instead, if the low side is a NiFi and the high side is a NiFi, I'd have the low-side NiFi write out flowfiles and pass those over the guard device. Why? Because this gives you your original content AND the flowfile attributes (where I'd put the sha256). On the high-side NiFi I'd unpack that flowfile and ensure the content matches the stated sha256.

Joe

On Tue, Oct 12, 2021 at 12:25 PM Jens M. Kofoed <jmkofoed....@gmail.com> wrote:
>
> Hi Joe
>
> I know what you are thinking, but that's not the case.
> Check my very short description of my test flow.
> In my loop the PutSFTP processor uses default settings, which means it
> uploads files as .filename and renames them when done. The next processor
> is FetchSFTP, which will load the file as filename. If PutSFTP has not
> finished uploading the file, it will have the wrong filename, the flow
> file will not go from PutSFTP -> FetchSFTP, and therefore FetchSFTP can't
> fetch the file. So in my test flow that is not the case.
>
> In our production flow, after NiFi gets its data it calculates the sha256,
> uploads the data to an SFTP server as .filename and renames it when done
> (default settings for PutSFTP). Next it creates a new file with the value
> of the hash and saves it as filename.sha256.
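Joe's point 4 above can be sketched in shell. A rename within one filesystem is atomic, while cp (or mv across filesystems) copies byte by byte and can expose partially written data to a concurrent reader. The directory and filenames below are illustrative only:

```shell
#!/bin/sh
# Sketch of the safe write-then-rename pattern (same idea PutSFTP's
# ".filename then rename" default follows). A rename on one filesystem
# is atomic; a cp is not, so readers polling the target directory could
# otherwise see a half-written file.
set -eu
dir=$(mktemp -d)
printf 'full payload' > "$dir/.incoming.tmp"  # hidden while in flight
mv "$dir/.incoming.tmp" "$dir/incoming"       # atomic: same filesystem
cat "$dir/incoming"
rm -rf "$dir"
```

A mv that crosses filesystems degrades to copy-then-delete, which is why archiving with cp to a different mount, as described later in this thread, can race with the sender.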
> At that SFTP server a bash script looks for non-hidden files every 2
> seconds with an ls command. If there are files, the bash script does a
> cp filename /archive/filename and sends the data to server 3 via a data
> diode. On the other side another NiFi server reads filename.sha256, reads
> in the hash value, and reads in the original data. It calculates a new
> sha256 and compares the two hashes.
>
> Yesterday there was a corruption again, and we checked the file at the
> first SFTP server, where the first NiFi saved it after creating the first
> hash. Running sha256sum on /archive/filename produced a different hash
> than NiFi's. So after the PutSFTP and a Linux cp command the file was
> corrupted. It has been less than 1 file per 1,000,000 files where we have
> seen these issues. But we see them.
>
> Now we are trying to investigate what causes the issue. Therefore I
> created the small test flow, and already after nearly 9000 iterations in
> the loop the file has been corrupted just by being uploaded and
> downloaded again.
>
> Are we facing a network issue where a data packet is corrupted?
> Are there very rare cases where the SFTP implementation is doing
> something wrong?
> We don't know yet, but we are running some more tests on different
> systems to narrow it down.
>
> Kind regards
> Jens M. Kofoed
>
>
> On 12 Oct 2021, at 19.39, Joe Witt <joe.w...@gmail.com> wrote:
> >
> > Hello
> >
> > How does NiFi grab the data from the file system? It sounds like it is
> > doing partial reads due to a competing-consumer (data still being
> > written) scenario.
> >
> > Thanks
> >
> > On Mon, Oct 11, 2021 at 10:36 PM Jens M.
> > Kofoed <jmkofoed....@gmail.com> wrote:
> >
> >> Dear Developers
> >>
> >> We have a situation where we see corrupted files after using PutSFTP
> >> and FetchSFTP in NiFi 1.13.2 with openjdk version "1.8.0_292", OpenJDK
> >> Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10),
> >> OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode), running on
> >> Ubuntu Server 20.04.
> >>
> >> We have a flow between 2 separated systems where we use PutSFTP to
> >> export data from one NiFi instance to a data diode and FetchSFTP to
> >> grab the data on the other end. To be sure data is not corrupted we
> >> calculate a SHA256 on each side and transfer the flowfile metadata in
> >> a separate file. In rare cases we have seen that the SHA256 doesn't
> >> match on the two sides, and we are investigating where the errors
> >> happen. We see two things: manually calculating a SHA256 on both sides
> >> of the diode the file is OK, and we have found that the errors happen
> >> between NiFi and the SFTP servers. And it can happen on both sides.
> >>
> >> So for testing I created this little flow:
> >> GenerateFlowFile (size 100MB) (Run once) ->
> >> CryptographicHashContent (SHA256) ->
> >> UpdateAttribute ( hash.root = ${content_SHA-256} , iteration=1 ) ->
> >> PutSFTP ->
> >> FetchSFTP ->
> >> CryptographicHashContent (SHA256) ->
> >> RouteOnAttribute (compare hash.root vs. content_SHA-256)
> >>   If unmatched -> goes to a disabled processor, holding the corrupted
> >>     file in a queue
> >>   If matched -> UpdateAttribute ( iteration = ${iteration:plus(1)} )
> >>     -> loops back to PutSFTP
> >>
> >> After 8992 iterations the file was corrupted. To test whether the
> >> errors are in the calculation of the SHA256 I have a copy of the flow
> >> without the Put/Fetch SFTP processors, which hasn't had any errors yet.
> >>
> >> It is very rare that we see these errors; millions of files go through
> >> without any issues, but sometimes it happens, which is not good.
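The hash-then-compare step at the heart of the test flow above (CryptographicHashContent before upload, again after download, then RouteOnAttribute on the two values) amounts to the following check, sketched here in shell with a local cp standing in for the PutSFTP/FetchSFTP round trip and hypothetical filenames:

```shell
#!/bin/sh
# Sketch of the corruption check: hash the content before transfer,
# hash what came back, and route on whether the digests match. The cp
# below is a stand-in for the PutSFTP -> FetchSFTP round trip.
set -eu
dir=$(mktemp -d)
printf 'sample content' > "$dir/before"
cp "$dir/before" "$dir/after"              # stand-in for the transfer
h1=$(sha256sum "$dir/before" | awk '{print $1}')
h2=$(sha256sum "$dir/after"  | awk '{print $1}')
if [ "$h1" = "$h2" ]; then echo match; else echo corrupt; fi
rm -rf "$dir"
```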
> >>
> >> Can anyone please help? Maybe try setting up the same test and see if
> >> you also get a corrupted file after some days.
> >>
> >> Kind regards
> >> Jens M. Kofoed
> >>
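The polling pattern discussed in this thread, a script that lists only non-hidden files (since PutSFTP uploads as .filename and renames on completion) before archiving them, can be sketched as follows; the directories are illustrative stand-ins for the SFTP landing and archive paths:

```shell
#!/bin/sh
# Sketch of the low-side polling script described in the thread:
# pick up only non-hidden files, so an in-flight ".name" upload is
# never copied, then archive each completed file. Paths hypothetical.
set -eu
indir=$(mktemp -d); archive=$(mktemp -d)
touch "$indir/.partial"         # still being uploaded: must be skipped
printf 'done' > "$indir/ready"  # fully uploaded and renamed
for f in "$indir"/*; do         # the glob skips dotfiles by default
    [ -f "$f" ] || continue
    cp "$f" "$archive/$(basename "$f")"
done
ls "$archive"
rm -rf "$indir" "$archive"
```

Note that this still relies on cp, which per the atomicity discussion above is safe only because the glob guarantees the source file is already complete.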