Jens

If you use MergeContent [1] you can create streams of flowfile bundles (attributes/content serialized together) in groups of 1 or more. Then on the other end you can use UnpackContent [2].
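Purely as an illustration of what "attributes/content serialized together" means, here is a minimal Python sketch of the concept. It uses a simplified JSON bundle rather than NiFi's actual "FlowFile Stream, v3" packaging, and the file and attribute names are made up for the sketch:

import base64
import hashlib
import json

def pack(content_path, attributes, bundle_path):
    """Serialize content plus attributes into one JSON bundle (simplified stand-in)."""
    with open(content_path, "rb") as f:
        content = f.read()
    attributes = dict(attributes)
    # Carry the content hash as an attribute, like CryptographicHashContent would
    attributes["content_SHA-256"] = hashlib.sha256(content).hexdigest()
    bundle = {"attributes": attributes,
              "content": base64.b64encode(content).decode("ascii")}
    with open(bundle_path, "w") as f:
        json.dump(bundle, f)

def unpack_and_verify(bundle_path):
    """Unpack the bundle and check the content against the sha256 carried in the attributes."""
    with open(bundle_path) as f:
        bundle = json.load(f)
    content = base64.b64decode(bundle["content"])
    expected = bundle["attributes"]["content_SHA-256"]
    return hashlib.sha256(content).hexdigest() == expected

With the real processors the equivalent check stays inside NiFi: UnpackContent restores the attributes, CryptographicHashContent recomputes the hash on the high side, and RouteOnAttribute compares the two.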
Thanks
Joe

[1] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.MergeContent/index.html
[2] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.UnpackContent/index.html

On Tue, Oct 12, 2021 at 11:07 PM Jens M. Kofoed <jmkofoed....@gmail.com> wrote:
>
> Dear Joe
>
> Regarding your point 5: this is almost also what I'm doing. But last night
> on my phone I "just wrote" that we create a hash file. What I'm actually doing
> is converting the flowfile to JSON.
> Is there a way NiFi can export the complete flowfile (attributes and
> content) into 1 file, which we can import again on the other side? Right
> now I do it in 2 steps.
> Below is a short description of my flow for transferring data between
> systems where we can't use S2S.
> At low side:
> get data ->
> CryptographicHashContent ->
> UpdateAttribute: original.filename = ${filename}, rootHash = ${content_SHA-256} ->
> UpdateAttribute: filename = ${UUID()} ->
> PutSFTP ->
> AttributesToJSON: Destination = flowfile-content ->
> UpdateAttribute: filename = ${filename:append('.flowfile')} ->
> PutSFTP
>
> At high side:
> ListSFTP: File Filter Regex = .*\.flowfile ->
> FetchSFTP ->
> ExecuteScript: (converting the JSON data into attributes; a rough sketch of such a script appears at the end of this thread) ->
> UpdateAttribute: filename = ${filename:substringBefore('.flowfile')} ->
> FetchSFTP ->
> CryptographicHashContent ->
> RouteOnAttribute: Hash_OK = ${rootHash:equals(${content_SHA-256})} ->
>     Hash_OK -> following production flow
>     Unmatched -> Error flow
>
> Kind regards
> Jens
>
> On Tue, Oct 12, 2021 at 21:36, Joe Witt <joe.w...@gmail.com> wrote:
> >
> > Jens
> >
> > For such a setup the very specific details matter, and here there are a
> > lot of details. It isn't easy to sort through this for me, so I'll
> > keep it high level based on my experience in very similar
> > situations/setups:
> >
> > 1. I'd generally trust SFTP to be awesome and damn near failure proof
> > in itself. I'd focus on other things.
> > 2. I'd generally trust that network transfer is bulletproof with respect
> > to data packet corruption and not think that is a problem, especially
> > since SFTP and the various protocols employed here (including NiFi)
> > offer certain guarantees themselves.
> > 3. I'd be suspicious of one-way transfer/guard devices creating issues.
> > I'd remove that and try to reproduce the problem.
> > 4. In Linux a cp/mv is not atomic, as I understand it, if data spans
> > file systems, so you could potentially have partially written data
> > scenarios here.
> > 5. I'd be careful to avoid multiple-file scenarios such as original
> > content plus the sha256. Instead, if the low side is a NiFi and the
> > high side is a NiFi, I'd have the low-side NiFi write out flowfiles and
> > pass those over the guard device. Why? Because this gives you your
> > original content AND the flowfile attributes (where I'd have the
> > sha256). On the high-side NiFi I'd unpack that flowfile and ensure
> > the content matches the stated sha256.
> >
> > Joe
> >
> > On Tue, Oct 12, 2021 at 12:25 PM Jens M. Kofoed <jmkofoed....@gmail.com> wrote:
> > >
> > > Hi Joe
> > >
> > > I know what you are thinking, but that's not the case.
> > > Check my very short description of my test flow.
> > > In my loop the PutSFTP processor is using default settings, which means
> > > it uploads files as .filename and renames them when done. The next processor
> > > is the FetchSFTP, which will load the file as filename.
> > > If PutSFTP is not finished uploading the file, it will still have the wrong
> > > (dot-prefixed) filename, the flowfile will not go from PutSFTP -> FetchSFTP,
> > > and therefore FetchSFTP can't fetch the file. So in my test flow that is not
> > > the case.
> > >
> > > In our production flow, after NiFi gets its data it calculates the
> > > SHA-256, uploads the data to an SFTP server as .filename and renames it when
> > > done (default settings for PutSFTP). Next it creates a new file with the
> > > value of the hash and saves it as filename.sha256.
> > > At that SFTP server a bash script looks for non-hidden files every
> > > 2 seconds with an ls command. If there are files, the bash script does a
> > > cp filename /archive/filename and sends the data to server 3 via a data diode.
> > > At the other side another NiFi server reads filename.sha256, reads in
> > > the hash value, reads in the original data, calculates a new SHA-256 and
> > > compares the two hashes.
> > > Yesterday there was a corruption again and we checked the file at the
> > > first SFTP server, where the first NiFi saved it after creating the first
> > > hash. Running sha256sum on /archive/filename produced a different hash than
> > > NiFi. So after PutSFTP and a Linux cp command the file was corrupted.
> > > It has been less than 1 file per 1,000,000 files where we have seen
> > > these issues. But we see them.
> > > Now we are trying to investigate what causes the issue. Therefore I created
> > > the small test flow, and already after nearly 9000 iterations in the loop the
> > > file has been corrupted just by being uploaded and downloaded again.
> > >
> > > Are we facing a network issue where a data packet is corrupted?
> > > Are there very rare cases where the SFTP implementation is doing
> > > something wrong?
> > > We don't know yet, but we are running some more tests on different
> > > systems to narrow it down.
> > >
> > > Kind regards
> > > Jens M. Kofoed
> > >
> > > > On 12 Oct 2021, at 19.39, Joe Witt <joe.w...@gmail.com> wrote:
> > > >
> > > > Hello
> > > >
> > > > How does NiFi grab the data from the file system? It sounds like it is
> > > > doing partial reads due to a competing consumer (data still being written)
> > > > scenario.
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Oct 11, 2021 at 10:36 PM Jens M. Kofoed <jmkofoed....@gmail.com> wrote:
> > > >
> > > >> Dear Developers
> > > >>
> > > >> We have a situation where we see corrupted files after using PutSFTP and
> > > >> FetchSFTP in NiFi 1.13.2 with openjdk version "1.8.0_292", OpenJDK Runtime
> > > >> Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit
> > > >> Server VM (build 25.292-b10, mixed mode), running on Ubuntu Server 20.04.
> > > >>
> > > >> We have a flow between 2 separated systems where we use PutSFTP to export
> > > >> data from one NiFi instance to a data diode and use FetchSFTP to grab the
> > > >> data on the other end. To be sure data is not corrupted, we calculate a
> > > >> SHA-256 on each side and transfer the flowfile metadata in a separate file.
> > > >> In rare cases we have seen that the SHA-256 doesn't match on both sides,
> > > >> and we are investigating where the errors happen. We see 2 errors. Manually
> > > >> calculating a SHA-256 on both sides of the diodes shows the file is OK, and
> > > >> we have found that the errors happen between NiFi and the SFTP servers.
> > > >> And it can happen at both sides.
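A minimal standalone sketch of that kind of manual cross-check, assuming the data file sits next to a <name>.sha256 sidecar whose first token is the hex digest, could look like this:

#!/usr/bin/env python3
# Sketch: recompute a file's SHA-256 in chunks and compare it with the value
# stored in a sidecar file (assumed layout: <file> and <file>.sha256).
import hashlib
import sys

def sha256_of(path, chunk_size=1024 * 1024):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    data_file = sys.argv[1]
    with open(data_file + ".sha256") as f:
        expected = f.read().split()[0].strip().lower()
    actual = sha256_of(data_file)
    print("OK" if actual == expected else "MISMATCH: expected %s, got %s" % (expected, actual))

Running such a check on both sides of the diode against the same sidecar value helps isolate whether the corruption happens before or after the transfer.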
> > > >> So for testing I created this little flow:
> > > >> GenerateFlowFile (size 100MB) (Run once) ->
> > > >> CryptographicHashContent (SHA-256) ->
> > > >> UpdateAttribute ( hash.root = ${content_SHA-256}, iteration = 1 ) ->
> > > >> PutSFTP ->
> > > >> FetchSFTP ->
> > > >> CryptographicHashContent (SHA-256) ->
> > > >> RouteOnAttribute (compare hash.root vs. content_SHA-256)
> > > >>     If unmatched ->
> > > >>         going to a disabled processor to hold the corrupted file in a queue
> > > >>     If matched ->
> > > >>         UpdateAttribute ( iteration = ${iteration:plus(1)} ) -> looping back
> > > >>         to PutSFTP
> > > >>
> > > >> After 8992 iterations the file is corrupted. To test whether the errors are
> > > >> in the calculation of the SHA-256, I have a copy of the flow without the
> > > >> Put/Fetch SFTP processors, which hasn't had any errors yet.
> > > >>
> > > >> It is very rare that we see these errors; millions of files go through
> > > >> without any issues, but sometimes it happens, which is not good.
> > > >>
> > > >> Can anyone please help? Maybe try setting up the same test and see if you
> > > >> also get a corrupted file after some days.
> > > >>
> > > >> Kind regards
> > > >> Jens M. Kofoed
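Returning to the ExecuteScript step mentioned in Jens' high-side flow (converting the JSON written by AttributesToJSON back into flowfile attributes), a minimal Jython sketch of that kind of script, assuming the flowfile content is a flat JSON object, might look like this:

# ExecuteScript (Jython) sketch: read the flowfile's JSON content and copy the
# top-level keys back onto the flowfile as attributes.
import json
from java.util import HashMap
from java.nio.charset import StandardCharsets
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import InputStreamCallback

class ReadJson(InputStreamCallback):
    def __init__(self):
        self.attributes = {}
    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # Assumes a flat JSON object, as produced by AttributesToJSON
        self.attributes = json.loads(text)

flowFile = session.get()
if flowFile is not None:
    callback = ReadJson()
    session.read(flowFile, callback)
    attrMap = HashMap()
    for key, value in callback.attributes.items():
        attrMap.put(str(key), str(value))
    flowFile = session.putAllAttributes(flowFile, attrMap)
    session.transfer(flowFile, REL_SUCCESS)

Depending on which attributes are needed, EvaluateJsonPath with a flowfile-attribute destination can cover the same step without a script.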