Jens

If you use MergeContent [1] you can create streams of flowfile bundles
(attributes/content serialized together) in groups of 1 or more. Then
on the other end you can use UnpackContent [2] to get the original
content and attributes back.
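
Purely to illustrate the single-artifact idea (this is NOT the actual
packaging format MergeContent writes; just a toy Python sketch, and the
file names inside the bundle are made up), bundling content plus its
attributes into one file and verifying it on the far side could look like
this. The attribute name content_SHA-256 is the one CryptographicHashContent
writes in your flow:

import hashlib
import io
import json
import tarfile

def pack_bundle(bundle_path, content, attributes):
    # Write the content plus its attributes (as JSON) into one tar file.
    with tarfile.open(bundle_path, "w") as tar:
        for name, data in (("content", content),
                           ("attributes.json", json.dumps(attributes).encode())):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

def unpack_and_verify(bundle_path):
    # Read the bundle back and check the content against the bundled hash.
    with tarfile.open(bundle_path, "r") as tar:
        content = tar.extractfile("content").read()
        attributes = json.loads(tar.extractfile("attributes.json").read())
    if hashlib.sha256(content).hexdigest() != attributes.get("content_SHA-256"):
        raise ValueError("content does not match the bundled content_SHA-256")
    return attributes, content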

Thanks
Joe

[1] 
http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.MergeContent/index.html
[2] 
http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.UnpackContent/index.html

On Tue, Oct 12, 2021 at 11:07 PM Jens M. Kofoed <jmkofoed....@gmail.com> wrote:
>
> Dear Joe
>
> Regarding your point 5: this is almost what I'm doing as well. But last
> night, on my phone, I "just wrote" that we created a hash file. What I'm
> actually doing is converting the flowfile to JSON.
> Is there a way where NiFi can export the complete flowfile (attributes and
> content) into 1 file, which we can import again on the other side? Right
> now I do it in 2 steps.
> Below is a short description of my flow for transferring data between
> systems where we can't use S2S.
> At the low side:
> get data ->
>   CryptographicHashContent ->
>     UpdateAttribute: original.filename = ${filename}, rootHash = ${content_SHA-256} ->
>       UpdateAttribute: filename = ${UUID()} ->
>         PutSFTP ->
>           AttributesToJSON: Destination = flowfile-content ->
>             UpdateAttribute: filename = ${filename:append('.flowfile')} ->
>               PutSFTP
>
> At the high side:
> ListSFTP: File Filter Regex = .*\.flowfile ->
>   FetchSFTP ->
>     ExecuteScript: (converting the JSON data into attributes; see the sketch after this flow) ->
>       UpdateAttribute: filename = ${filename:substringBefore('.flowfile')} ->
>         FetchSFTP ->
>           CryptographicHashContent ->
>             RouteOnAttribute: Hash_OK = ${rootHash:equals(${content_SHA-256})} ->
>               Hash_OK -> following production flow
>               Unmatched -> Error flow
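>
> For reference, a minimal Jython sketch of that ExecuteScript step could
> look like the following (not my exact script; it assumes the JSON body is
> the flat attribute map that AttributesToJSON writes):
>
> import json
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import InputStreamCallback
>
> class ReadJson(InputStreamCallback):
>     def __init__(self):
>         self.attrs = {}
>     def process(self, inputStream):
>         # Read the whole flowfile content (the attributes JSON) as text
>         text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>         self.attrs = json.loads(text)
>
> flowFile = session.get()
> if flowFile is not None:
>     reader = ReadJson()
>     session.read(flowFile, reader)
>     # Attribute values must be strings
>     for key, value in reader.attrs.items():
>         flowFile = session.putAttribute(flowFile, key, str(value))
>     session.transfer(flowFile, REL_SUCCESS)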
>
> Kind regards
> Jens
>
> On Tue, 12 Oct 2021 at 21.36, Joe Witt <joe.w...@gmail.com> wrote:
>
> > Jens
> >
> > For such a setup the very specific details matter and here there are a
> > lot of details.  It isn't easy to sort through this for me so I'll
> > keep it high level based on my experience in very similar
> > situations/setups:
> >
> > 1. I'd generally trust SFTP to be awesome and damn near failure proof
> > in itself.  I'd focus on other things.
> > 2. I'd generally trust that network transfer itself is bulletproof as
> > far as data packet corruption goes and not think that is the problem,
> > especially since SFTP and the various protocols employed here
> > (including NiFi's) offer certain guarantees themselves.
> > 3. I'd be suspect of one-way transfer/guard devices creating issues.
> > I'd remove that and try to reproduce the problem.
> > 4. In Linux, as I understand it, a cp/mv is not atomic if the data
> > spans file systems, so you could potentially have partially written
> > data scenarios here (see the sketch after this list).
> > 5. I'd be careful to avoid multiple-file scenarios such as original
> > content plus the sha256.  Instead, if the low side is a NiFi and the
> > high side is a NiFi, I'd have the low-side NiFi write out flowfiles
> > and pass those over the guard device.  Why?  Because this gives you
> > your original content AND the flowfile attributes (where I'd put the
> > sha256).  On the high-side NiFi I'd unpack that flowfile and ensure
> > the content matches the stated sha256.
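> >
> > To make point 4 concrete, here is a rough sketch of the
> > copy-to-temp-then-rename pattern I have in mind, in Python (just an
> > illustration; paths and names are placeholders):
> >
> > import os
> > import shutil
> >
> > def safe_copy(src, dst):
> >     # Copy to a temporary name on the destination file system first, so
> >     # the final name only ever appears once the bytes are fully written.
> >     tmp = dst + ".part"
> >     with open(src, "rb") as fin, open(tmp, "wb") as fout:
> >         shutil.copyfileobj(fin, fout)
> >         fout.flush()
> >         os.fsync(fout.fileno())   # force the data to disk before renaming
> >     os.rename(tmp, dst)           # atomic only within a single file system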
> >
> > Joe
> >
> > On Tue, Oct 12, 2021 at 12:25 PM Jens M. Kofoed <jmkofoed....@gmail.com>
> > wrote:
> > >
> > > Hi Joe
> > >
> > > I know what you are thinking, but that's not the case.
> > > Check my very short description of my test flow.
> > > In my loop the PutSFTP processor is using default settings, which means
> > > it uploads files as .filename and renames them when done. The next
> > > processor is FetchSFTP, which will load the file as filename. If PutSFTP
> > > has not finished uploading the file, it will still have the wrong
> > > (dot-prefixed) filename, so the flowfile will not go from PutSFTP ->
> > > FetchSFTP and therefore FetchSFTP can't fetch the file. So in my test
> > > flow that is not the case.
> > >
> > > In our production flow, after NiFi gets its data it calculates the
> > > sha256, uploads the data to an SFTP server as .filename and renames it
> > > when done (default settings for PutSFTP). Next it creates a new file
> > > with the value of the hash and saves it as filename.sha256.
> > > At that SFTP server a bash script looks for non-hidden files every
> > > 2 seconds with an ls command. If there are files, the bash script does a
> > > cp filename /archive/filename and sends the data to server 3 via a data
> > > diode. At the other side another NiFi server reads the filename.sha256,
> > > reads in the hash value, reads in the original data, calculates a new
> > > sha256 and compares the two hashes (that comparison step is sketched
> > > below).
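> > >
> > > The comparison at the receiving end amounts to something like this (a
> > > Python sketch rather than our actual implementation; the file names are
> > > placeholders):
> > >
> > > import hashlib
> > >
> > > def read_sidecar(path):
> > >     # filename.sha256 holds the hex digest written by the low-side NiFi
> > >     with open(path) as f:
> > >         return f.read().split()[0]
> > >
> > > def sha256_of(path):
> > >     h = hashlib.sha256()
> > >     with open(path, "rb") as f:
> > >         for chunk in iter(lambda: f.read(1024 * 1024), b""):
> > >             h.update(chunk)
> > >     return h.hexdigest()
> > >
> > > data_file = "filename"   # placeholder; the .sha256 sidecar sits next to it
> > > hash_ok = sha256_of(data_file) == read_sidecar(data_file + ".sha256")
> > >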
> > > Yesterday there was a corruption again, and we checked the file at the
> > > first SFTP server where the first NiFi saved it after creating the first
> > > hash. Running sha256sum on /archive/filename produced a different hash
> > > than NiFi's. So after the PutSFTP and a Linux cp command the file was
> > > corrupted.
> > > It has been fewer than 1 file per 1,000,000 files where we have seen
> > > these issues. But we do see them.
> > > Now we are trying to investigate what causes the issue. Therefore I
> > > created the small test flow, and already after nearly 9000 iterations in
> > > the loop the file has been corrupted just by being uploaded and
> > > downloaded again.
> > >
> > > Are we facing a network issue where a data packet is corrupted?
> > > Is there a very rare case where the SFTP implementation is doing
> > > something wrong?
> > > We don't know yet, but we are running some more tests on different
> > > systems to narrow it down.
> > >
> > > Kind regards
> > > Jens M. Kofoed
> > >
> > > > On 12 Oct 2021 at 19.39, Joe Witt <joe.w...@gmail.com> wrote:
> > > >
> > > > Hello
> > > >
> > > > How does NiFi grab the data from the file system?  It sounds like it
> > > > is doing partial reads due to a competing consumer (data still being
> > > > written) scenario.
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Oct 11, 2021 at 10:36 PM Jens M. Kofoed
> > > > <jmkofoed....@gmail.com> wrote:
> > > >
> > > >> Dear Developers
> > > >>
> > > >> We have a situation where we see corrupted files after using PutSFTP
> > > >> and FetchSFTP in NiFi 1.13.2 with openjdk version "1.8.0_292", OpenJDK
> > > >> Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10),
> > > >> OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode), running on
> > > >> Ubuntu Server 20.04.
> > > >>
> > > >> We have a flow between 2 separated systems where we use PutSFTP to
> > > >> export data from one NiFi instance to a data diode and use FetchSFTP
> > > >> to grab the data on the other end. To be sure data is not corrupted we
> > > >> calculate a SHA256 on each side, and transfer the flowfile metadata in
> > > >> a separate file. In rare cases we have seen that the SHA256 doesn't
> > > >> match on both sides, and we are investigating where the errors happen.
> > > >> We have seen 2 errors. Manually calculating a SHA256 on both sides of
> > > >> the diode the file is OK, and we have found that the errors happen
> > > >> between NiFi and the SFTP servers. And it can happen on both sides.
> > > >> So for testing I created this little flow:
> > > >> GenerateFlowFile (size 100MB) (Run once) ->
> > > >> CryptographicHashContent (SHA256) ->
> > > >> UpdateAttribute ( hash.root = ${content_SHA-256} , iteration = 1 ) ->
> > > >> PutSFTP ->
> > > >> FetchSFTP ->
> > > >> CryptographicHashContent (SHA256) ->
> > > >> RouteOnAttribute (compare hash.root vs. content_SHA-256)
> > > >>    If unmatched ->
> > > >>        routed to a disabled processor, so the corrupted files stay in
> > > >>        its input queue
> > > >>    If matched ->
> > > >>        UpdateAttribute ( iteration = ${iteration:plus(1)} ) ->
> > > >>        looping back to PutSFTP
> > > >>
> > > >> After 8992 iterations the file is corrupted. To test whether the
> > > >> errors are in the calculation of the SHA256 I have a copy of the flow
> > > >> without the PutSFTP/FetchSFTP processors, which hasn't had any errors
> > > >> yet.
> > > >>
> > > >> It is very rare that we see these errors; millions of files are going
> > > >> through without any issues, but sometimes it happens, which is not
> > > >> good.
> > > >>
> > > >> Can anyone please help? Maybe try setting up the same test and see if
> > > >> you also have a corrupted file after some days (a standalone sketch of
> > > >> such a test, outside NiFi, follows below).
> > > >>
> > > >> Kind regards
> > > >> Jens M. Kofoed
> > > >>
> >
