Hi Peter,

it took me some time to understand your suggestion. Great, thank you!

Have a great day and take care.

Best,
Lars

On Wed, 2019-07-31 at 17:53 +0000, Peter Wicks (pwicks) wrote:
> Lars,
>
> If you are worried about it, using ReplaceText will have the same effect as your custom solution. When ReplaceText has its `Replacement Strategy` set to `Always Replace`, it doesn't read the contents of the FlowFile and simply writes out the `Replacement Value`, which in your case could be an empty string.
>
> Thanks,
> Peter
>
> From: Lars Winderling <lars.winderl...@posteo.de>
> Sent: Wednesday, July 31, 2019 11:02 AM
> To: dev@nifi.apache.org
> Subject: [EXT] Re: Duplicate flow files *without* their content
>
> Hi Edward,
>
> thank you for your input. I didn't know about the copy-on-write semantics; that's really useful. I'll check out the in-depth guide for sure!
> In my case, the content of the flow file changes heavily from one processor to the next, so I doubt copy-on-write would help here.
>
> Best,
> Lars
>
> On Wed, 2019-07-31 at 12:13 +0100, Edward Armes wrote:
> > Hi Lars,
> >
> > In short, depending on how a FlowFile is duplicated, the content shouldn't be duplicated as well.
> >
> > In general, content is only duplicated when it has been deemed to have changed (copy-on-write semantics). For the most part (unless a FlowFile has a large number of attributes) a FlowFile itself is quite small, so the waste is minimal, which is why FlowFiles can be held in memory and passed through a flow.
> >
> > The best way to branch/clone a FlowFile is to add another outgoing connection from the processor whose output you want to log; the framework that surrounds a processor will handle the rest. This does create a duplicate FlowFile but doesn't create a copy of the content. In the provenance repository this is marked as a CLONE event for the original FlowFile, and the new FlowFile is treated as its own unique FlowFile with a reference to the original content.
> >
> > This is quite a short explanation; a better and more in-depth one can be found here, and I think it covers all the scenarios you're thinking about: https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
> >
> > Edward
> >
> > On Wed, Jul 31, 2019 at 11:47 AM Lars Winderling <lars.winderl...@posteo.de> wrote:
> > > Dear NiFi community,
> > >
> > > I often face the use case where I import flow files with content on the order of O(1 GB) or O(10 GB), already compressed.
> > > Let's say I need to branch off a flow where the actual flow file should be processed further, and on a side branch I just want to do some kind of logging (or similar) without accessing the flow file's contents. Thus it's clearly wasteful to duplicate the flow file including its content.
> > > For this case I wrote a processor defining two relationships, "original" and "attributes only", so the flow file attributes can be accessed separately from the content.
> > > I will gladly prepare a PR if anyone finds that worth incorporating into NiFi.
> > > The only remaining question for me would be: use an individual processor to that end, or add it to e.g. the DuplicateFlowFile processor? The former seems cleaner to me. A proposed name would be something like ForkProcessor (no better idea yet).
> > >
> > > Thanks in advance!
> > >
> > > Best,
> > > Lars
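To make the copy-on-write point from this thread concrete, here is a hypothetical, heavily simplified Java model. It is not NiFi code and not the NiFi API. A FlowFile is modelled as an attribute map plus a reference to content that is never modified once written. Cloning copies the attributes but shares the content reference. An "attributes only" fork, as Lars proposes, gets empty content. Writing new content produces a new content object instead of changing the shared one. The names `FlowFileCowDemo`, `cloneShared`, `attributesOnly` and `withContent` are invented for illustration. In the real API, `ProcessSession#clone(FlowFile)` and `ProcessSession#create(FlowFile parent)` appear to be the closest counterparts, but check the NiFi Developer Guide before relying on that.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the behaviour discussed in this thread.
// Names are illustrative only; this is NOT the NiFi API.
public class FlowFileCowDemo {

    // Stand-in for a content claim: treated as read-only once written.
    static final class Content {
        final byte[] bytes;
        Content(byte[] bytes) { this.bytes = bytes; }
    }

    static final class FlowFile {
        final Map<String, String> attributes; // small, cheap to copy
        final Content content;                // potentially huge, shared by reference

        FlowFile(Map<String, String> attributes, Content content) {
            this.attributes = new HashMap<>(attributes); // defensive copy of attributes only
            this.content = content;
        }

        // Clone: new attribute map, same content reference (no byte copy).
        FlowFile cloneShared() {
            return new FlowFile(attributes, content);
        }

        // "Attributes only" fork: same attributes, empty content.
        FlowFile attributesOnly() {
            return new FlowFile(attributes, new Content(new byte[0]));
        }

        // Copy-on-write: changing content yields a new FlowFile with new
        // content; the previously shared content is left untouched.
        FlowFile withContent(byte[] newBytes) {
            return new FlowFile(attributes, new Content(newBytes));
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("filename", "big-archive.gz");
        FlowFile original = new FlowFile(attrs, new Content(new byte[1024]));

        FlowFile clone = original.cloneShared();
        FlowFile fork = original.attributesOnly();
        FlowFile modified = clone.withContent(new byte[16]);

        System.out.println(clone.content == original.content);    // prints true
        System.out.println(fork.content.bytes.length);            // prints 0
        System.out.println(modified.content == original.content); // prints false
        System.out.println(original.content.bytes.length);        // prints 1024
    }
}
```

The design point is that only the small attribute map is ever duplicated; the potentially multi-GB content is either shared (clone) or replaced by an empty payload (attributes-only fork), and a real copy only appears once a branch writes new content.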