I will make a ticket for this. Thanks!

On Thu, May 23, 2024 at 12:45 PM Mark Payne <marka...@hotmail.com> wrote:
> Hey Dan,
>
> Yessir, it absolutely would! Probably would be good to clean those up.
>
> Thanks
> -Mark
>
>
> > On May 23, 2024, at 12:29 PM, Dan S <dsti...@gmail.com> wrote:
> >
> > Mark,
> > In regards to your last comments
> >
> >> Not really related, but on the note of things you may not realize, with
> >> that code snippet :)
> >> If you have access to update the processor mentioned here, you should
> >> avoid using session.putAttribute many times.
> >> Under the hood, in order to maintain object immutability it has to create
> >> a new FlowFile object (and a new HashMap of all attributes!)
> >> for every call to putAttribute. So if there are 100 attributes that match
> >> that’s potentially a huge amount of garbage getting created.
> >
> >
> > I noticed that some of the split processors SplitJson, SplitXml, and
> > SplitAvro all have loops to create a new flow file for each split and it
> > calls putAttribute more than once (in order to populate the split
> > attributes FRAGMENT_ID, FRAGMENT_INDEX etc) for each flow file created.
> > Would this all suffer from "a huge amount of garbage getting created"
> > since putAttribute is called multiple times for each iteration of the loop?
> >
> > On Wed, May 8, 2024 at 5:27 PM Michael Moser <moser...@gmail.com> wrote:
> >
> >> Oh yeah, I do love this behavior of ProcessSession. And thanks for the
> >> tip, it's easy to forget that there are efficiencies to be gained by using
> >> different parts of an API.
> >>
> >> -- Mike
> >>
> >>
> >> On Wed, May 8, 2024 at 11:04 AM Mark Payne <marka...@hotmail.com> wrote:
> >>
> >>> Yeah, that was something that kinda flew under the radar. Definitely
> >>> improved the API.
> >>>
> >>> Not really related, but on the note of things you may not realize, with
> >>> that code snippet :)
> >>> If you have access to update the processor mentioned here, you should
> >>> avoid using session.putAttribute many times.
> >>> Under the hood, in order to maintain object immutability it has to create
> >>> a new FlowFile object (and a new HashMap of all attributes!)
> >>> for every call to putAttribute. So if there are 100 attributes that match
> >>> that’s potentially a huge amount of garbage getting created. Instead,
> >>> you could just use:
> >>>
> >>> ```
> >>> final Map<String, String> attributes = new HashMap<>();
> >>> flowFile.getAttributes().forEach( (key, value) -> {
> >>>     if (key.startsWith("foo")) {
> >>>         attributes.put("original-" + key, value);
> >>>     }
> >>> });
> >>>
> >>> flowFile = session.putAllAttributes(flowFile, attributes);
> >>> ```
> >>>
> >>> Thanks
> >>> -Mark
> >>>
> >>>
> >>>
> >>>> On May 8, 2024, at 10:30 AM, Michael Moser <moser...@gmail.com> wrote:
> >>>>
> >>>> Wow, thanks for this information! Just last week I saw code that modified
> >>>> attributes in a stream:
> >>>>
> >>>> flowFile.getAttributes().entrySet().stream().filter(e ->
> >>>> e.getKey().startsWith("foo"))
> >>>> .forEach(e -> session.putAttribute(flowFile, "original-" + e.getKey(),
> >>>> e.getValue()));
> >>>>
> >>>> and I wondered how that could possibly work since the return value of
> >>>> session.putAttribute is ignored! Now I know.
> >>>>
> >>>> -- Mike
> >>>>
> >>>> On Tue, May 7, 2024 at 3:02 PM Russell Bateman <r...@windofkeltia.com>
> >>>> wrote:
> >>>>
> >>>>> Yes, what you described is what was happening, Mark. I didn't display
> >>>>> all of the code to the session methods, and I did re-read the in-coming
> >>>>> flowfile for different purposes than I had already read and written it.
> >>>>> So, I wasn't helpful enough. In the end, however, I had forgotten,
> >>>>> immediately after the call to session.putAllAttributes(), to update the
> >>>>> resulting flowfile for passing to session.transfer(). That solved it for
> >>>>> 1.1.2 which wasn't necessary for 1.13.2 or later versions.
> >>>>> Being helpful, the newer versions made me a spoiled, entitled child and
> >>>>> I will repent immediately.
> >>>>>
> >>>>> Thanks, guys! DevOps are happy they don't have to upgrade the customers
> >>>>> to NiFi 1.13.2. (In a way, I'm unhappy about that, but...).
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Russ
> >>>>>
> >>>>> On 5/7/24 11:53, Mark Payne wrote:
> >>>>>> The call to session.putAttribute would throw an Exception because you
> >>>>>> provided an outdated version of the flowFile (did not capture the
> >>>>>> result of calling session.write)
> >>>>>>
> >>>>>> Now, as NiFi matured, we found that:
> >>>>>> (a) for more complex processors that aren’t just a series of sequential
> >>>>>> steps it becomes difficult to manage all of that bookkeeping.
> >>>>>> (b) it was not intuitive to require this
> >>>>>> (c) the ProcessSession already had more or less what it needed in order
> >>>>>> to determine what the most up-to-date version of the FlowFile was.
> >>>>>>
> >>>>>> So we updated the ProcessSession to automatically grab the latest
> >>>>>> version of the FlowFile for these methods. But since you’re trying to
> >>>>>> run an old version, you’ll need to make sure that you capture all of
> >>>>>> those outputs and always keep track of the most recent version of a
> >>>>>> FlowFile.
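
For anyone who wants to see the copy cost discussed in this thread without a
NiFi install, here is a minimal sketch. It assumes nothing about NiFi
internals beyond what Mark describes above: `ImmutableFile`, `copyCount`,
`prefixOneByOne` and `prefixBatched` are hypothetical names made up for this
illustration, not NiFi classes or methods. The stand-in copies its whole
attribute map on every put, so the two approaches can be compared by counting
map copies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class AttributeCopyDemo {

    // Hypothetical stand-in for an immutable FlowFile; this is NOT the NiFi class.
    static final class ImmutableFile {
        // Number of attribute maps allocated by the put methods since the last reset.
        static int copyCount = 0;

        private final Map<String, String> attributes;

        private ImmutableFile(Map<String, String> owned) {
            this.attributes = Collections.unmodifiableMap(owned);
        }

        static ImmutableFile of(Map<String, String> initial) {
            return new ImmutableFile(new HashMap<>(initial));
        }

        Map<String, String> getAttributes() {
            return attributes;
        }

        // Copies every existing attribute in order to add a single one.
        ImmutableFile putAttribute(String key, String value) {
            Map<String, String> copy = new HashMap<>(attributes);
            copy.put(key, value);
            copyCount++;
            return new ImmutableFile(copy);
        }

        // Copies the existing attributes once, however many are added.
        ImmutableFile putAllAttributes(Map<String, String> extra) {
            Map<String, String> copy = new HashMap<>(attributes);
            copy.putAll(extra);
            copyCount++;
            return new ImmutableFile(copy);
        }
    }

    // One put call per matching attribute; the returned object is kept each time.
    static ImmutableFile prefixOneByOne(ImmutableFile file) {
        ImmutableFile current = file;
        for (Map.Entry<String, String> e : file.getAttributes().entrySet()) {
            if (e.getKey().startsWith("foo")) {
                current = current.putAttribute("original-" + e.getKey(), e.getValue());
            }
        }
        return current;
    }

    // Collect the new attributes first, then apply them in a single call.
    static ImmutableFile prefixBatched(ImmutableFile file) {
        Map<String, String> extra = new HashMap<>();
        file.getAttributes().forEach((key, value) -> {
            if (key.startsWith("foo")) {
                extra.put("original-" + key, value);
            }
        });
        return file.putAllAttributes(extra);
    }

    public static void main(String[] args) {
        ImmutableFile start = ImmutableFile.of(Map.of("foo.a", "1", "foo.b", "2", "bar", "3"));

        ImmutableFile.copyCount = 0;
        ImmutableFile a = prefixOneByOne(start);
        System.out.println("one by one: " + ImmutableFile.copyCount + " map copies");

        ImmutableFile.copyCount = 0;
        ImmutableFile b = prefixBatched(start);
        System.out.println("batched: " + ImmutableFile.copyCount + " map copies");

        System.out.println("same result: " + a.getAttributes().equals(b.getAttributes()));
    }
}
```

In this sketch the one-by-one version allocates one map per matching
attribute while the batched version allocates one in total, and both end up
with the same attributes. Whether the difference matters in a real processor
depends on how many attributes match and how often the code runs. The sketch
also keeps the object returned by each put, which is the bookkeeping Mark
says older NiFi versions require.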