Hey Dan,

Yessir, it absolutely would! Probably would be good to clean those up.
Thanks
-Mark

> On May 23, 2024, at 12:29 PM, Dan S <dsti...@gmail.com> wrote:
>
> Mark,
> In regards to your last comments
>
>> Not really related, but on the note of things you may not realize, with
>> that code snippet :)
>> If you have access to update the processor mentioned here, you should
>> avoid using session.putAttribute many times.
>> Under the hood, in order to maintain object immutability it has to create
>> a new FlowFile object (and a new HashMap of all attributes!)
>> for every call to putAttribute. So if there are 100 attributes that match
>> that’s potentially a huge amount of garbage getting created.
>
>
> I noticed that some of the split processors SplitJson, SplitXml, and
> SplitAvro all have loops to create a new flow file for each split and it
> calls putAttribute more than once (in order to populate the split
> attributes FRAGMENT_ID, FRAGMENT_INDEX etc) for each flow file created.
> Would this all suffer from "a huge amount of garbage getting created"
> since putAttribute is called multiple times for each iteration of the loop?
>
> On Wed, May 8, 2024 at 5:27 PM Michael Moser <moser...@gmail.com> wrote:
>
>> Oh yeah, I do love this behavior of ProcessSession. And thanks for the
>> tip, it's easy to forget that there are efficiencies to be gained by using
>> different parts of an API.
>>
>> -- Mike
>>
>>
>> On Wed, May 8, 2024 at 11:04 AM Mark Payne <marka...@hotmail.com> wrote:
>>
>>> Yeah, that was something that kinda flew under the radar. Definitely
>>> improved the API.
>>>
>>> Not really related, but on the note of things you may not realize, with
>>> that code snippet :)
>>> If you have access to update the processor mentioned here, you should
>>> avoid using session.putAttribute many times.
>>> Under the hood, in order to maintain object immutability it has to create
>>> a new FlowFile object (and a new HashMap of all attributes!)
>>> for every call to putAttribute.
>>> So if there are 100 attributes that match
>>> that’s potentially a huge amount of garbage getting created. Instead,
>>> you could just use:
>>>
>>> ```
>>> final Map<String, String> attributes = new HashMap<>();
>>> flowFile.getAttributes().forEach( (key, value) -> {
>>>     if (key.startsWith("foo")) {
>>>         attributes.put("original-" + key, value);
>>>     }
>>> });
>>>
>>> flowFile = session.putAllAttributes(flowFile, attributes);
>>> ```
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>>
>>>> On May 8, 2024, at 10:30 AM, Michael Moser <moser...@gmail.com> wrote:
>>>>
>>>> Wow, thanks for this information! Just last week I saw code that modified
>>>> attributes in a stream:
>>>>
>>>> flowFile.getAttributes().entrySet().stream().filter(e ->
>>>> e.getKey().startsWith("foo"))
>>>> .forEach(e -> session.putAttribute(flowFile, "original-" + e.getKey(),
>>>> e.getValue()));
>>>>
>>>> and I wondered how that could possibly work since the return value of
>>>> session.putAttribute is ignored! Now I know.
>>>>
>>>> -- Mike
>>>>
>>>> On Tue, May 7, 2024 at 3:02 PM Russell Bateman <r...@windofkeltia.com>
>>>> wrote:
>>>>
>>>>> Yes, what you described is what was happening, Mark. I didn't display
>>>>> all of the code to the session methods, and I did re-read the in-coming
>>>>> flowfile for different purposes than I had already read and written it.
>>>>> So, I wasn't helpful enough. In the end, however, I had forgotten,
>>>>> immediately after the call to session.putAllAttributes(), to update the
>>>>> resulting flowfile for passing to session.transfer(). That solved it for
>>>>> 1.1.2 which wasn't necessary for 1.13.2 or later versions. Being
>>>>> helpful, the newer versions made me a spoiled, entitled child and I will
>>>>> repent immediately.
>>>>>
>>>>> Thanks, guys! DevOps are happy they don't have to upgrade the customers
>>>>> to NiFi 1.13.2. (In a way, I'm unhappy about that, but...).
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Russ
>>>>>
>>>>> On 5/7/24 11:53, Mark Payne wrote:
>>>>>> The call to session.putAttribute would throw an Exception because you
>>>>>> provided an outdated version of the flowFile (did not capture the
>>>>>> result of calling session.write)
>>>>>>
>>>>>> Now, as NiFi matured, we found that:
>>>>>> (a) for more complex processors that aren’t just a series of sequential
>>>>>> steps it becomes difficult to manage all of that bookkeeping.
>>>>>> (b) it was not intuitive to require this
>>>>>> (c) the ProcessSession already had more or less what it needed in order
>>>>>> to determine what the most up-to-date version of the FlowFile was.
>>>>>>
>>>>>> So we updated the ProcessSession to automatically grab the latest
>>>>>> version of the FlowFile for these methods. But since you’re trying to
>>>>>> run an old version, you’ll need to make sure that you capture all of
>>>>>> those outputs and always keep track of the most recent version of a
>>>>>> FlowFile.
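
A note on the putAttribute point discussed in this thread: the toy model below is a rough sketch of why calling putAttribute in a loop copies the attribute map once per call, while a single putAllAttributes call copies it once in total. ToyFlowFile and AttributeCopyDemo are hypothetical stand-ins written for this illustration, not NiFi classes, and the copy counter only models the behavior Mark describes; it does not measure NiFi itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an immutable FlowFile; this is NOT NiFi's class.
final class ToyFlowFile {
    // Counts how many times the attribute map has been copied.
    static int mapCopies = 0;

    private final Map<String, String> attributes;

    private ToyFlowFile(Map<String, String> attributes) {
        this.attributes = Collections.unmodifiableMap(attributes);
    }

    static ToyFlowFile of(Map<String, String> initial) {
        return new ToyFlowFile(new HashMap<>(initial));
    }

    Map<String, String> getAttributes() {
        return attributes;
    }

    // Adding one attribute still copies the whole map, because the
    // existing object must not change.
    ToyFlowFile putAttribute(String key, String value) {
        return putAllAttributes(Collections.singletonMap(key, value));
    }

    // Adding many attributes at once costs a single copy.
    ToyFlowFile putAllAttributes(Map<String, String> updates) {
        Map<String, String> copy = new HashMap<>(attributes);
        mapCopies++;
        copy.putAll(updates);
        return new ToyFlowFile(copy);
    }
}

final class AttributeCopyDemo {
    // Builds a toy FlowFile with `count` attributes named foo0, foo1, ...
    static ToyFlowFile sample(int count) {
        Map<String, String> attrs = new HashMap<>();
        for (int i = 0; i < count; i++) {
            attrs.put("foo" + i, "value" + i);
        }
        return ToyFlowFile.of(attrs);
    }

    // One putAttribute call per matching attribute.
    static int copiesOneByOne(ToyFlowFile original) {
        ToyFlowFile.mapCopies = 0;
        ToyFlowFile current = original;
        for (Map.Entry<String, String> e : original.getAttributes().entrySet()) {
            if (e.getKey().startsWith("foo")) {
                current = current.putAttribute("original-" + e.getKey(), e.getValue());
            }
        }
        return ToyFlowFile.mapCopies;
    }

    // Collect the new attributes first, then apply them in one call.
    static int copiesBatched(ToyFlowFile original) {
        ToyFlowFile.mapCopies = 0;
        Map<String, String> updates = new HashMap<>();
        original.getAttributes().forEach((key, value) -> {
            if (key.startsWith("foo")) {
                updates.put("original-" + key, value);
            }
        });
        original.putAllAttributes(updates);
        return ToyFlowFile.mapCopies;
    }
}
```

In this model, a toy FlowFile built with AttributeCopyDemo.sample(100) gives 100 map copies from copiesOneByOne and 1 from copiesBatched. The same reasoning would apply to a loop that sets several fragment attributes per split: gathering them into one map and calling putAllAttributes once per FlowFile should cut the copies per FlowFile to one.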