NIFI-13288 <https://issues.apache.org/jira/browse/NIFI-13288>
On Thu, May 23, 2024 at 12:46 PM Dan S <dsti...@gmail.com> wrote:

> I will make a ticket for this. Thanks!
>
> On Thu, May 23, 2024 at 12:45 PM Mark Payne <marka...@hotmail.com> wrote:
>
>> Hey Dan,
>>
>> Yessir, it absolutely would! Probably would be good to clean those up.
>>
>> Thanks
>> -Mark
>>
>>
>> > On May 23, 2024, at 12:29 PM, Dan S <dsti...@gmail.com> wrote:
>> >
>> > Mark,
>> > In regards to your last comments
>> >
>> >> Not really related, but on the note of things you may not realize,
>> >> with that code snippet :)
>> >> If you have access to update the processor mentioned here, you should
>> >> avoid using session.putAttribute many times.
>> >> Under the hood, in order to maintain object immutability it has to
>> >> create a new FlowFile object (and a new HashMap of all attributes!)
>> >> for every call to putAttribute. So if there are 100 attributes that
>> >> match that’s potentially a huge amount of garbage getting created.
>> >
>> >
>> > I noticed that the split processors SplitJson, SplitXml, and SplitAvro
>> > all have loops that create a new flow file for each split and call
>> > putAttribute more than once (in order to populate the split attributes
>> > FRAGMENT_ID, FRAGMENT_INDEX, etc.) for each flow file created.
>> > Would these all suffer from "a huge amount of garbage getting created"
>> > since putAttribute is called multiple times for each iteration of the
>> > loop?
>> >
>> > On Wed, May 8, 2024 at 5:27 PM Michael Moser <moser...@gmail.com> wrote:
>> >
>> >> Oh yeah, I do love this behavior of ProcessSession. And thanks for the
>> >> tip, it's easy to forget that there are efficiencies to be gained by
>> >> using different parts of an API.
>> >>
>> >> -- Mike
>> >>
>> >>
>> >> On Wed, May 8, 2024 at 11:04 AM Mark Payne <marka...@hotmail.com> wrote:
>> >>
>> >>> Yeah, that was something that kinda flew under the radar. Definitely
>> >>> improved the API.
>> >>>
>> >>> Not really related, but on the note of things you may not realize,
>> >>> with that code snippet :)
>> >>> If you have access to update the processor mentioned here, you should
>> >>> avoid using session.putAttribute many times.
>> >>> Under the hood, in order to maintain object immutability it has to
>> >>> create a new FlowFile object (and a new HashMap of all attributes!)
>> >>> for every call to putAttribute. So if there are 100 attributes that
>> >>> match that’s potentially a huge amount of garbage getting created.
>> >>> Instead, you could just use:
>> >>>
>> >>> ```
>> >>> final Map<String, String> attributes = new HashMap<>();
>> >>> flowFile.getAttributes().forEach( (key, value) -> {
>> >>>     if (key.startsWith("foo")) {
>> >>>         attributes.put("original-" + key, value);
>> >>>     }
>> >>> });
>> >>>
>> >>> flowFile = session.putAllAttributes(flowFile, attributes);
>> >>> ```
>> >>>
>> >>> Thanks
>> >>> -Mark
>> >>>
>> >>>
>> >>>
>> >>>> On May 8, 2024, at 10:30 AM, Michael Moser <moser...@gmail.com> wrote:
>> >>>>
>> >>>> Wow, thanks for this information! Just last week I saw code that
>> >>>> modified attributes in a stream:
>> >>>>
>> >>>> flowFile.getAttributes().entrySet().stream()
>> >>>>     .filter(e -> e.getKey().startsWith("foo"))
>> >>>>     .forEach(e -> session.putAttribute(flowFile, "original-" + e.getKey(), e.getValue()));
>> >>>>
>> >>>> and I wondered how that could possibly work since the return value of
>> >>>> session.putAttribute is ignored! Now I know.
>> >>>>
>> >>>> -- Mike
>> >>>>
>> >>>> On Tue, May 7, 2024 at 3:02 PM Russell Bateman <r...@windofkeltia.com> wrote:
>> >>>>
>> >>>>> Yes, what you described is what was happening, Mark. I didn't display
>> >>>>> all of the code to the session methods, and I did re-read the
>> >>>>> in-coming flowfile for different purposes than I had already read and
>> >>>>> written it. So, I wasn't helpful enough.
>> >>>>> In the end, however, I had forgotten, immediately after the call to
>> >>>>> session.putAllAttributes(), to update the resulting flowfile for
>> >>>>> passing to session.transfer(). That solved it for 1.1.2 which wasn't
>> >>>>> necessary for 1.13.2 or later versions. Being helpful, the newer
>> >>>>> versions made me a spoiled, entitled child and I will repent
>> >>>>> immediately.
>> >>>>>
>> >>>>> Thanks, guys! DevOps are happy they don't have to upgrade the
>> >>>>> customers to NiFi 1.13.2. (In a way, I'm unhappy about that, but...).
>> >>>>>
>> >>>>> Best regards,
>> >>>>>
>> >>>>> Russ
>> >>>>>
>> >>>>> On 5/7/24 11:53, Mark Payne wrote:
>> >>>>>> The call to session.putAttribute would throw an Exception because
>> >>>>>> you provided an outdated version of the flowFile (did not capture
>> >>>>>> the result of calling session.write)
>> >>>>>>
>> >>>>>> Now, as NiFi matured, we found that:
>> >>>>>> (a) for more complex processors that aren’t just a series of
>> >>>>>> sequential steps it becomes difficult to manage all of that
>> >>>>>> bookkeeping.
>> >>>>>> (b) it was not intuitive to require this
>> >>>>>> (c) the ProcessSession already had more or less what it needed in
>> >>>>>> order to determine what the most up-to-date version of the FlowFile
>> >>>>>> was.
>> >>>>>>
>> >>>>>> So we updated the ProcessSession to automatically grab the latest
>> >>>>>> version of the FlowFile for these methods. But since you’re trying
>> >>>>>> to run an old version, you’ll need to make sure that you capture all
>> >>>>>> of those outputs and always keep track of the most recent version of
>> >>>>>> a FlowFile.
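
For anyone who wants to see the two points from this thread side by side (capture the returned FlowFile, and prefer one putAllAttributes call over many putAttribute calls), here is a rough sketch. It does not use the NiFi API at all: ToyFlowFile, putAttribute, putAllAttributes and the mapCopies counter are hypothetical stand-ins written only to model "immutable object, new attribute map per update" as Mark describes it. The real ProcessSession may do its bookkeeping quite differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class AttributeCopyDemo {

    /** Hypothetical immutable stand-in for a FlowFile; NOT NiFi's class. */
    static final class ToyFlowFile {
        private final Map<String, String> attributes;

        private ToyFlowFile(Map<String, String> attributes) {
            this.attributes = Collections.unmodifiableMap(attributes);
        }

        static ToyFlowFile of(Map<String, String> initial) {
            return new ToyFlowFile(new HashMap<>(initial));
        }

        Map<String, String> getAttributes() {
            return attributes;
        }
    }

    /** Counts how many full attribute maps the toy "session" methods copy. */
    static int mapCopies = 0;

    /** Toy model of a single-attribute update: copies the whole map. */
    static ToyFlowFile putAttribute(ToyFlowFile ff, String key, String value) {
        Map<String, String> copy = new HashMap<>(ff.getAttributes());
        mapCopies++;
        copy.put(key, value);
        return new ToyFlowFile(copy);
    }

    /** Toy model of a batched update: copies the whole map once. */
    static ToyFlowFile putAllAttributes(ToyFlowFile ff, Map<String, String> extra) {
        Map<String, String> copy = new HashMap<>(ff.getAttributes());
        mapCopies++;
        copy.putAll(extra);
        return new ToyFlowFile(copy);
    }

    /** One update per matching attribute; the result is captured each time. */
    static ToyFlowFile prefixOneByOne(ToyFlowFile ff) {
        ToyFlowFile current = ff;
        for (Map.Entry<String, String> e : ff.getAttributes().entrySet()) {
            if (e.getKey().startsWith("foo")) {
                current = putAttribute(current, "original-" + e.getKey(), e.getValue());
            }
        }
        return current;
    }

    /** Collect the new attributes first, then apply them in one update. */
    static ToyFlowFile prefixBatched(ToyFlowFile ff) {
        Map<String, String> extra = new HashMap<>();
        ff.getAttributes().forEach((key, value) -> {
            if (key.startsWith("foo")) {
                extra.put("original-" + key, value);
            }
        });
        return putAllAttributes(ff, extra);
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("foo1", "a");
        initial.put("foo2", "b");
        initial.put("foo3", "c");
        initial.put("bar", "d");
        ToyFlowFile flowFile = ToyFlowFile.of(initial);

        mapCopies = 0;
        ToyFlowFile oneByOne = prefixOneByOne(flowFile);
        System.out.println("one-by-one copies: " + mapCopies);

        mapCopies = 0;
        ToyFlowFile batched = prefixBatched(flowFile);
        System.out.println("batched copies: " + mapCopies);

        // The original object is untouched; only the returned ones carry the updates.
        System.out.println("original size: " + flowFile.getAttributes().size());
        System.out.println("same result: "
                + oneByOne.getAttributes().equals(batched.getAttributes()));
    }
}
```

With three matching attributes the toy model makes three full map copies in the one-by-one case and a single copy in the batched case, and both end up with the same attributes. It also shows why the old versions needed the returned object: the original ToyFlowFile never changes, so passing it along afterwards means passing along the stale version.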