NIFI-13288 <https://issues.apache.org/jira/browse/NIFI-13288>
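
A simplified, self-contained model of the cost discussed in the thread below, applied to the split-processor pattern this ticket is about. `AttributeCopyModel` and its nested `ImmutableFlowFile` are hypothetical stand-ins, not NiFi's real FlowFile or ProcessSession. They only mimic the "new object plus a full copy of the attribute map per putAttribute call" behavior Mark describes. The `fragment.*` attribute names are modeled on NiFi's split attributes and are used here only as sample keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; NOT NiFi's real FlowFile or ProcessSession.
public final class AttributeCopyModel {

    // Counts how many full attribute-map copies were made.
    static int mapCopies = 0;

    static final class ImmutableFlowFile {
        final Map<String, String> attributes;

        ImmutableFlowFile(Map<String, String> attributes) {
            this.attributes = attributes;
        }

        // Mirrors the cost described in the thread: one new object and one full map copy per call.
        ImmutableFlowFile putAttribute(String key, String value) {
            Map<String, String> copy = new HashMap<>(attributes);
            mapCopies++;
            copy.put(key, value);
            return new ImmutableFlowFile(copy);
        }

        // Applies every change at once: one new object and a single map copy in total.
        ImmutableFlowFile putAllAttributes(Map<String, String> updates) {
            Map<String, String> copy = new HashMap<>(attributes);
            mapCopies++;
            copy.putAll(updates);
            return new ImmutableFlowFile(copy);
        }
    }

    // Sample split attributes; names modeled on NiFi's fragment.* convention.
    static Map<String, String> splitAttributes(String fragmentId, int index, int count) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("fragment.identifier", fragmentId);
        attrs.put("fragment.index", Integer.toString(index));
        attrs.put("fragment.count", Integer.toString(count));
        return attrs;
    }

    // One putAttribute call per attribute, per split.
    static int copiesOneAtATime(int splits) {
        mapCopies = 0;
        for (int i = 0; i < splits; i++) {
            ImmutableFlowFile split = new ImmutableFlowFile(new HashMap<>());
            for (Map.Entry<String, String> e : splitAttributes("id-1", i, splits).entrySet()) {
                split = split.putAttribute(e.getKey(), e.getValue());
            }
        }
        return mapCopies;
    }

    // One putAllAttributes call per split.
    static int copiesBatched(int splits) {
        mapCopies = 0;
        for (int i = 0; i < splits; i++) {
            ImmutableFlowFile split = new ImmutableFlowFile(new HashMap<>());
            split = split.putAllAttributes(splitAttributes("id-1", i, splits));
        }
        return mapCopies;
    }

    public static void main(String[] args) {
        System.out.println("one at a time: " + copiesOneAtATime(100));
        System.out.println("batched:       " + copiesBatched(100));
    }
}
```

Running `main` reports 300 map copies for the one-at-a-time loop (100 splits times three attributes) and 100 for the batched loop (one copy per split).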

On Thu, May 23, 2024 at 12:46 PM Dan S <dsti...@gmail.com> wrote:

> I will make a ticket for this. Thanks!
>
> On Thu, May 23, 2024 at 12:45 PM Mark Payne <marka...@hotmail.com> wrote:
>
>> Hey Dan,
>>
>> Yessir, it absolutely would! Probably would be good to clean those up.
>>
>> Thanks
>> -Mark
>>
>>
>> > On May 23, 2024, at 12:29 PM, Dan S <dsti...@gmail.com> wrote:
>> >
>> > Mark,
>> > In regard to your last comments:
>> >
>> >> Not really related, but on the note of things you may not realize, with
>> >> that code snippet :)
>> >> If you have access to update the processor mentioned here, you should
>> >> avoid using session.putAttribute many times.
>> >> Under the hood, in order to maintain object immutability it has to create
>> >> a new FlowFile object (and a new HashMap of all attributes!)
>> >> for every call to putAttribute. So if there are 100 attributes that match
>> >> that’s potentially a huge amount of garbage getting created.
>> >
>> >
>> > I noticed that some of the split processors (SplitJson, SplitXml, and
>> > SplitAvro) have loops that create a new FlowFile for each split, and they
>> > call putAttribute more than once for each FlowFile created (in order to
>> > populate the split attributes FRAGMENT_ID, FRAGMENT_INDEX, etc.).
>> > Would these all suffer from "a huge amount of garbage getting created",
>> > since putAttribute is called multiple times in each iteration of the loop?
>> >
>> > On Wed, May 8, 2024 at 5:27 PM Michael Moser <moser...@gmail.com> wrote:
>> >
>> >> Oh yeah, I do love this behavior of ProcessSession. And thanks for the
>> >> tip; it's easy to forget that there are efficiencies to be gained by
>> >> using different parts of an API.
>> >>
>> >> -- Mike
>> >>
>> >>
>> >> On Wed, May 8, 2024 at 11:04 AM Mark Payne <marka...@hotmail.com> wrote:
>> >>
>> >>> Yeah, that was something that kinda flew under the radar. Definitely
>> >>> improved the API.
>> >>>
>> >>> Not really related, but on the note of things you may not realize, with
>> >>> that code snippet :)
>> >>> If you have access to update the processor mentioned here, you should
>> >>> avoid using session.putAttribute many times.
>> >>> Under the hood, in order to maintain object immutability it has to create
>> >>> a new FlowFile object (and a new HashMap of all attributes!)
>> >>> for every call to putAttribute. So if there are 100 attributes that match
>> >>> that’s potentially a huge amount of garbage getting created. Instead,
>> >>> you could just use:
>> >>>
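>> >>> ```
>> >>> import java.util.HashMap;
>> >>> import java.util.Map;
>> >>>
>> >>> // Build all of the new attributes first...
>> >>> final Map<String, String> attributes = new HashMap<>();
>> >>> flowFile.getAttributes().forEach((key, value) -> {
>> >>>     if (key.startsWith("foo")) {
>> >>>         attributes.put("original-" + key, value);
>> >>>     }
>> >>> });
>> >>>
>> >>> // ...then apply them in one call, which creates a single new FlowFile
>> >>> flowFile = session.putAllAttributes(flowFile, attributes);
>> >>> ```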
>> >>>
>> >>> Thanks
>> >>> -Mark
>> >>>
>> >>>
>> >>>
>> >>>> On May 8, 2024, at 10:30 AM, Michael Moser <moser...@gmail.com> wrote:
>> >>>>
>> >>>> Wow, thanks for this information! Just last week I saw code that modified
>> >>>> attributes in a stream:
>> >>>>
>> >>>> flowFile.getAttributes().entrySet().stream()
>> >>>>     .filter(e -> e.getKey().startsWith("foo"))
>> >>>>     .forEach(e -> session.putAttribute(flowFile, "original-" + e.getKey(),
>> >>>>         e.getValue()));
>> >>>>
>> >>>> and I wondered how that could possibly work since the return value of
>> >>>> session.putAttribute is ignored!  Now I know.
>> >>>>
>> >>>> -- Mike
>> >>>>
>> >>>> On Tue, May 7, 2024 at 3:02 PM Russell Bateman <r...@windofkeltia.com> wrote:
>> >>>>
>> >>>>> Yes, what you described is what was happening, Mark. I didn't show all
>> >>>>> of the code calling the session methods, and I did re-read the incoming
>> >>>>> flowfile for purposes other than those for which I had already read and
>> >>>>> written it. So, I wasn't helpful enough. In the end, however, I had
>> >>>>> forgotten, immediately after the call to session.putAllAttributes(), to
>> >>>>> capture the resulting flowfile for passing to session.transfer(). That
>> >>>>> solved it for 1.1.2; the extra step wasn't necessary for 1.13.2 or later
>> >>>>> versions. Being helpful, the newer versions made me a spoiled, entitled
>> >>>>> child, and I will repent immediately.
>> >>>>>
>> >>>>> Thanks, guys! DevOps are happy they don't have to upgrade the customers
>> >>>>> to NiFi 1.13.2. (In a way, I'm unhappy about that, but...)
>> >>>>>
>> >>>>> Best regards,
>> >>>>>
>> >>>>> Russ
>> >>>>>
>> >>>>> On 5/7/24 11:53, Mark Payne wrote:
>> >>>>>> The call to session.putAttribute would throw an Exception because you
>> >>>>>> provided an outdated version of the flowFile (you did not capture the
>> >>>>>> result of calling session.write).
>> >>>>>>
>> >>>>>> Now, as NiFi matured, we found that:
>> >>>>>> (a) for more complex processors that aren’t just a series of sequential
>> >>>>>> steps, it becomes difficult to manage all of that bookkeeping;
>> >>>>>> (b) it was not intuitive to require this;
>> >>>>>> (c) the ProcessSession already had more or less what it needed in order
>> >>>>>> to determine what the most up-to-date version of the FlowFile was.
>> >>>>>>
>> >>>>>> So we updated the ProcessSession to automatically grab the latest
>> >>>>>> version of the FlowFile for these methods. But since you’re trying to
>> >>>>>> run an old version, you’ll need to make sure that you capture all of
>> >>>>>> those outputs and always keep track of the most recent version of a
>> >>>>>> FlowFile.
>> >>>>>
>> >>>
>> >>>
>> >>
>>
>>
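
For comparison with the thread above, a hypothetical model of the two behaviors Mark describes when putAttribute receives an outdated FlowFile reference: older releases reject the stale reference, while newer ones resolve it to the latest version the session is tracking. `SessionVersionModel`, `ModelSession`, `FlowFileVersion`, and the use of `IllegalStateException` are illustrative assumptions, not NiFi's API; NiFi's own exception type and internals may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the session bookkeeping; NOT NiFi's real ProcessSession.
public final class SessionVersionModel {

    // An immutable FlowFile "version": same id, increasing version number.
    record FlowFileVersion(long id, int version, Map<String, String> attributes) { }

    static final class ModelSession {
        private final boolean resolveLatest; // true ~ newer behavior, false ~ older behavior
        private final Map<Long, FlowFileVersion> latest = new HashMap<>();

        ModelSession(boolean resolveLatest) {
            this.resolveLatest = resolveLatest;
        }

        FlowFileVersion create(long id) {
            FlowFileVersion ff = new FlowFileVersion(id, 0, Map.of());
            latest.put(id, ff);
            return ff;
        }

        // Returns the version the session currently tracks for this id,
        // or rejects a stale reference when modeling the older behavior.
        private FlowFileVersion current(FlowFileVersion given) {
            FlowFileVersion tracked = latest.get(given.id());
            if (tracked.version() != given.version() && !resolveLatest) {
                throw new IllegalStateException(
                        "FlowFile " + given.id() + " is not the most recent version");
            }
            return tracked;
        }

        // Copy-on-write update that records the new version as the latest one.
        FlowFileVersion putAttribute(FlowFileVersion given, String key, String value) {
            FlowFileVersion base = current(given);
            Map<String, String> copy = new HashMap<>(base.attributes());
            copy.put(key, value);
            FlowFileVersion next = new FlowFileVersion(base.id(), base.version() + 1, Map.copyOf(copy));
            latest.put(base.id(), next);
            return next;
        }
    }

    // Ignores return values, like the stream example in the thread.
    static Map<String, String> ignoreReturnValues(boolean resolveLatest) {
        ModelSession session = new ModelSession(resolveLatest);
        FlowFileVersion flowFile = session.create(1L);
        session.putAttribute(flowFile, "original-foo", "a");
        session.putAttribute(flowFile, "original-foobar", "b"); // stale reference here
        return session.latest.get(1L).attributes();
    }

    public static void main(String[] args) {
        System.out.println("newer behavior: " + ignoreReturnValues(true));
        try {
            ignoreReturnValues(false);
        } catch (IllegalStateException e) {
            System.out.println("older behavior: " + e.getMessage());
        }
    }
}
```

With `resolveLatest` set to true, both attributes end up on the latest version even though the return values are ignored, which matches why the stream example works on newer releases. With it set to false, the second call fails, as described for older releases.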
