Geoff

Hello there, and great to see you around the Apache NiFi water cooler again.

It is admittedly a bit hard to follow the logic of your test flow.  Have
you considered simplifying the approach by using the following:

# create flowfiles
GenerateFlowFile or GenerateRecord.  With these you can make unstructured
text, unstructured byte data, or structured records.

# split them up with the appropriate processor of your choice
SplitText
SplitRecord
SplitContent

# MergeContent in defragment mode...

You might be right that there could be a bug.  I think you're definitely right
that there needs to be a healthy number of bins.  In the worst case you need
as many bins (I think) as there could be outstanding fragment groups being
actively defragmented.
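
To make that worst case concrete, here is a rough Python sketch of the idea.
It is a toy model and not NiFi code: the function names are made up for
illustration, and it ASSUMES that when a new fragment.identifier arrives while
all bins are in use, the oldest bin is evicted and, being incomplete, counts
as a failure. Whether MergeContent behaves exactly this way is the thing that
would need to be confirmed.

```python
from collections import OrderedDict


def simulate_defragment(arrivals, fragment_count, max_bins):
    """Toy model of defragmenting with a bounded number of bins.

    arrivals: iterable of (fragment_identifier, fragment_index) pairs.
    Returns (merged_bundles, failed_bundles).
    """
    if max_bins < 1:
        raise ValueError("max_bins must be at least 1")

    bins = OrderedDict()  # identifier -> set of indexes seen, oldest bin first
    merged = 0
    failed = set()

    for identifier, index in arrivals:
        if identifier not in bins:
            if len(bins) >= max_bins:
                # ASSUMPTION: the oldest bin is evicted to make room, and an
                # incomplete evicted bin is treated as a failure.
                evicted, _ = bins.popitem(last=False)
                failed.add(evicted)
            bins[identifier] = set()
        bins[identifier].add(index)
        if len(bins[identifier]) == fragment_count:
            # All fragments present: the bundle merges and frees its bin.
            del bins[identifier]
            merged += 1

    # Anything still binned at the end never saw all of its fragments.
    failed.update(bins)
    return merged, len(failed)


def interleaved(num_bundles, fragment_count):
    """Worst case: every bundle's index 0 arrives, then every index 1, ..."""
    return [(b, i) for i in range(fragment_count) for b in range(num_bundles)]


def sequential(num_bundles, fragment_count):
    """Best case: each bundle's fragments arrive back to back."""
    return [(b, i) for b in range(num_bundles) for i in range(fragment_count)]


if __name__ == "__main__":
    for max_bins in (5, 10):
        result = simulate_defragment(interleaved(10, 4), 4, max_bins)
        print(max_bins, result)
```

In this model a fully interleaved queue needs one bin per outstanding
fragment.identifier, while a queue where each group's fragments arrive back
to back gets by with a single bin. Arrival order, not just the bin count,
decides whether groups fail.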

If you could use the mechanism I describe to build a flow that breaks, and
share it with us, that would help.

Thanks!
Joe


On Wed, Nov 26, 2025 at 7:48 AM Greene (US), Geoffrey N via dev <
[email protected]> wrote:

> I sent this to the users group, but I thought I'd try one last time, since
> I got no response, and I just got burned by this again.
>
> Greetings
>
> NiFi 2.4 user here (I plan to upgrade but have just not gotten to it yet)
>
> I believe I may have noted an issue with MergeContent in defragment mode
> when the max number of bins is too small.
>
> I recreated it with a test flow. But before I report it as a bug, I would
> like someone to validate that my assumptions are correct.
>
> I've set up a test flow that has 10,000 empty flow files.
> Via update attribute, each of the 10K flow files is assigned a unique
> fragment.identifier, and fragment.count is assigned a value of 4.
> I then duplicate each flow file 3 times, so I have 40,000 flow files.  Via
> the duplicate flow file, fragment.index then varies from 0..3.  There are
> NO OTHER attributes, and there is no content in the flow files.
>
> I then run this flow slowly, timing it specifically so that 40,000 flow
> files are sitting right at the input to a single merge content processor.
> (Note that in this example, the nifi is standalone, so there are no cloud
> issues.)
>
> My trouble seems related to maximum number of bins.  If the max is LESS
> THAN 2500, I get a lot of failures, indicating that not all the fragments
> are present.
> If the count is more than 5000, everything merges FINE. (I haven't
> narrowed it down any further than that), and I end up back with the
> original 10,000 flow files (as I should)
>
> Admittedly, the bin size SHOULD be 10,000 for this test case.  But from my
> reading, it's not supposed to work that way.  It SHOULD be recycling the
> bins as needed.  Admittedly, this would be SLOW, but it shouldn't ERROR.
> It really doesn't make sense that 5000 worked.  Feels arbitrary, given that
> 2500 did NOT.
>
> I noticed this because when I was authoring a new flow, I accidentally left
> the maximum number of bins at the default value of 5. It had trouble.
>
> So the ultimate question : is this a bug I should report? Or am I not
> understanding something fundamental?
>
> Geoffrey Greene
> ATF / Senior Software Ninjaneer
>
>
>
