Geoff,

Hello there, and great to see you around the Apache NiFi water cooler again.
It is admittedly a bit hard to follow the logic of your test flow. Have you considered simplifying the tack by using:

# create flowfiles
GenerateFlowFile or GenerateRecord. With these you can make unstructured text, unstructured byte data, or structured records.

# split them up with the appropriate player of your choice
SplitText
SplitRecord
SplitContent

# MergeContent in defragment mode...

You might be right that there could be a bug. I think you're definitely right that there needs to be a healthy number of bins. In the worst case you need (I think) as many bins as there could be outstanding things being actively defragmented.

If you could build a flow that breaks using the mechanism I describe and share it with us, that would help.

Thanks!
Joe

to create a set of made up flowfiles. You could create textThen
UnpackContent

On Wed, Nov 26, 2025 at 7:48 AM Greene (US), Geoffrey N via dev <[email protected]> wrote:

> I sent this to the users group, but I thought I'd try one last time, since
> I got no response, and I just got burned by this again.
>
> Greetings
>
> NiFi 2.4 user here (I plan to upgrade but have just not gotten to it yet)
>
> I believe I may have noted an issue with MergeContent in defragment mode
> when the max number of bins is too small.
>
> I recreated it with a test flow. But before I report it as a bug, I would
> like someone to validate that my assumptions are correct.
>
> I've set up a test flow that has 10,000 empty flow files.
> Via update attribute, each of the 10K flow files is assigned a unique
> fragment.identifier, and fragment.count is assigned a value of 4.
> I then duplicate each flow file 3 times, so I have 40,000 flow files. Via
> the duplicate flow file, fragment.index then varies from 0..3. There are
> NO OTHER attributes, and there is no content in the flow files.
>
> I then run this flow slowly, timing it specifically so that 40,000 flow
> files are sitting right at the input to a single merge content processor.
> (Note that in this example, the NiFi is standalone, so there are no cloud
> issues.)
>
> My trouble seems related to the maximum number of bins. If the max is LESS
> THAN 2500, I get a lot of failures, indicating that not all the fragments
> are present.
> If the count is more than 5000, everything merges FINE (I haven't
> narrowed it down any further than that), and I end up back with the
> original 10,000 flow files (as I should).
>
> Admittedly, the bin size SHOULD be 10,000 for this test case. But from my
> reading, it's not supposed to work that way. It SHOULD be recycling the
> bins as needed. Admittedly, this would be SLOW, but it shouldn't ERROR.
> It really doesn't make sense that 5000 worked. Feels arbitrary, given that
> 2500 did NOT.
>
> I noticed this because when I was authoring a new flow, I accidentally left
> the maximum number of bins at the default value of 5. It had trouble.
>
> So the ultimate question: is this a bug I should report? Or am I not
> understanding something fundamental?
>
> Geoffrey Greene
> ATF / Senior Software Ninjaneer
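For anyone who wants to reason about the bin question outside of NiFi, here is a rough Python sketch of the scenario described above. It is a toy model and not NiFi's actual code. The function names make_fragments and defragment are made up for illustration, and the eviction rule is an assumption: when a new bin is needed and max_bins bins are already open, the oldest bin is forced out and its fragments count as failures if the bin is incomplete. Only the attribute names (fragment.identifier, fragment.index, fragment.count) come from the thread.

```python
from collections import OrderedDict


def make_fragments(num_groups, fragment_count):
    """Build attribute dicts shaped like the test flow in the thread:
    one unique fragment.identifier per group, indexes 0..fragment_count-1."""
    return [
        {
            "fragment.identifier": f"group-{g}",
            "fragment.index": i,
            "fragment.count": fragment_count,
        }
        for g in range(num_groups)
        for i in range(fragment_count)
    ]


def defragment(flowfiles, max_bins):
    """Toy defragmenter with a bounded number of open bins.

    ASSUMPTION (not taken from the NiFi source): when a new bin is needed
    and max_bins bins are already open, the oldest bin is evicted and its
    fragments are counted as failed because the bin is incomplete.

    Returns (merged_groups, failed_fragments, pending_fragments).
    """
    bins = OrderedDict()  # fragment.identifier -> set of indexes seen so far
    merged = 0
    failed = 0
    for ff in flowfiles:
        key = ff["fragment.identifier"]
        if key not in bins:
            if len(bins) >= max_bins:
                # Evict the oldest open bin to make room for the new one.
                _, evicted = bins.popitem(last=False)
                failed += len(evicted)
            bins[key] = set()
        bins[key].add(ff["fragment.index"])
        if len(bins[key]) == ff["fragment.count"]:
            # All fragments for this identifier are present: merge it.
            del bins[key]
            merged += 1
    pending = sum(len(indexes) for indexes in bins.values())
    return merged, failed, pending


if __name__ == "__main__":
    fragments = make_fragments(num_groups=100, fragment_count=4)
    # Worst-case arrival order: every index 0 first, then every index 1, ...
    interleaved = sorted(fragments, key=lambda ff: ff["fragment.index"])
    for max_bins in (100, 50):
        print(max_bins, defragment(interleaved, max_bins))
```

Under this assumed eviction rule, fragments that arrive grouped by identifier merge with a single bin, while the fully interleaved ordering merges everything only when there are as many bins as outstanding identifiers. That matches the worst-case estimate in the reply, but it does not by itself explain why 5000 bins worked for 10,000 identifiers, so a real reproducing flow is still the useful thing to share.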
