I sent this to the users group but got no response, so I thought I'd try one last time, since I just got burned by this again.
Greetings, NiFi 2.4 user here (I plan to upgrade but just haven't gotten to it yet). I believe I may have found an issue with MergeContent in Defragment mode when the maximum number of bins is too small. I recreated it with a test flow, but before I report it as a bug, I would like someone to validate that my assumptions are correct.

The test flow generates 10,000 empty flow files. Via UpdateAttribute, each of the 10K flow files is assigned a unique fragment.identifier, and fragment.count is set to 4. I then duplicate each flow file 3 times, so I have 40,000 flow files; via the duplicated flow files, fragment.index then varies from 0..3. There are NO OTHER attributes, and there is no content in the flow files. I run this flow slowly, timing it specifically so that all 40,000 flow files are sitting right at the input to a single MergeContent processor. (Note that in this example the NiFi is standalone, so there are no cluster issues.)

My trouble seems related to the maximum number of bins. If the max is LESS THAN 2500, I get a lot of failures indicating that not all the fragments are present. If the max is more than 5000, everything merges FINE and I end up with the original 10,000 flow files, as I should. (I haven't narrowed it down any further than that.) It really doesn't make sense that 5000 worked; it feels arbitrary, given that 2500 did NOT.

Admittedly, the bin count SHOULD be 10,000 for this test case. But from my reading, it's not supposed to work that way: the processor SHOULD recycle the bins as needed. Admittedly, this would be SLOW, but it shouldn't ERROR.

I noticed this because when I was authoring a new flow, I accidentally left the maximum number of bins at the default value of 5, and it had trouble.

So the ultimate question: is this a bug I should report, or am I not understanding something fundamental?

Geoffrey Greene
ATF / Senior Software Ninjaneer
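P.S. To make my mental model concrete, here is a toy Python sketch. This is NOT NiFi's actual implementation; the `simulate` function, the `window` arrival-order parameter, and the evict-the-oldest-incomplete-bin policy are all my own assumptions. It just shows how defragment-style binning could fail whenever fragments arrive more interleaved than the bin limit allows, which is the behavior I think I'm seeing:

```python
from collections import OrderedDict

def simulate(num_ids, frag_count, window, max_bins):
    """Toy model of defragment binning with oldest-bin eviction.

    Arrival order: identifiers are interleaved in windows of `window` ids,
    i.e. index 0 for ids 0..window-1, then index 1 for the same ids, etc.
    When the number of open bins exceeds max_bins, the stalest bin is
    evicted and counted as a failure (its fragments are lost).
    """
    merged = failed = 0
    bins = OrderedDict()  # fragment.identifier -> fragments seen so far
    for start in range(0, num_ids, window):
        ids = range(start, min(start + window, num_ids))
        for _index in range(frag_count):
            for ident in ids:
                count = bins.pop(ident, 0) + 1
                if count == frag_count:      # all fragments present: merge
                    merged += 1
                    continue
                bins[ident] = count          # still waiting on fragments
                if len(bins) > max_bins:     # bin limit hit: evict stalest,
                    bins.popitem(last=False) # an incomplete bin is a failure
                    failed += 1
    failed += len(bins)                      # leftovers never completed
    return merged, failed
```

With a bin limit at or above the interleaving width, e.g. `simulate(10000, 4, 2500, 5000)`, everything merges cleanly; drop the limit below it, e.g. `simulate(10000, 4, 2500, 2000)`, and evictions cascade into failures. If the real processor behaves at all like this, my 2500-vs-5000 threshold would just reflect how interleaved the 40,000 queued flow files happen to be, not an arbitrary magic number.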
