Forgot this fun fact: while we're still on 1.14.0 on this particular NiFi instance, it is a forked version that includes a cherry-pick of NIFI-9433. So, since that fix is already present and the symptoms persist, it seems likely this is a separate issue.
Again, we hope to reproduce in a test environment next week. An important
step is to determine whether this is a load-balancer-related issue or
something else.

On Fri, Jun 10, 2022 at 3:13 PM Mark Bean <mark.o.b...@gmail.com> wrote:

> Yes, it will be a few weeks at least to get the upgrade into the
> environment where we see this occurring and evaluate. Part of the problem
> is reproducibility. We haven't yet created a scenario that reliably
> forces this situation. That's on the list for Monday though. If we can
> reliably reproduce, I'm sure we can test much sooner with 1.16.2 to
> confirm it's been addressed - even if we don't yet have 1.16.2 in the
> target environment.
>
> Will report back findings when available.
>
> Thanks,
> Mark
>
> On Fri, Jun 10, 2022 at 2:53 PM Joe Witt <joe.w...@gmail.com> wrote:
>
>> Mark
>>
>> It will be a few weeks before you can evaluate this?
>>
>> thanks
>>
>> On Fri, Jun 10, 2022 at 11:03 AM Joe Witt <joe.w...@gmail.com> wrote:
>>
>> > MarkB
>> >
>> > That is why MarkP said it was a manifestation. The point is that the
>> > issue you noted, specifically the behavior you saw here (and before),
>> > is believed to be addressed by that fix, which went into the release
>> > 6 months ago and is also in the 1.16.x line. You'll want that, and of
>> > course the many other improvements, for better behavior in this
>> > scenario.
>> >
>> > Thanks
>> >
>> > On Fri, Jun 10, 2022 at 10:59 AM Mark Bean <mark.o.b...@gmail.com>
>> > wrote:
>> >
>> >> This is not quite the same issue. It's possible the fix for
>> >> NIFI-9433 may be related, but the set of circumstances is definitely
>> >> different. Also, the observed behavior is different. For example,
>> >> none of the nodes report "Cannot create negative queue size".
>> >>
>> >> I'm trying to track specific FlowFile(s) from one node to another
>> >> during load balancing, and I have been unsuccessful. In other words,
>> >> I have not been able to confirm whether a given FlowFile was
>> >> successfully transferred or not. Provenance is no longer available
>> >> for this time period. I know, not good answers for diagnosing the
>> >> issue.
>> >>
>> >> My real question is: what is the expected behavior for FlowFiles
>> >> that are actively load balancing when the cluster is shut down?
>> >>
>> >> We have plans to upgrade as soon as possible, but unfortunately that
>> >> will not be for at least a few more weeks due to the need to
>> >> integrate custom changes into 1.16.2.
>> >>
>> >> On Fri, Jun 10, 2022 at 1:31 PM Mark Payne <marka...@hotmail.com>
>> >> wrote:
>> >>
>> >> > Mark,
>> >> >
>> >> > This is a manifestation of NIFI-9433 [1] that we fixed a while
>> >> > back. Recommend you upgrade your installation.
>> >> >
>> >> > Thanks
>> >> > -Mark
>> >> >
>> >> > [1] https://issues.apache.org/jira/browse/NIFI-9433
>> >> >
>> >> > On Jun 10, 2022, at 1:16 PM, Mark Bean <mark.o.b...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > We have a situation where several FlowFiles have lost their
>> >> > content. They still appear on the graph, but any attempt by a
>> >> > processor to access the content results in a NullPointerException.
>> >> > The identified content claim file is in fact missing from the file
>> >> > system.
>> >> >
>> >> > Also, there are ERROR log messages indicating the claimant count
>> >> > is a negative value:
>> >> >
>> >> > o.a.n.c.r.c.StandardResourceClaimManager Decremented claimant
>> >> > count for StandardResourceClaim[id=1234-567, container=default,
>> >> > section=890] to -1
>> >> >
>> >> > (There are also some with negative values as low as -4.)
>> >> >
>> >> > Anecdotally, we suspect this may have been caused by an incomplete
>> >> > connection load balance. And if that is the case, it is not clear
>> >> > whether the content successfully reached another node and the
>> >> > FlowFile simply didn't finish cleaning up, or whether the content
>> >> > was prematurely dropped.
>> >> >
>> >> > It should be noted that the cluster was upgraded/restarted at or
>> >> > about the time the errors started. Could a shutdown of NiFi cause
>> >> > data loss if a load balance was currently in progress?
>> >> >
>> >> > NiFi 1.14.0
>> >> >
>> >> > Thanks,
>> >> > Mark
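For context on the error quoted above: NiFi reference-counts content
claims. Each FlowFile that points at a claim increments the claim's
claimant count, and the backing file on disk only becomes eligible for
deletion once that count reaches zero. A count that goes negative
therefore means the claim was released more times than it was claimed,
and if an extra release drove the count to zero while a FlowFile still
referenced the claim, the content file can be deleted out from under
that FlowFile, which matches the missing claim file and the
NullPointerException reported above. Below is a minimal sketch of that
bookkeeping; it is not NiFi's actual StandardResourceClaimManager, and
the class name, method names, and logging are simplified for
illustration.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified claimant-count bookkeeping for content claims. Each claim is
// a slab of FlowFile content on disk; the backing file may only be deleted
// once no FlowFile references it (count == 0).
class ClaimantCounter {

    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    /** Called when a FlowFile starts referencing the claim. */
    int increment(final String claimId) {
        return counts.computeIfAbsent(claimId, k -> new AtomicInteger(0)).incrementAndGet();
    }

    /** Called when a FlowFile releases the claim, e.g. on drop or transfer. */
    int decrement(final String claimId) {
        final AtomicInteger count = counts.get(claimId);
        if (count == null) {
            // Releasing a claim that was never registered is already a
            // bookkeeping error.
            System.err.println("Decremented claimant count for " + claimId + " to -1");
            return -1;
        }

        final int updated = count.decrementAndGet();
        if (updated == 0) {
            // Claim is now unreferenced and may be destroyed. If an *extra*
            // decrement got us here while a FlowFile still pointed at the
            // claim, its content file is deleted out from under that FlowFile.
            markDestructable(claimId);
        } else if (updated < 0) {
            // The state behind the ERROR in the thread: more releases than
            // registrations. Destruction already happened at zero, so each
            // further decrement only logs a negative count (-1, ..., -4).
            System.err.println("Decremented claimant count for " + claimId + " to " + updated);
        }
        return updated;
    }

    private void markDestructable(final String claimId) {
        // Placeholder: in a real content repository, this would queue the
        // backing file for deletion during cleanup.
    }
}

Under this model, the counts of -1 through -4 reported above correspond
to one to four extra releases of claims that had already hit zero and
been marked destructible. One plausible, but unconfirmed, sequence
consistent with the thread is a claim being released once when a
load-balanced FlowFile was handed off to another node and again during
shutdown cleanup of the partially transferred queue; whether that is
what actually happened on this 1.14.0 fork is what the planned
reproduction should establish.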