Forgot this fun fact: while we're still on 1.14.0 on this particular NiFi instance, it is a forked version that includes a cherry-pick of NIFI-9433. So, since that fix is already present and the symptoms persist, it seems likely this is a separate issue.
Again, we hope to reproduce in a test environment next week. An important
step is to determine whether this is a load-balancer-related issue or
something else.

On Fri, Jun 10, 2022 at 3:13 PM Mark Bean <mark.o.b...@gmail.com> wrote:

> Yes, it will be a few weeks at least to get the upgrade into the
> environment where we see this occurring and evaluate. Part of the problem
> is reproducibility. We haven't yet created a scenario that reliably
> forces this situation. That's on the list for Monday though. If we can
> reliably reproduce, I'm sure we can test much sooner with 1.16.2 to
> confirm it's been addressed - even if we don't yet have 1.16.2 in the
> target environment.
>
> Will report back findings when available.
>
> Thanks,
> Mark
>
> On Fri, Jun 10, 2022 at 2:53 PM Joe Witt <joe.w...@gmail.com> wrote:
>
>> Mark
>>
>> It will be a few weeks before you can evaluate this?
>>
>> thanks
>>
>> On Fri, Jun 10, 2022 at 11:03 AM Joe Witt <joe.w...@gmail.com> wrote:
>>
>> > MarkB
>> >
>> > That is why MarkP said it was a manifestation. The point is that the
>> > issue you noted, specifically the behavior you saw here (and before),
>> > is believed to be addressed by that fix, which went into the release
>> > 6 months ago and is also in the 1.16.x line. You'll want that, and of
>> > course the many other improvements, for better behavior in this
>> > scenario.
>> >
>> > Thanks
>> >
>> > On Fri, Jun 10, 2022 at 10:59 AM Mark Bean <mark.o.b...@gmail.com>
>> > wrote:
>> >
>> >> This is not quite the same issue. It's possible the fix for
>> >> NIFI-9433 may be related, but the set of circumstances is definitely
>> >> different. Also, the observed behavior is different. For example,
>> >> none of the nodes report "Cannot create negative queue size".
>> >>
>> >> I'm trying to track specific FlowFile(s) from one node to another
>> >> during load balancing, and I have been unsuccessful. In other words,
>> >> I have not been able to confirm whether a given FlowFile was
>> >> successfully transferred or not. Provenance is no longer available
>> >> for this time period. I know, not good answers for diagnosing the
>> >> issue.
>> >>
>> >> My real question is: what is the expected behavior for FlowFiles
>> >> that are actively load balancing when the cluster is shut down?
>> >>
>> >> We have plans to upgrade as soon as possible, but unfortunately that
>> >> will not be for at least a few more weeks due to the need to
>> >> integrate custom changes into 1.16.2.
>> >>
>> >> On Fri, Jun 10, 2022 at 1:31 PM Mark Payne <marka...@hotmail.com>
>> >> wrote:
>> >>
>> >> > Mark,
>> >> >
>> >> > This is a manifestation of NIFI-9433 [1] that we fixed a while
>> >> > back. Recommend you upgrade your installation.
>> >> >
>> >> > Thanks
>> >> > -Mark
>> >> >
>> >> > [1] https://issues.apache.org/jira/browse/NIFI-9433
>> >> >
>> >> > On Jun 10, 2022, at 1:16 PM, Mark Bean <mark.o.b...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > We have a situation where several FlowFiles have lost their
>> >> > content. They still appear on the graph, but any attempt by a
>> >> > processor to access the content results in a NullPointerException.
>> >> > The identified content claim file is in fact missing from the file
>> >> > system.
>> >> >
>> >> > Also, there are ERROR log messages indicating the claimant count
>> >> > is a negative value:
>> >> >
>> >> > o.a.n.c.r.c.StandardResourceClaimManager Decremented claimant
>> >> > count for StandardResourceClaim[id=1234-567, container=default,
>> >> > section=890] to -1
>> >> >
>> >> > (There are also some with negative values as low as -4.)
>> >> >
>> >> > Anecdotally, we suspect this may have been caused by an incomplete
>> >> > connection load balance. And if that is the case, it is not clear
>> >> > whether the content successfully reached another node and the
>> >> > FlowFile simply didn't finish cleaning up, or whether the content
>> >> > was prematurely dropped.
>> >> >
>> >> > It should be noted that the cluster was upgraded/restarted at or
>> >> > about the time the errors started. Could a shutdown of NiFi cause
>> >> > data loss if a load balance was currently in progress?
>> >> >
>> >> > NiFi 1.14.0
>> >> >
>> >> > Thanks,
>> >> > Mark
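For context on the error quoted above: NiFi reference-counts content
claims. Each FlowFile that points at a claim increments the claim's
claimant count, and the backing file on disk only becomes eligible for
deletion once that count reaches zero. A count that goes negative
therefore means the claim was released more times than it was claimed,
and if an extra release drove the count to zero while a FlowFile still
referenced the claim, the content file can be deleted out from under
that FlowFile, which matches the missing claim file and the
NullPointerException reported above. Below is a minimal sketch of that
bookkeeping; it is not NiFi's actual StandardResourceClaimManager, and
the class name, method names, and logging are simplified for
illustration.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified claimant-count bookkeeping for content claims. Each claim is
// a slab of FlowFile content on disk; the backing file may only be deleted
// once no FlowFile references it (count == 0).
class ClaimantCounter {

    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    /** Called when a FlowFile starts referencing the claim. */
    int increment(final String claimId) {
        return counts.computeIfAbsent(claimId, k -> new AtomicInteger(0)).incrementAndGet();
    }

    /** Called when a FlowFile releases the claim, e.g. on drop or transfer. */
    int decrement(final String claimId) {
        final AtomicInteger count = counts.get(claimId);
        if (count == null) {
            // Releasing a claim that was never registered is already a
            // bookkeeping error.
            System.err.println("Decremented claimant count for " + claimId + " to -1");
            return -1;
        }

        final int updated = count.decrementAndGet();
        if (updated == 0) {
            // Claim is now unreferenced and may be destroyed. If an *extra*
            // decrement got us here while a FlowFile still pointed at the
            // claim, its content file is deleted out from under that FlowFile.
            markDestructable(claimId);
        } else if (updated < 0) {
            // The state behind the ERROR in the thread: more releases than
            // registrations. Destruction already happened at zero, so each
            // further decrement only logs a negative count (-1, ..., -4).
            System.err.println("Decremented claimant count for " + claimId + " to " + updated);
        }
        return updated;
    }

    private void markDestructable(final String claimId) {
        // Placeholder: in a real content repository, this would queue the
        // backing file for deletion during cleanup.
    }
}

Under this model, the counts of -1 through -4 reported above correspond
to one to four extra releases of claims that had already hit zero and
been marked destructible. One plausible, but unconfirmed, sequence
consistent with the thread is a claim being released once when a
load-balanced FlowFile was handed off to another node and again during
shutdown cleanup of the partially transferred queue; whether that is
what actually happened on this 1.14.0 fork is what the planned
reproduction should establish.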