[ https://issues.apache.org/jira/browse/ARTEMIS-4527?focusedWorklogId=900508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-900508 ]
ASF GitHub Bot logged work on ARTEMIS-4527:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jan/24 20:45
Start Date: 18/Jan/24 20:45
Worklog Time Spent: 10m
Work Description: AntonRoskvist commented on PR #4705:
URL: https://github.com/apache/activemq-artemis/pull/4705#issuecomment-1899178681
Hello @jbertram
I had some time to look further into this and came up with another fix and
reproducer/test which seem to work better.
The main issue is that, under certain race conditions, the broker sends out
its notification for an added consumer before sending the binding_added
notification for the queue the consumer is bound to.
From my testing, this seems to happen when `postOfficeImpl#addBinding()` has
_just_ added the actual binding to the addressManager but has not yet called
`managementService.sendNotification()`. In that window, the unsynchronized
`postOfficeImpl#getBinding()` already returns the binding to
`ServerSessionImpl#createConsumer()`, which can then lock
`managementService` before postOffice is able to.
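A minimal, self-contained Java sketch of the ordering problem described above, for illustration only. `Office`, `Notifier`, `addBindingRacy`, `addBindingOrdered` and `createConsumer` are hypothetical stand-ins, not the real `PostOfficeImpl`, `ManagementService` or `ServerSessionImpl` classes, and the lock-based ordering shown is just one possible way to close the window, not necessarily what this PR does. The `inWindow` hook replays the interleaving deterministically: in the racy variant, a consumer that sees the new binding before its notification goes out gets its own notification sent first; in the ordered variant, publishing the binding and sending the notification under the notifier's monitor forces the consumer to wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified model of the race; NOT the real Artemis classes.
public class NotificationOrderSketch {

    // Stand-in for managementService: records notifications in the order they are sent.
    static final class Notifier {
        private final List<String> sent = new ArrayList<>();

        synchronized void send(String notification) { sent.add(notification); }

        synchronized List<String> sent() { return new ArrayList<>(sent); }
    }

    // Stand-in for postOffice + addressManager.
    static final class Office {
        final Map<String, String> bindings = new ConcurrentHashMap<>();
        final Notifier notifier = new Notifier();

        // Unsynchronized lookup, like the getBinding() call used when creating a consumer.
        String getBinding(String queue) { return bindings.get(queue); }

        // Racy: the binding is visible to other callers before BINDING_ADDED is sent.
        void addBindingRacy(String queue, Runnable inWindow) {
            bindings.put(queue, queue);
            inWindow.run(); // simulated interleaving inside the window
            notifier.send("BINDING_ADDED " + queue);
        }

        // Ordered: publish and notify while holding the notifier's monitor, so anyone
        // who sees the binding blocks in send() until BINDING_ADDED is recorded.
        void addBindingOrdered(String queue, Runnable inWindow) {
            synchronized (notifier) {
                bindings.put(queue, queue);
                inWindow.run();
                notifier.send("BINDING_ADDED " + queue); // re-entrant on the same monitor
            }
        }
    }

    // Stand-in for createConsumer(): only notifies if the binding is already visible.
    static void createConsumer(Office office, String queue, CountDownLatch sawBinding) {
        if (office.getBinding(queue) == null) {
            return;
        }
        sawBinding.countDown();
        office.notifier.send("CONSUMER_CREATED " + queue);
    }

    public static List<String> runRacy() {
        Office office = new Office();
        office.addBindingRacy("q", () -> createConsumer(office, "q", new CountDownLatch(1)));
        return office.notifier.sent();
    }

    public static List<String> runOrdered() {
        Office office = new Office();
        CountDownLatch sawBinding = new CountDownLatch(1);
        Thread[] consumer = new Thread[1];
        office.addBindingOrdered("q", () -> {
            // The consumer runs on another thread and observes the binding inside the window.
            consumer[0] = new Thread(() -> createConsumer(office, "q", sawBinding));
            consumer[0].start();
            try {
                sawBinding.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            consumer[0].join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return office.notifier.sent();
    }

    public static void main(String[] args) {
        System.out.println("racy:    " + runRacy());    // [CONSUMER_CREATED q, BINDING_ADDED q]
        System.out.println("ordered: " + runOrdered()); // [BINDING_ADDED q, CONSUMER_CREATED q]
    }
}
```

The trade-off in the ordered variant is that every binding insertion now contends for the same monitor as all other notifications, which a real broker may or may not find acceptable.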
I have not been able to pin down _exactly_ which conditions have to be met
for this to occur, but the new test included in this change reliably
triggers the first issue in the chain leading up to the "redistributor race", i.e.
getting the cluster's remoteBindings out of sync with regard to their consumer
count.
I'm removing the "Draft" status for now, but please let me know if anything
looks off about these changes.
Issue Time Tracking
-------------------
Worklog Id: (was: 900508)
Time Spent: 1h (was: 50m)
> Redistributor race when consumerCount reaches 0 in cluster
> ----------------------------------------------------------
>
> Key: ARTEMIS-4527
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4527
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Reporter: Anton Roskvist
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> This is a very rare bug caused by cluster notifications arriving in the wrong
> order under some very specific circumstances.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)