[
https://issues.apache.org/activemq/browse/AMQ-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Richard Yarger updated AMQ-1918:
--------------------------------
Attachment: NegativeQueueCursorSupport.java
I have created a unit test that can reproduce the issue.
It takes around 5 minutes to complete.
I modeled the test on the CursorSupport test case,
adding a second queue and more specific memory settings.
I also included tests with different prefetch values.
Lowering prefetch seems to have a direct impact on the issue.
testWithDefaultPrefetch() and testWithDefaultPrefetchFiveConsumers()
are usually the ones to fail.
I am reproducing the issue quite easily with this test case,
so let me know if you cannot.
Thanks.
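
For anyone who wants the general shape of the test without opening the
attachment, here is a rough sketch. It is not the attached
NegativeQueueCursorSupport.java; the queue name, message count, prefetch
value and memory limit below are illustrative assumptions only.

import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

import junit.framework.TestCase;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.command.ActiveMQQueue;

public class NegativeQueueCursorSketchTest extends TestCase {

    private BrokerService broker;

    protected void setUp() throws Exception {
        broker = new BrokerService();
        broker.setPersistent(true);
        // Small memory limit so the cursor has to page messages through the
        // store (assumed value; the attached test uses its own settings).
        broker.getSystemUsage().getMemoryUsage().setLimit(1024 * 1024);
        broker.addConnector("tcp://localhost:61616");
        broker.start();
    }

    protected void tearDown() throws Exception {
        broker.stop();
    }

    public void testWithLowPrefetch() throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        // Lowering prefetch seems to make the failure more likely.
        factory.getPrefetchPolicy().setQueuePrefetch(10);

        final Connection connection = factory.createConnection();
        connection.start();
        final Queue queue = new ActiveMQQueue("TEST.QUEUE.A");
        final int messageCount = 2000;

        // Produce from a separate thread so consumption overlaps with the
        // store filling up, similar to the CursorSupport test.
        Thread producerThread = new Thread(new Runnable() {
            public void run() {
                try {
                    Session producerSession =
                            connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                    MessageProducer producer = producerSession.createProducer(queue);
                    for (int i = 0; i < messageCount; i++) {
                        producer.send(producerSession.createTextMessage("message " + i));
                    }
                    producerSession.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        producerThread.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(queue);
        int received = 0;
        while (consumer.receive(10000) != null) {
            received++;
        }
        producerThread.join();
        // If the cached cursor size has gone negative, the consumer stalls
        // before draining the queue and this assertion fails.
        assertEquals(messageCount, received);
        connection.close();
    }
}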
> AbstractStoreCursor.size gets out of synch with Store size and blocks
> consumers
> -------------------------------------------------------------------------------
>
> Key: AMQ-1918
> URL: https://issues.apache.org/activemq/browse/AMQ-1918
> Project: ActiveMQ
> Issue Type: Bug
> Components: Message Store
> Affects Versions: 5.1.0
> Reporter: Richard Yarger
> Assignee: Rob Davies
> Priority: Critical
> Fix For: 5.3.0
>
> Attachments: activemq.xml, NegativeQueueCursorSupport.java,
> testAMQMessageStore.zip, testdata.zip
>
>
> In version 5.1.0, we are seeing our queue consumers stop consuming for no
> reason.
> We have a staged queue environment and we occasionally see one queue display
> negative pending message counts that hang around -x, rise to -x+n gradually
> and then fall back to -x abruptly. The messages are building up and being
> processed in bunches, but it's not easy to see because the counts are negative.
> We see this behavior in the messages coming out of the system. Outbound
> messages come out in bunches and are synchronized with the queue pending
> count dropping to -x.
> This issue does not happen ALL of the time. It happens about once a week and
> the only way to fix it is to bounce the broker. It doesn't happen to the same
> queue every time, so it is not our consuming code.
> Although we don't have a reproducible scenario, we have been able to debug
> the issue in our test environment.
> We traced the problem to the cached store size in the AbstractStoreCursor.
> This value becomes 0 or negative and prevents the AbstractStoreCursor from
> retrieving more messages from the store (see AbstractStoreCursor.fillBatch()
> and the simplified sketch after this quoted description).
> We have seen the size value go lower than -1000.
> We have also forced it to fix itself by sending in n+1 messages. Once the
> size goes above zero, the cached value is refreshed and things work OK again.
> Unfortunately, during low-volume periods it could be hours before n+1 messages
> are received, so our message latency can rise at exactly those times... :(
> I have attached our broker config.
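
To make the quoted description easier to follow, here is a heavily
simplified, self-contained illustration of the failure mode: a cursor that
trusts a cached size counter and refuses to fetch from the store while that
counter is zero or negative. This is NOT the real AbstractStoreCursor code;
the class name, fields and batch size are invented for the illustration.

import java.util.LinkedList;

public class CachedSizeCursorSketch {

    // Cached count of messages believed to be in the store. In the real
    // cursor this is kept in step with store adds/removes; the report is
    // that it drifts negative under load.
    private int size;
    private final LinkedList<String> store = new LinkedList<String>();
    private final LinkedList<String> batch = new LinkedList<String>();

    public void addMessage(String message) {
        store.addLast(message);
        size++;   // if an add is ever missed (or a remove double-counted),
                  // size drifts away from store.size()
    }

    private void fillBatch() {
        // Guard modelled on the reported behaviour: nothing is recovered
        // from the store while the cached size is <= 0, even though the
        // store still holds messages.
        if (size <= 0) {
            return;
        }
        while (!store.isEmpty() && batch.size() < 10) {
            batch.addLast(store.removeFirst());
        }
    }

    public String next() {
        if (batch.isEmpty()) {
            fillBatch();
        }
        if (batch.isEmpty()) {
            return null;      // consumer sees an "empty" queue
        }
        size--;
        return batch.removeFirst();
    }

    public static void main(String[] args) {
        CachedSizeCursorSketch cursor = new CachedSizeCursorSketch();
        cursor.addMessage("m1");
        cursor.addMessage("m2");
        // Simulate the accounting bug: the cached size goes negative while
        // the store still contains messages.
        cursor.size = -3;
        System.out.println(cursor.next());   // prints null: consumer is starved
        // Sending n+1 more messages pushes the counter above zero again,
        // which matches the workaround described in the report.
        for (int i = 0; i < 4; i++) {
            cursor.addMessage("extra" + i);
        }
        System.out.println(cursor.next());   // messages flow again
    }
}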
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.