[
https://issues.apache.org/jira/browse/AMQ-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236923#comment-14236923
]
Sam hendley commented on AMQ-4512:
----------------------------------
We are still pinned to 5.8.0 because of some other dependencies (mostly Camel),
so even though I have known about this issue for nearly two years I still can't
ship with producerWindow enabled. This has become a real problem because the
"raw socket blocking" prevents any sort of clean shutdown.
What would you think about backporting these fixes, and the related one for
AMQ-4361, to 5.8.0? I copied over the Usage class and its subclasses from
5.9.0; no other code changes were made, and all of the ActiveMQ unit tests
passed. Any ideas how I would go about verifying the safety of these changes?
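One way I could imagine verifying it (a rough sketch, assuming the
increaseUsage/decreaseUsage/getPercentUsage API of the backported MemoryUsage
is unchanged from 5.9.0) is to hammer a single MemoryUsage from several threads
and check that percentUsage settles back to 0 once every increase has been
matched by a decrease:
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.activemq.usage.MemoryUsage;

public class MemoryUsageStressSketch {
    public static void main(String[] args) throws Exception {
        final MemoryUsage usage = new MemoryUsage();
        usage.setLimit(100);

        final int threads = 8;
        final int iterations = 100000;
        final CountDownLatch done = new CountDownLatch(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(new Runnable() {
                public void run() {
                    // Every increase is matched by a decrease, so at the end
                    // usage (and therefore percentUsage) must be back at 0.
                    for (int i = 0; i < iterations; i++) {
                        usage.increaseUsage(150);
                        usage.decreaseUsage(150);
                    }
                    done.countDown();
                }
            });
        }

        done.await(5, TimeUnit.MINUTES);
        pool.shutdown();

        // With the race described in this issue, percentUsage can be left
        // stuck above 100 even though usage is back to 0.
        System.out.println("usage=" + usage.getUsage()
                + " percentUsage=" + usage.getPercentUsage());
    }
}
{code}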
> MemoryUsage waitForSpace in inconsistent state
> ----------------------------------------------
>
> Key: AMQ-4512
> URL: https://issues.apache.org/jira/browse/AMQ-4512
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.6.0, 5.7.0
> Reporter: Sam hendley
> Assignee: Timothy Bish
> Fix For: 5.9.0
>
> Attachments: AMQ4512Patch.txt, AMQ4512Patch.txt, MemUsage1.PNG,
> MemUsage2.PNG, MemoryUsageTest.java, QueueStats.PNG
>
>
> There is a race condition in MemoryUsage which makes it possible for it to be
> left in an inconsistent state and thereby hang any clients in waitForSpace().
> The core issue is in the following block of code:
> {code:java}
> public void decreaseUsage(long value) {
>     if (value == 0) {
>         return;
>     }
>     int percentUsage;
>     synchronized (usageMutex) {
>         usage -= value;
>         percentUsage = caclPercentUsage();
>     }
>     setPercentUsage(percentUsage);
>     if (parent != null) {
>         parent.decreaseUsage(value);
>     }
> }
> {code}
> The bug occurs when multiple threads call increase/decrease at the same time.
> Since the field "usage" is protected by the usageMutex, each writer/reader
> sees the correct, current value of usage and calculates the right value of
> percentUsage at that instant. "setPercentUsage" is also protected by the same
> usageMutex, so we re-synchronize on usageMutex to set the percentUsage field
> as well. The issue is that threads may enter the setPercentUsage synchronized
> block in a different order than they entered the percentUsage "calculating"
> block. Since percentUsage is carried between the two blocks, a reordering of
> threads can allow a stale percentUsage value to be set last.
> A possible interleaving (imagine usage starts at 0 and the limit is 100):
> Thread #1 - usage += 150; percentUsage = 150;
> Thread #1 - suspended before calling setPercentUsage
> Thread #2 - usage -= 150; percentUsage = 0;
> Thread #2 - setPercentUsage(0);
> Thread #1 - resumed, now calls setPercentUsage
> Thread #1 - setPercentUsage(150);
> Final percentUsage = 150, even though usage is 0.
> This same pattern of synchronizing to calculate the percentUsage and then
> setting the value later is repeated in all of the Usage objects I looked at.
> My guess is it was written this way to avoid holding locks while calling out
> to "untrusted code", but it is a very dangerous way to do the calculation.
> The most surprising thing is that the lock is still explicitly held while
> calling fireEvent anyway.
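> One way to avoid the reordering (a sketch of one possible approach, not
> necessarily the fix that went into 5.9.0) is to calculate and publish
> percentUsage within the same usageMutex hold; since Java monitors are
> reentrant, the nested synchronization inside setPercentUsage is harmless:
> {code:java}
> public void decreaseUsage(long value) {
>     if (value == 0) {
>         return;
>     }
>     synchronized (usageMutex) {
>         usage -= value;
>         // Calculate and set the new percentage under one lock hold, so no
>         // other thread can interleave between the calculation and the set.
>         setPercentUsage(caclPercentUsage());
>     }
>     if (parent != null) {
>         parent.decreaseUsage(value);
>     }
> }
> {code}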
> I have attached two screenshots, taken with a debugger, of two threads that
> have both been stalled for multiple minutes in "waitForSpace" trying to
> publish to the same queue. Notice they both have a "usage" of 0 but a
> "percentUsage" > 100, which should be impossible. To get the system into this
> state I was using JmsTemplate and a CachingConnectionFactory to publish on 8
> threads, with a single DefaultMessageListenerContainer pulling the messages
> off as fast as possible. The test publishes 100000 measurements, and roughly
> 75% of the time at least a few producers end up stalled in waitForSpace()
> even though the queue is ready for more messages. I can also reproduce these
> results using JmsTemplate and PooledConnectionFactory, so I don't believe
> it's an issue in the pooling implementations.
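> For reference, the publish side of that setup looks roughly like the
> following (a sketch only; the broker URL, queue name, window size, and
> payloads are illustrative, and the broker is assumed to have a memory limit
> small enough for flow control to kick in):
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> import org.apache.activemq.ActiveMQConnectionFactory;
> import org.springframework.jms.connection.CachingConnectionFactory;
> import org.springframework.jms.core.JmsTemplate;
>
> public class PublishStressSketch {
>     public static void main(String[] args) {
>         // producerWindowSize enables the client-side send window whose
>         // MemoryUsage is affected by this race; async sends make it matter.
>         ActiveMQConnectionFactory amq =
>                 new ActiveMQConnectionFactory("tcp://localhost:61616");
>         amq.setUseAsyncSend(true);
>         amq.setProducerWindowSize(1024 * 1024);
>
>         final JmsTemplate template =
>                 new JmsTemplate(new CachingConnectionFactory(amq));
>
>         ExecutorService pool = Executors.newFixedThreadPool(8);
>         for (int t = 0; t < 8; t++) {
>             pool.execute(new Runnable() {
>                 public void run() {
>                     // 100000 messages split across 8 producer threads.
>                     for (int i = 0; i < 100000 / 8; i++) {
>                         template.convertAndSend("measurements", "payload-" + i);
>                     }
>                 }
>             });
>         }
>         pool.shutdown();
>         // A DefaultMessageListenerContainer subscribed to "measurements"
>         // drains the queue as fast as possible; with the race, some
>         // producers stay parked in waitForSpace() even after it is empty.
>     }
> }
> {code}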
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)