[ https://issues.apache.org/jira/browse/QPID-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomas Vavricka updated QPID-8681: --------------------------------- Summary: [Broker-J] Addressing lock contention in Sorted Queues under high load by optimizing property fetching (was: Addressing lock contention in Sorted Queues under high load by optimizing property fetching) > [Broker-J] Addressing lock contention in Sorted Queues under high load by > optimizing property fetching > ------------------------------------------------------------------------------------------------------ > > Key: QPID-8681 > URL: https://issues.apache.org/jira/browse/QPID-8681 > Project: Qpid > Issue Type: Improvement > Components: Broker-J > Affects Versions: qpid-java-broker-7.0.9, qpid-java-broker-9.2.0 > Reporter: Sudheer Sana > Assignee: Robert Godfrey > Priority: Major > Labels: contention, performance > Fix For: qpid-java-broker-9.2.1 > > Attachments: image-2025-01-27-22-41-16-668.png, > image-2025-01-27-22-42-13-739.png > > Original Estimate: 6h > Remaining Estimate: 6h > > *Summary:* > Apache Qpid Broker J provides Sorted Queues, which allow users to implement a > message re-enqueue with delay feature. This feature is crucial in scenarios > where messages dequeued from regular queues cannot be processed immediately > due to certain preconditions (e.g., available resources, concurrency limits). > These messages are re-enqueued to Sorted Queues, where they are sorted based > on their delay expiry time. Periodic jobs then check these Sorted Queues for > expired messages and move them back to the regular queues for processing. > Under high load conditions (e.g., a re-enqueue rate of ~7,000 messages per > second), we observed that the broker experiences contention issues, causing > it to stop responding to REST API calls (which time out if REST timeouts are > set on the client side otherwise just hung around). These APIs are used to > periodically fetch queue statistics. > *Analysis:* > By analyzing Java Flight Recorder (JFR) data, we identified the root cause of > the contention: > # REST API calls to retrieve queue depths required querying some specific > predefined properties from the broker. > # However, for each requested property, the broker was inadvertently > fetching all 26 properties, resulting in repeated and excessive lock > acquisition attempts on the Sorted Queue data structure. > # Among the requested properties, the *oldestMessageAge* property (used for > delay queues) significantly contributed to the contention by increasing the > number of lock requests. > # On the application side, querying for the *oldestMessageAge* property is > unnecessary when dealing with delay queues, so avoiding this query will > further mitigate the contention issue. > *Stack Trace:* > The following stack trace illustrates the contention observed during the > issue: > {code:java} > at > org.apache.qpid.server.queue.SortedQueueEntryList.next(SortedQueueEntryList.java:292) > at > org.apache.qpid.server.queue.SortedQueueEntryList$QueueEntryIteratorImpl.atTail(SortedQueueEntryList.java:686) > at > org.apache.qpid.server.queue.SortedQueueEntryList$QueueEntryIteratorImpl.advance(SortedQueueEntryList.java:698) > at > org.apache.qpid.server.queue.SortedQueueEntryList.getOldestEntry(SortedQueueEntryList.java:350) > at > org.apache.qpid.server.queue.AbstractQueue.getOldestMessageArrivalTime(AbstractQueue.java:1518) > at > org.apache.qpid.server.queue.AbstractQueue.getOldestMessageAge(AbstractQueue.java:1546) > at jdk.internal.reflect.GeneratedMethodAccessor167.invoke(Unknown > Source:-1) at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:566) at > org.apache.qpid.server.model.ConfiguredObjectMethodAttributeOrStatistic.getValue(ConfiguredObjectMethodAttributeOrStatistic.java:68) > at > org.apache.qpid.server.model.ConfiguredObjectMethodStatistic.getValue(ConfiguredObjectMethodStatistic.java:26) > at > org.apache.qpid.server.model.AbstractConfiguredObject.getStatistics(AbstractConfiguredObject.java:3181) > at > org.apache.qpid.server.queue.SortedQueueImplWithAccessChecking.getStatistics(SortedQueueImplWithAccessChecking.java:42) > at > org.apache.qpid.server.model.AbstractConfiguredObject.getStatistics(AbstractConfiguredObject.java:3168) > at > org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.getValue(ConfiguredObjectExpressionFactory.java:312) > at > org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.evaluate(ConfiguredObjectExpressionFactory.java:285) > at > org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectExpressionFactory$ConfiguredObjectPropertyExpression.evaluate(ConfiguredObjectExpressionFactory.java:272) > at > org.apache.qpid.server.filter.ComparisonExpression.evaluate(ComparisonExpression.java:388) > at > org.apache.qpid.server.filter.ComparisonExpression.matches(ComparisonExpression.java:580) > at > org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectQuery.filterObjects(ConfiguredObjectQuery.java:210) > at > org.apache.qpid.server.management.plugin.servlet.query.ConfiguredObjectQuery.<init>(ConfiguredObjectQuery.java:86) > at > org.apache.qpid.server.management.plugin.servlet.rest.QueryServlet.performQuery(QueryServlet.java:93) > at > org.apache.qpid.server.management.plugin.servlet.rest.QueryServlet.doGet(QueryServlet.java:56) > at > org.apache.qpid.server.management.plugin.servlet.rest.AbstractServlet.doGet(AbstractServlet.java:128) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) {code} > * Java Flight Recorder (JFR) data > !image-2025-01-27-22-41-16-668.png! > *Steps to Reproduce:* > # Configure a Qpid Broker with Sorted Queues. > # Enable REST APIs for queue statistics retrieval. > # Simulate high load by re-enqueuing ~7,000 messages per second to the > Sorted Queues. > # Monitor broker performance and REST API response times. > *Expected Behavior:* The broker should handle high re-enqueue rates without > contention issues, and REST API calls for queue statistics should not time > out. > *Actual Behavior:* Under high load conditions, the broker experiences > contention on the Sorted Queue data structure, leading to REST API timeouts. > *Proposed Solution:* > To address this issue, we propose optimizing the property fetching mechanism > in the Qpid Broker in ConfiguredObjectExpressionFactory.java: > # Modify the broker code (ConfiguredObjectExpressionFactory) to retrieve > only the specifically requested properties rather than all 26 properties. > !image-2025-01-27-22-42-13-739.png! > # Avoid querying the oldestMessageAge property on the application side when > dealing with delay queues, as it is not required for processing. > # These optimizations will reduce the load on the Sorted Queue's locking > mechanism and prevent redundant data fetches, thereby improving the broker's > responsiveness under high load conditions. > We have tested the proposed changes in our environment and observed > significant performance improvements, including reduced lock contention and > faster REST API responses under high load. > > _*(Originally opened by Ram Mantripragada, but Reporter updated to Sudheer > Sana, following discussion below)*_ -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org