[ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63515#action_63515 ]
Arthur Naseef commented on AMQ-3028:
------------------------------------

I will test with the update and post the results when complete. With any luck, it'll be done today.

> ActiveMQ broker processing slows with consumption from large store
> ------------------------------------------------------------------
>
>                 Key: AMQ-3028
>                 URL: https://issues.apache.org/activemq/browse/AMQ-3028
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.4.1
>         Environment: CentOS 5.5, Sun JDK 1.6.0_21-b06 64 bit, ActiveMQ 5.4.1,
>                      AMD Athlon(tm) II X2 B22, local disk
>            Reporter: Arthur Naseef
>            Assignee: Dejan Bosanac
>            Priority: Critical
>             Fix For: 5.5.0
>
>         Attachments: LRUCache.patch
>
>
> This problem occurred during scalability tests. I have tested a workaround
> that appears to function. A fix will gladly be submitted - would like some
> guidance, though, on the most appropriate solution.
> Here's the summary. Many more details are available upon request.
> Root cause:
> - Believed to be simultaneous access to LRUCache objects, which are not
>   thread-safe (PageFile's pageCache)
> Workaround:
> - Synchronize the LRUCache on all access methods (get, put, remove)
> The symptoms are as follows:
> 1. Message rates run fairly constant until a point in time when they
>    degrade rather quickly
> 2. After a while (about 15 minutes), the message rates drop to the floor,
>    with many seconds in which 0 records pass
> 3. Using VisualVM or JConsole, note that memory use grows continuously
> 4. When message rates drop to the floor, the VM is spending the vast
>    majority of its time performing garbage collection
> 5. Heap dumps show that LRUCache objects (the pageCache members of
>    PageFile instances) are far exceeding their configured limits.
>    The default limit was used, 10000. A size of over 170,000 entries was
>    reached.
> 6. No producer flow control occurred (did not see the flow control log
>    message)
> Test scenario used to reproduce:
> - Fast producers (limited to <= 1000 msgs/sec)
> -- using transactions
> -- 10 msg per transaction
> -- message content size 177 bytes
> - Slow consumers (limited to <= 10 msg/sec)
> -- auto-acknowledge mode; not transacted
> - 10 Queues
> -- 1 producer per queue
> -- 1 consumer per queue
> - Producers, consumers, and broker running on different systems in some
>   test runs and all on the same system in others.
> Note that disk space was not an issue - there was always plenty of disk space
> available.
> One other interesting note - once a large database of records was stored in
> KahaDB, this problem still occurred when running only consumers.
> This issue sounds like it may be related to 1764 and 2721. The root cause
> sounds the same as 2290 - unsynchronized access to LRUCache.
> The most straightforward solution is to modify all LRUCache objects
> (org.apache.kahadb.util.LRUCache, org.apache.activemq.util.LRUCache, ...) to
> be concurrent. Another is to create concurrent versions (perhaps
> ConcurrentLRUCache) and make use of those at least in PageFile.pageCache.
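
For readers following along, the workaround described in the report (synchronizing get, put, and remove) can be sketched roughly as below. This is only an illustration, not the attached LRUCache.patch and not the ActiveMQ or KahaDB class: the name SynchronizedLruCache is hypothetical, and the sketch assumes a cache backed by java.util.LinkedHashMap in access order. In that mode even get() reorders entries, so unsynchronized concurrent reads can corrupt the map, which would be consistent with a cache growing past its limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of the workaround: every access method
 * takes the same lock. Not the ActiveMQ/KahaDB LRUCache itself.
 */
final class SynchronizedLruCache<K, V> {
    private final int maxSize;
    private final LinkedHashMap<K, V> map;

    SynchronizedLruCache(final int maxSize) {
        this.maxSize = maxSize;
        // accessOrder = true: get() moves the entry to the end of the
        // internal list, so reads are structural modifications too.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once over the limit.
                return size() > SynchronizedLruCache.this.maxSize;
            }
        };
    }

    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized V put(K key, V value) {
        return map.put(key, value);
    }

    synchronized V remove(K key) {
        return map.remove(key);
    }

    synchronized int size() {
        return map.size();
    }
}
```

A single lock is the simplest option and matches the workaround as stated; whether it is fast enough for PageFile.pageCache under load, or whether a dedicated concurrent variant is the better fix, is the open question raised in the report.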