[
https://issues.apache.org/jira/browse/AMQ-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096370#comment-15096370
]
Klaus Pittig commented on AMQ-6115:
-----------------------------------
Just to complete the record, here is the relevant information from the
corresponding mailing list thread:
http://activemq.2283324.n4.nabble.com/How-to-avoid-blocking-of-queue-browsing-after-ActiveMQ-checkpoint-call-td4705696.html
Tim Bain:
{quote}
I believe you are correct: browsing a persistent queue uses bytes from the
memory store, because those bytes must be read from the persistence store
into the memory store before they can be handed off to browsers or
consumers. If all available bytes in the memory store are already in use,
the messages can't be paged into the memory store, and so the operation
that required them to be paged in will hang/fail.
You can work around the problem by increasing your memory store size via
trial-and-error until the problem goes away. Note that the broker itself
needs some amount of memory, so you can't give the whole heap over to the
memory store or you'll risk getting OOMs, which means you may need to
increase the heap size as well. You can estimate how much memory the
broker needs aside from the memory store by subtracting the bytes used for
the memory store (539 MB) from the total heap bytes used as measured via
JConsole or similar tools. I'd double (or more) that number to be safe, if
it was me; the last thing I want to deal with in a production application
(ActiveMQ or anything else) is running out of memory because I tried to cut
the memory limits too close just to save a little RAM.
All of that is how to work around the fact that before you try to browse
your queue, something else has already consumed all available bytes in the
memory store. If you want to dig into why that's happening, we'd need to
try to figure out what those bytes are being used for and whether it's
possible to change configuration values to reduce the usage so it fits into
your current limit. There will definitely be more effort required than
simply increasing the memory limit (and max heap size), but we can try if
you're not able to increase the limits enough to fix the problem.
If you want to go down that path, one thread to pull on is your observation
that you "can browse/consume some Queues _until_ the #checkpoint call
after 30 seconds." I assume from your reference to checkpointing that
you're using KahaDB as your persistence store. Can you post the KahaDB
portion of your config?
Your statements here and in your StackOverflow post (
http://stackoverflow.com/questions/34679854/how-to-avoid-blocking-of-queue-browsing-after-activemq-checkpoint-call)
indicate that you think that the problem is that memory isn't getting
garbage collected after the operation that needed it (i.e. the checkpoint)
completes, but it's also possible that the checkpoint operation isn't
completing because it can't get enough messages read into the memory
store. Have you confirmed via the thread dump that there is not a
checkpoint operation still in progress? Also, how large are your journal
files that are getting checkpointed? If they're large enough that all
messages for one file won't fit into the memory store, you might be able to
prevent the problem by using smaller files.
{quote}
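For reference: the "memory store" limit Tim mentions is the broker's
systemUsage memoryUsage setting. A minimal sketch of the workaround he
describes (the 1 gb value is purely illustrative, not a recommendation):
{code:xml}
<systemUsage>
  <systemUsage>
    <memoryUsage>
      <!-- illustrative value only: tune by trial-and-error as described
           above, and raise the JVM max heap (-Xmx) accordingly -->
      <memoryUsage limit="1 gb"/>
    </memoryUsage>
  </systemUsage>
</systemUsage>
{code}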
a.) Regarding your last answer (thanks for your effort by the way):
I'm aware of the relationship between the heap and the systemUsage memoryLimit,
and we make sure there are no inconsistent settings.
The primary requirement is a stable system that runs 'forever' without any
memory issues at any time, independent of the load/throughput.
No one really wants to deal with memory settings that sit right at the edge of
their limits.
You're right: the memory is completely consumed. And I can't guarantee that the
checkpoint/cleanup ever finishes completely, so the system can stall without
giving the GC a chance to release any memory.
It's the expiry check that causes this. The persistent stores themselves seem
to be managed as expected (no issues, no inconsistency, no loss);
our situation is independent of the storage (reproducible for both LevelDB and
KahaDB). For KahaDB we have used 16 MB journal files for years (this saves a
huge amount of the space required for pending messages that are not consumed
for some days due to offline situations on the client side).
Anyway, here is the current configuration you requested:
{code:xml}
<persistenceAdapter>
  <kahaDB directory="${activemq.base}/data/kahadb"
          enableIndexWriteAsync="true"
          journalMaxFileLength="16mb"
          indexWriteBatchSize="10000"
          indexCacheSize="10000" />
  <!--
  <levelDB directory="${activemq.base}/data/leveldb" logSize="33554432" />
  -->
</persistenceAdapter>
{code}
b.) A proposal concerning AMQ-6115:
From my point of view, it's worth discussing the single memoryLimit parameter
that is used for both the regular browse/consume threads and the
checkpoint/cleanup threads.
There should always be enough space to browse/consume any queue, at least with
prefetch 1, i.e. for one of the next pending messages.
Maybe, in this case, two well-balanced memoryLimit parameters that prioritize
consumption over checkpoint/cleanup would allow better regulation (a purely
hypothetical sketch follows below). Or something along those lines.
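To illustrate the proposal only: memoryLimit exists today, but
consumeMemoryLimit and checkpointMemoryLimit are invented names that do not
exist in ActiveMQ:
{code:xml}
<!-- hypothetical: split the single limit into one budget reserved for
     browse/consume and one cap for checkpoint/cleanup paging;
     neither attribute below exists in ActiveMQ -->
<policyEntry queue=">" memoryLimit="128mb"
             consumeMemoryLimit="96mb"
             checkpointMemoryLimit="32mb"/>
{code}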
c.) Our results and an acceptable solution so far:
After a thorough investigation (without changing the ActiveMQ source code), the
conclusion for now is that we have to accept the limitations imposed by the
single memoryLimit parameter used for both the #checkpoint/cleanup process and
for browsing/consuming queues.
**1.) Memory**
There is no problem if we use a much higher memoryLimit (together with a
higher max heap) to support both the per-destination message caching during
the #checkpoint/cleanup workflow and our requirement to browse/consume
messages.
But more memory is not an option in our scenario; we have to work with a 1024m
max heap and a 500m memoryLimit.
Besides this, constantly raising the memoryLimit just because there are more
persistent queues holding hundreds or thousands of pending messages (combined
with certain offline/inactive consumer scenarios) deserves a detailed
discussion of its own, IMHO.
**2.) Persistent Adapters**
We ruled out the persistence adapters as the cause, because the behaviour does
not change when we switch between different types of persistent stores
(KahaDB, LevelDB, JDBC/PostgreSQL).
During the debugging sessions with KahaDB we also see regular checkpoint
handling; the storage is managed as expected.
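For completeness, a sketch of the JDBC variant we tested; the postgres-ds bean
id and the connection values are placeholders, not our actual settings:
{code:xml}
<persistenceAdapter>
  <jdbcPersistenceAdapter dataSource="#postgres-ds"/>
</persistenceAdapter>

<!-- placeholder datasource for the sketch above -->
<bean id="postgres-ds" class="org.postgresql.ds.PGPoolingDataSource">
  <property name="url" value="jdbc:postgresql://localhost:5432/activemq"/>
  <property name="user" value="activemq"/>
  <property name="password" value="activemq"/>
</bean>
{code}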
**3.) Destination Policy / Expiration Check**
Our problem disappears completely if we disable caching and the expiration
check; the expiration check is the actual cause of the problem.
The corresponding properties are documented, and there is a nice blog article
about message priorities whose description fits our scenario quite well:
- http://activemq.apache.org/how-can-i-support-priority-queues.html
- http://blog.christianposta.com/activemq/activemq-message-priorities-how-it-works/
We simply added useCache="false" and expireMessagesPeriod="0" to the
policyEntry:
{code:xml}
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue=">" producerFlowControl="false"
                   optimizedDispatch="true" memoryLimit="128mb"
                   timeBeforeDispatchStarts="1000"
                   useCache="false" expireMessagesPeriod="0">
        <dispatchPolicy>
          <strictOrderDispatchPolicy />
        </dispatchPolicy>
        <pendingQueuePolicy>
          <storeCursor />
        </pendingQueuePolicy>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
{code}
The consequences of no longer using in-memory caching and of never checking
for message expiration are clear.
Since we use neither message expiration nor message priorities, and the
current message dispatching is fast enough for us, this trade-off is
acceptable given the system limitations.
One should also think about well-defined prefetch limits to control memory
consumption during specific workflows. Message sizes in our scenario range
from 2 bytes up to approx. 100 KB, so more individual policyEntries and client
consumer configurations could help optimize system behaviour with respect to
performance and memory usage (see
http://activemq.apache.org/per-destination-policies.html and the sketch below).
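For illustration, a broker-side prefetch override is a one-line policy entry;
the value 1 is just an example (clients can also set it per destination, e.g.
by appending ?consumer.prefetchSize=1 to the destination name):
{code:xml}
<!-- example only: force a prefetch of 1 for all queues on the broker side -->
<policyEntry queue=">" queuePrefetch="1"/>
{code}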
> No more browse/consume possible after #checkpoint run
> -----------------------------------------------------
>
> Key: AMQ-6115
> URL: https://issues.apache.org/jira/browse/AMQ-6115
> Project: ActiveMQ
> Issue Type: Bug
> Components: activemq-leveldb-store, Broker, KahaDB
> Affects Versions: 5.5.1, 5.11.2, 5.13.0
> Environment: OS=Linux,MacOS,Windows, Java=1.7,1.8, Xmx=1024m,
> SystemUsage Memory Limit 500 MB, Temp Limit 1 GB, Storage 80 GB
> Reporter: Klaus Pittig
> Attachments: Bildschirmfoto 2016-01-08 um 12.09.34.png,
> Bildschirmfoto 2016-01-08 um 13.29.08.png
>
>
> We are currently facing a problem when using ActiveMQ with a large number of
> persistent queues (250), each holding 1000 persistent TextMessages of 10 KB.
> Our scenario requires these messages to remain in storage for a long time
> (days), until they are consumed (large amounts of data are staged for
> distribution to many consumers, which may be offline for some days).
> This issue is independent of the JVM, OS and persistence adapter (KahaDB,
> LevelDB), given enough free space and memory.
> We tested this behaviour with ActiveMQ 5.11.2, 5.13.0 and 5.5.1.
> After the persistence store is filled with these messages (we use a simple
> unit test that always produces the same message) and the broker is restarted,
> we can browse/consume some queues _until_ the #checkpoint call after 30
> seconds.
> This call causes the broker to use all available memory and never release it
> for other tasks such as queue browsing/consuming. Internally the MessageCursor
> seems to decide that there is not enough memory and stops delivering queue
> content to browsers/consumers.
> => Is there a way to avoid this behaviour or fix it?
> The expectation is that we can consume/browse any queue under all
> circumstances.
> Besides the above-mentioned settings, we use the following broker settings
> (btw: changing the memoryLimit to a lower value like 1mb does not change the
> situation):
> {code:xml}
> <destinationPolicy>
>   <policyMap>
>     <policyEntries>
>       <policyEntry queue=">" producerFlowControl="false"
>                    optimizedDispatch="true" memoryLimit="128mb">
>         <dispatchPolicy>
>           <strictOrderDispatchPolicy />
>         </dispatchPolicy>
>         <pendingQueuePolicy>
>           <storeCursor/>
>         </pendingQueuePolicy>
>       </policyEntry>
>     </policyEntries>
>   </policyMap>
> </destinationPolicy>
> <systemUsage>
>   <systemUsage sendFailIfNoSpace="true">
>     <memoryUsage>
>       <memoryUsage limit="500 mb"/>
>     </memoryUsage>
>     <storeUsage>
>       <storeUsage limit="80000 mb"/>
>     </storeUsage>
>     <tempUsage>
>       <tempUsage limit="1000 mb"/>
>     </tempUsage>
>   </systemUsage>
> </systemUsage>
> {code}
> Setting the *cursorMemoryHighWaterMark* in the destinationPolicy to a higher
> value like *150* or *600* (depending on the difference between the
> memoryUsage limit and the available heap space) relieves the situation a bit
> as a workaround, but from my point of view this is not really an option for
> production systems.
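> For illustration, that workaround amounts to a policy entry like the
> following sketch (the value is an example only; the attribute is a
> percentage of the destination's memory limit):
> {code:xml}
> <!-- example only: let the cursor use up to 150% of the destination limit -->
> <policyEntry queue=">" memoryLimit="128mb" cursorMemoryHighWaterMark="150"/>
> {code}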
> Attached is some information from Oracle Mission Control and JProfiler
> showing the ActiveMQTextMessage instances that are never released from
> memory.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)