[ https://issues.apache.org/jira/browse/AMQ-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914756#comment-13914756 ]
Richard Wagg commented on AMQ-5077:
-----------------------------------

Hi,

Attached are some more test configurations I've tried, along with the results. Nothing seems to affect the throughput much for better or worse, and in none of the tests have the consumers had any message backlog.

To try to sum up the problem:
- Under normal operation we want our producers to remain unblocked for as long as possible. Flow control is fine, but to me that means the producer works uninterrupted up until a memory/disk limit is reached and PFC kicks in, rather than degrading gradually.
- Clients all run in CLIENT_ACK mode - messages are ACKed one by one.
- Both the diskstore and the network are relatively quick - tests running against topics show a roundtrip time of ~0.740ms (producer -> broker -> consumer -> broker -> producer reply).
- The producers' ability to send messages is currently limited by something at the TCP level, due to the amount of work the broker is doing on its receive threads.
- The observed behaviour in producer code is that whether sending sync or async, the call to producer.send() just blocks - so even async sends are limited by broker throughput, in a manner which isn't flow control.
- In none of our tests have the consumers ever had a large pending message count - the blocking factor is not consumer speed or memory/queue limits.
- Taking thread dumps throughout these tests, we can see contention around the synchronised access points in MessageDatabase - the process() methods taking KahaAddMessageCommand and KahaRemoveMessageCommand both lock around the page index.
- Any option to batch consumer ACKs might reduce the number of individual message removals, but would also delay the ACKs to the point where more gets written to the store.
- Some broker options (concurrentStoreAndDispatch) are supposed to allow optimisations in this scenario, but appear to have limited or no effect.

I think there are two candidates for the root problem:
1. Disk writes are too quick (or too many disk writes are allowed to be in progress) - by the time the roundtrip from broker -> consumer and back completes, the KahaDB write is already done or in progress.
2. Thread contention stops the ACKs arriving in time to prevent the diskstore writes from happening, negating the performance benefit of concurrentStoreAndDispatch (clients might receive messages quicker, but the broker still has to add and remove each message serially from the diskstore).

I think the second is most likely - we're effectively doing a lot of disk-based work that we don't need to do, just because the consumer ACKs aren't coming back quickly enough, or can't be received before the disk write is in progress. That causes a double hit: once on thread synchronisation, and again at the disk level, adding and removing the same message inside a small timeframe.

Ideas welcome for configurations to try or areas to look at. Stuff I'm going to try:
- Add some debug logging to see the queue length of the asyncQueueJobQueue in KahaDBStore.
- Change the client ack mode - optimizeAcknowledge, DUPS_OK_ACKNOWLEDGE etc. (see the sketch below). I don't think it'll have much effect, but it's something else to rule out.

I would also be interested in trying any code which would allow the disk writes to go onto a delay queue - something along the lines of: dispatch straight away, and only write to KahaDB if the ACK hasn't arrived after a configurable interval (a rough sketch of this idea is also included below).
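To make the ack-mode experiment concrete, here is a minimal sketch of the client-side changes I have in mind (optimizeAcknowledge on the connection factory, DUPS_OK on the session). The broker URL and queue name are placeholders rather than our actual test configuration:

    import javax.jms.Connection;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class AckModeExperiment {
        public static void main(String[] args) throws Exception {
            // Placeholder broker URL.
            ActiveMQConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://localhost:61616");
            // Batch consumer ACKs instead of acknowledging message by message.
            factory.setOptimizeAcknowledge(true);

            Connection connection = factory.createConnection();
            connection.start();

            // DUPS_OK lets the provider acknowledge lazily, trading a risk of
            // duplicate delivery for fewer ACK round trips to the broker.
            Session session = connection.createSession(false, Session.DUPS_OK_ACKNOWLEDGE);
            Queue queue = session.createQueue("TopicQueueRouted.1"); // placeholder queue
            MessageConsumer consumer = session.createConsumer(queue);

            Message message = consumer.receive(1000);
            // Process the message; no explicit acknowledge() call is needed in DUPS_OK mode.
            connection.close();
        }
    }

The idea being that batched ACKs should cut the number of individual remove calls hitting the index, at the cost of the removals arriving later.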
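And a rough standalone sketch of the delay-queue idea. None of this is existing ActiveMQ code - the class and names (DelayedStoreWriter, writeDelayMs, onDispatch, onAck) are invented purely to illustrate the shape of it:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    public class DelayedStoreWriter {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private final Map<String, ScheduledFuture<?>> pendingWrites =
                new ConcurrentHashMap<String, ScheduledFuture<?>>();
        private final long writeDelayMs;

        public DelayedStoreWriter(long writeDelayMs) {
            this.writeDelayMs = writeDelayMs;
        }

        // Called when a message is dispatched: schedule the KahaDB add instead of
        // performing it immediately on the transport thread.
        public void onDispatch(final String messageId, final Runnable storeWrite) {
            ScheduledFuture<?> future = scheduler.schedule(new Runnable() {
                public void run() {
                    pendingWrites.remove(messageId);
                    storeWrite.run(); // ACK never arrived in time, so persist the message
                }
            }, writeDelayMs, TimeUnit.MILLISECONDS);
            pendingWrites.put(messageId, future);
        }

        // Called when the consumer ACK arrives: if the write hasn't started yet,
        // cancel it so neither the add nor the remove ever touches the index.
        public void onAck(String messageId) {
            ScheduledFuture<?> future = pendingWrites.remove(messageId);
            if (future != null) {
                future.cancel(false);
            }
        }
    }

Obviously this glosses over recovery (a message dispatched but never persisted would be lost if the broker crashed within the interval) and the race where the ACK arrives while the write is already in flight, but it would show whether skipping the add/remove pair removes the index contention.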
I'm still not sure that so much of this work should be done on the NIO send threads - even if the contention on the KahaDB store has to happen, I would expect requests to be allowed to queue up in memory until a PFC limit kicks in. Until that point I wouldn't expect producer send performance to be affected.

Diskstore performance for comparison:

[activemq@londngnfrjms01 lib]$ /opt/java/x64/jdk1.7.0_51/bin/java -cp activemq-kahadb-store-5.9.0.redhat-610350.jar org.apache.activemq.store.kahadb.disk.util.DiskBenchmark /jms_vsp/activemq-rh590/data/test.dat
Benchmarking: /jms_vsp/activemq-rh590/data/test.dat
Writes:      1023993 writes of size 4096 written in 10.69 seconds. 95789.805 writes/second. 374.17892 megs/second.
Sync Writes: 49746 writes of size 4096 written in 10.001 seconds. 4974.1025 writes/second. 19.430088 megs/second.
Reads:       5468429 reads of size 4096 read in 10.001 seconds. 546788.25 writes/second. 2135.8916 megs/second.

> Improve performance of ConcurrentStoreAndDispatch
> -------------------------------------------------
>
>                 Key: AMQ-5077
>                 URL: https://issues.apache.org/jira/browse/AMQ-5077
>             Project: ActiveMQ
>          Issue Type: Wish
>          Components: Message Store
>    Affects Versions: 5.9.0
>         Environment: 5.9.0.redhat-610343
>            Reporter: Jason Shepherd
>            Priority: Minor
>         Attachments: compDesPerf.tar.gz
>
>
> We have publishers publishing to a topic which has 5 topic -> queue routings,
> and the maximum attainable message rate is ~833 messages/sec, with each
> message around 5k in size.
> To test this I set up a JMS config with topic queues:
> Topic
> TopicRouted.1
> ...
> TopicRouted.11
> Each topic has an increasing number of routings to queues, and a client is
> set up to subscribe to all the queues.
> Rough message rates:
> routings   messages/sec
> 0          2500
> 1          1428
> 2          2000
> 3          1428
> 4          1111
> 5          833
> This occurs whether the broker config has producerFlowControl set to true or
> false, and with KahaDB disk syncing turned off. We also tried experimenting
> with concurrentStoreAndDispatch, but that didn't seem to help. LevelDB didn't
> give any notable performance improvement either.
> We also have asyncSend enabled on the producer, and have a requirement to use
> persistent messages. We have also experimented with sending messages in a
> transaction, but that hasn't really helped.
> It seems like the producer throughput rate across all queue destinations, all
> connections and all publisher machines is limited by something on the broker,
> through a mechanism which is not producer flow control. I think the prime
> suspect is still contention on the index.
> We did some tests with the YourKit profiler. The profiler was attached to the
> broker at startup, allowed to run, and then a topic publisher was started,
> routing to 5 queues. Profiler statistics were reset, the publisher was allowed
> to run for 60 seconds, and then a profiling snapshot was taken. During that
> time, ~9600 messages were logged as being sent, for a rate of ~160/sec.
> This ties in roughly with the invocation counts recorded in the snapshot (I
> think) - ~43k calls.
> From what I can work out, in the snapshot (filtering everything but
> org.apache.activemq.store.kahadb), for the 60 second sample period:
> 24.8 seconds elapsed in
> org.apache.activemq.store.kahadb.KahaDBTransactionStore$1.removeAsyncMessage(ConnectionContext, MessageAck)
> 18.3 seconds elapsed in
> org.apache.activemq.store.kahadb.KahaDBTransactionStore$1.asyncAddQueueMessage(ConnectionContext, Message, boolean)
> From these, a further large portion of the time is spent inside MessageDatabase:
> org.apache.activemq.store.kahadb.MessageDatabase.process(KahaRemoveMessageCommand, Location) - 10 secs elapsed
> org.apache.activemq.store.kahadb.MessageDatabase.process(KahaAddMessageCommand, Location) - 8.5 secs elapsed
> As both of these lock on indexLock.writeLock(), and both take place on the NIO
> transport threads, I think this accounts for at least some of the message
> throughput limits. As messages are added to and removed from the index one by
> one, regardless of sync type settings, this adds a fair amount of overhead.
> While we're not synchronising on writes to disk, we are performing work on the
> NIO worker thread which can block on locks, and that could account for the
> behaviour we've seen client side.
> To Reproduce:
> 1. Install a broker and use the attached configuration.
> 2. Use the 5.8.0 example ant script to consume from the queues
> TopicQueueRouted.1 - 5, e.g.:
> ant consumer -Durl=tcp://localhost:61616 -Dsubject=TopicQueueRouted.1 -Duser=admin -Dpassword=admin -Dmax=-1
> 3. Use the modified version of the 5.8.0 example ant script (attached) to send
> messages to the topics TopicRouted.1 - 5, e.g.:
> ant producer -Durl='tcp://localhost:61616?jms.useAsyncSend=true&wireFormat.tightEncodingEnabled=false&keepAlive=true&wireFormat.maxInactivityDuration=60000&socketBufferSize=32768' -Dsubject=TopicRouted.1 -Duser=admin -Dpassword=admin -Dmax=1 -Dtopic=true -DsleepTime=0 -Dmax=10000 -DmessageSize=5000
> This modified version of the script calculates the number of messages per
> second and prints it to the console.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)