ALAN DIEGO DANTAS PROTASIO created AMQ-7028:
-----------------------------------------------

             Summary: Poor performance when concurrentStoreAndDispatchQueues + slow FS + Slow Consumers
                 Key: AMQ-7028
                 URL: https://issues.apache.org/jira/browse/AMQ-7028
             Project: ActiveMQ
          Issue Type: Improvement
          Components: KahaDB
    Affects Versions: 5.15.4
            Reporter: ALAN DIEGO DANTAS PROTASIO


Using a high-latency file system (such as NFS) to store KahaDB files with
concurrentStoreAndDispatchQueues=true can cause poor performance when consumers
are slow. With this option enabled, ActiveMQ writes produced messages to the
underlying file system one by one (this is implemented with a single-threaded
ExecutorService).
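A minimal sketch of why a single-threaded ExecutorService serializes the
stores, however many producers are sending. This is not the ActiveMQ code:
`SerialWriteDemo` and `simulatedWrite` are hypothetical, with `simulatedWrite`
standing in for one journal write on a slow file system (a `Thread.sleep`).
The demo records the highest number of writes that ever ran at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SerialWriteDemo {
    // How many simulated writes are executing right now, and the peak observed.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for one write to a high-latency file system.
    static void simulatedWrite(long latencyMs) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(latencyMs);
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Many producer threads all hand their writes to one single-threaded executor.
    public static int runDemo(int producers, int messagesPerProducer, long latencyMs) throws Exception {
        inFlight.set(0);
        maxInFlight.set(0);
        ExecutorService storeExecutor = Executors.newSingleThreadExecutor();
        ExecutorService producerPool = Executors.newFixedThreadPool(producers);
        List<Future<?>> sends = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            sends.add(producerPool.submit(() -> {
                for (int m = 0; m < messagesPerProducer; m++) {
                    // Each send waits for its own write, which queues behind everyone else's.
                    storeExecutor.submit(() -> {
                        simulatedWrite(latencyMs);
                        return null;
                    }).get();
                }
                return null;
            }));
        }
        for (Future<?> f : sends) {
            f.get();
        }
        producerPool.shutdown();
        storeExecutor.shutdown();
        storeExecutor.awaitTermination(10, TimeUnit.SECONDS);
        return maxInFlight.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("max concurrent writes = " + runDemo(8, 5, 2));
    }
}
```

Even with 8 producers, the peak is a single in-flight write, so total store
time grows linearly with the number of messages.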

Suppose each write to the file system takes 10 ms and the queue has slow
consumers. In that case, no matter how many messages producers send
concurrently, the maximum throughput we can achieve is 100 TPS (1000 ms / 10 ms
per write). Turning this flag off gives much better performance when messages
are sent in parallel, because those messages can be batched into a single write
to the file system (throughput increases with the number of messages being sent
concurrently).
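The arithmetic above can be written as an idealized throughput model. The
batched figure is an assumption for illustration: it supposes every concurrently
sent message joins the same write and ignores batch-size limits and other
overheads, so treat it as an upper bound rather than a prediction.
`ThroughputModel` and its methods are hypothetical names.

```java
public class ThroughputModel {
    // Serialized path: one write per message, so throughput is capped at 1000 / latency.
    static double serializedMaxTps(double writeLatencyMs) {
        return 1000.0 / writeLatencyMs;
    }

    // Idealized batched path (assumption): all concurrent senders share one write.
    static double batchedMaxTps(double writeLatencyMs, int concurrentSenders) {
        return concurrentSenders * 1000.0 / writeLatencyMs;
    }

    public static void main(String[] args) {
        double latencyMs = 10.0;
        System.out.println("serialized: " + serializedMaxTps(latencyMs) + " TPS");
        for (int senders : new int[] {1, 10, 100}) {
            System.out.println(senders + " senders, batched: "
                    + batchedMaxTps(latencyMs, senders) + " TPS");
        }
    }
}
```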

Looking at the ActiveMQ code, we found that LevelDB uses a flag to detect
whether the queue has fast or slow consumers and to decide whether or not to use
concurrentStoreAndDispatch:

https://issues.apache.org/jira/browse/AMQ-3750

However, this flag is not used by the KahaDB implementation.

We changed KahaDbStore to receive this flag and use it to decide whether a
message is stored asynchronously.
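A sketch of the decision the change introduces, under stated assumptions: it
is not the actual patch or KahaDB's API. `StorePathChooser`, `choosePath`,
`addMessage` and the `fastConsumers` parameter are hypothetical, with
`fastConsumers` standing in for the fast/slow-consumer flag LevelDB already
honours. Only when concurrent store and dispatch is enabled and consumers are
fast does the write go to the single-threaded async executor; otherwise it runs
on the caller's thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StorePathChooser {
    enum Path { ASYNC_CONCURRENT_STORE, SYNC_STORE }

    // Hypothetical: fastConsumers mirrors the destination's fast/slow-consumer flag.
    static Path choosePath(boolean concurrentStoreAndDispatch, boolean fastConsumers) {
        if (concurrentStoreAndDispatch && fastConsumers) {
            return Path.ASYNC_CONCURRENT_STORE;
        }
        return Path.SYNC_STORE;
    }

    private final ExecutorService asyncStoreExecutor = Executors.newSingleThreadExecutor();

    // Hypothetical store entry point; 'write' stands in for the journal append.
    Future<Void> addMessage(Runnable write, boolean concurrentStoreAndDispatch, boolean fastConsumers) {
        if (choosePath(concurrentStoreAndDispatch, fastConsumers) == Path.ASYNC_CONCURRENT_STORE) {
            // Fast consumers: queue the write so dispatch can proceed without waiting for it.
            return CompletableFuture.runAsync(write, asyncStoreExecutor);
        }
        // Slow consumers: write on the caller's thread. In a real store this is the path
        // where writes from concurrent producers could be batched (not modelled here).
        write.run();
        return CompletableFuture.completedFuture(null);
    }

    void shutdown() {
        asyncStoreExecutor.shutdown();
    }
}
```

Keeping the decision in one small function makes it easy to see that the fast
consumer path is unchanged, which matches the claim that fast consumers are not
affected.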

We see no reason to attempt concurrent store and dispatch when the destination
has slow consumers. It only adds overhead and, with a high-latency file system,
causes very poor performance.

For fast consumers, this change has no effect, so each case gets the better of
the two options.

Some Results:

Original Version:

Fast Consumers:

Producer
 mean rate = 8248.50 calls/second
 min = 0.42 milliseconds
 max = 756.61 milliseconds
 mean = 11.30 milliseconds
 stddev = 44.05 milliseconds
 median = 6.02 milliseconds
 75% <= 9.79 milliseconds
 95% <= 18.15 milliseconds
 98% <= 27.71 milliseconds
 99% <= 123.51 milliseconds
 99.9% <= 756.61 milliseconds

Slow consumers:

Producer
 mean rate = 84.29 calls/second
 min = 86.27 milliseconds
 max = 1467.53 milliseconds
 mean = 1082.55 milliseconds
 stddev = 154.04 milliseconds
 median = 1075.94 milliseconds
 75% <= 1169.10 milliseconds
 95% <= 1308.90 milliseconds
 98% <= 1350.85 milliseconds
 99% <= 1363.61 milliseconds
 99.9% <= 1466.67 milliseconds

Patched Version:

Fast Consumers:

Producer
 count = 890783
 mean rate = 8099.33 calls/second
 min = 0.47 milliseconds
 max = 2259.10 milliseconds
 mean = 13.90 milliseconds
 stddev = 84.84 milliseconds
 median = 5.00 milliseconds
 75% <= 9.08 milliseconds
 95% <= 15.66 milliseconds
 98% <= 32.94 milliseconds
 99% <= 355.52 milliseconds
 99.9% <= 731.69 milliseconds

Slow consumers:

Producer
 mean rate = 1732.25 calls/second
 1-minute rate = 1811.80 calls/second
 min = 17.52 milliseconds
 max = 1249.54 milliseconds
 mean = 50.95 milliseconds
 stddev = 130.68 milliseconds
 median = 28.73 milliseconds
 75% <= 32.51 milliseconds
 95% <= 57.04 milliseconds
 98% <= 461.46 milliseconds
 99% <= 937.87 milliseconds
 99.9% <= 1249.48 milliseconds
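The headline comparison can be computed directly from the producer figures
above. The numbers are copied from the results; `ResultsComparison` is a
hypothetical helper. With slow consumers, producer throughput rises by roughly
20x and mean latency falls to under 5% of the original. With fast consumers,
throughput is about 2% lower.

```java
public class ResultsComparison {
    // Ratio of the patched figure to the original figure.
    static double ratio(double patched, double original) {
        return patched / original;
    }

    public static void main(String[] args) {
        // Producer mean rates (calls/second) from the results above.
        double slowOriginalRate = 84.29, slowPatchedRate = 1732.25;
        double fastOriginalRate = 8248.50, fastPatchedRate = 8099.33;
        // Producer mean latency (milliseconds), slow-consumer runs.
        double slowOriginalMean = 1082.55, slowPatchedMean = 50.95;

        System.out.printf("slow consumers, throughput ratio: %.2f%n",
                ratio(slowPatchedRate, slowOriginalRate));
        System.out.printf("slow consumers, mean latency ratio: %.3f%n",
                ratio(slowPatchedMean, slowOriginalMean));
        System.out.printf("fast consumers, throughput ratio: %.3f%n",
                ratio(fastPatchedRate, fastOriginalRate));
    }
}
```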



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
