[ 
https://issues.apache.org/jira/browse/AMQ-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608875#comment-16608875
 ] 

Gary Tully commented on AMQ-7028:
---------------------------------

The first thing to figure out is whether the tests pass on their own. Some 
tests fail for unknown reasons when run as part of the full suite, and it can 
be tricky to get to the bottom of those failures. That is a problem, but it 
will take some work to make those tests reliable.

If a test fails reliably when run in isolation, that is a problem, because the 
test must have passed at some stage in the past. Either there is an issue with 
the test (its assertions or environment) or some code change has introduced a 
regression. These failures need to be treated as bugs until the root cause is 
found, and each one needs a jira to track it.

If the test passes when run in isolation, then a manual run will suffice. 
This is not ideal, but again, getting to the bottom of such a failure may be 
tricky and will take some time.

In addition, the test may be ok on master, pointing to a problem on the branch.

The general rule is that prior to a release, all the tests should run on the 
mainline, i.e. a full test cycle to completion. That *should* include the 
activemq.all profile.

 

To run in isolation:

{{> mvn clean install -Dtest=JdbcXARecoveryBrokerTest}}

 

 

> Poor performance when concurrentStoreAndDispatchQueues + slow FS + Slow 
> Consumers
> ---------------------------------------------------------------------------------
>
>                 Key: AMQ-7028
>                 URL: https://issues.apache.org/jira/browse/AMQ-7028
>             Project: ActiveMQ
>          Issue Type: Improvement
>          Components: KahaDB
>    Affects Versions: 5.15.4
>            Reporter: Alan Protasio
>            Priority: Major
>
> Using a high-latency FS (such as NFS) to store KahaDB files and setting 
> concurrentStoreAndDispatchQueues=true may cause poor performance for slow 
> consumers. This happens because the option makes ActiveMQ write the 
> produced messages one by one to the underlying file system (this is 
> implemented with a single-threaded ExecutorService).
> Let's say that each write to the FS takes 10ms and the queue has slow 
> consumers. In this case, no matter how many concurrent messages the 
> producers try to send to the queue, the maximum performance we can 
> achieve is 100 TPS. Turning this flag off, we see much better performance 
> when sending messages in parallel, as those messages can be batched to 
> the FS in a single write (the performance increases with the number of 
> messages being sent in parallel).
> Looking at the ActiveMQ code, we found that there is a flag used by LevelDB 
> to detect whether the queue has fast or slow consumers and decide whether 
> to use concurrentStoreAndDispatch.
> https://issues.apache.org/jira/browse/AMQ-3750
> but this flag is not used in the KahaDB implementation.
> We made a code change to receive the flag in the KahaDBStore and use it to 
> decide whether the message will be stored async or not.
> We think that there is no reason to try to "StoreAndDispatch" if the 
> destination has slow consumers. This only adds overhead and, with a 
> high-latency FS, gives really poor performance when the queue has slow 
> consumers.
> For fast consumers, this change will have no effect, giving the best of the 
> 2 options.
> Some Results:
> Original Version:
> Fast Consumers:
> Producer
>  mean rate = 8248.50 calls/second
>  min = 0.42 milliseconds
>  max = 756.61 milliseconds
>  mean = 11.30 milliseconds
>  stddev = 44.05 milliseconds
>  median = 6.02 milliseconds
>  75% <= 9.79 milliseconds
>  95% <= 18.15 milliseconds
>  98% <= 27.71 milliseconds
>  99% <= 123.51 milliseconds
>  99.9% <= 756.61 milliseconds
> Slow consumers:
> Producer
>  mean rate = 84.29 calls/second
>  min = 86.27 milliseconds
>  max = 1467.53 milliseconds
>  mean = 1082.55 milliseconds
>  stddev = 154.04 milliseconds
>  median = 1075.94 milliseconds
>  75% <= 1169.10 milliseconds
>  95% <= 1308.90 milliseconds
>  98% <= 1350.85 milliseconds
>  99% <= 1363.61 milliseconds
>  99.9% <= 1466.67 milliseconds
> Patched Version:
> Fast Consumers:
> Producer
>  count = 890783
>  mean rate = 8099.33 calls/second
>  min = 0.47 milliseconds
>  max = 2259.10 milliseconds
>  mean = 13.90 milliseconds
>  stddev = 84.84 milliseconds
>  median = 5.00 milliseconds
>  75% <= 9.08 milliseconds
>  95% <= 15.66 milliseconds
>  98% <= 32.94 milliseconds
>  99% <= 355.52 milliseconds
>  99.9% <= 731.69 milliseconds
> Slow consumers:
> Producer
>  mean rate = 1732.25 calls/second
>  1-minute rate = 1811.80 calls/second
>  min = 17.52 milliseconds
>  max = 1249.54 milliseconds
>  mean = 50.95 milliseconds
>  stddev = 130.68 milliseconds
>  median = 28.73 milliseconds
>  75% <= 32.51 milliseconds
>  95% <= 57.04 milliseconds
>  98% <= 461.46 milliseconds
>  99% <= 937.87 milliseconds
>  99.9% <= 1249.48 milliseconds
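
The 100 TPS ceiling quoted in the issue description follows directly from 
serializing writes: if each write takes 10ms and carries one message, at most 
1000 / 10 = 100 messages per second can be stored. The sketch below only 
illustrates that arithmetic. The class name, the method name and the batch 
size of 20 are hypothetical and are not taken from the ActiveMQ code base or 
from the benchmark above, and the model ignores every cost other than write 
latency.

```java
// Illustrative arithmetic only; not ActiveMQ code.
class StoreThroughputSketch {

    /**
     * Upper bound on stored messages per second when writes are serialized
     * (one write at a time) and each write carries batchSize messages.
     */
    static double maxThroughput(double writeLatencyMs, int batchSize) {
        double writesPerSecond = 1000.0 / writeLatencyMs;
        return batchSize * writesPerSecond;
    }

    public static void main(String[] args) {
        double latencyMs = 10.0; // per-write latency from the issue description

        // One message per write, as with the single-threaded store path.
        System.out.println(maxThroughput(latencyMs, 1));  // prints 100.0

        // Hypothetical: 20 concurrent sends batched into each write.
        System.out.println(maxThroughput(latencyMs, 20)); // prints 2000.0
    }
}
```

Under this simple model the bound scales linearly with the number of messages 
batched per write, which matches the description's remark that performance 
increases with the number of messages sent in parallel.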



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
