[
https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171394#comment-17171394
]
Kasper Kondzielski edited comment on ARTEMIS-2852 at 8/5/20, 10:02 AM:
-----------------------------------------------------------------------
I think that you got it right. We have 1 master and 2 slaves.
We wanted to achieve safe and persistent data replication. That's why we chose
master-slave configuration, as it is the only one which guarantees replication.
I know that the additional slave isn't used, as only a single slave can be
connected to a given master. I think this is actually a leftover from a
previous configuration and I just left it as it was.
Maybe it would be easier to describe what we were trying to achieve based on
a real example of another queue. Take a look at RabbitMQ with their quorum
queues, for example. Given a cluster of 3 nodes, each node participates equally
in message processing and data replication; in other words, data won't be lost
even if any of them goes down.
Having said that, I started to think that our test might be a little unfair,
since we configured data replication (using the master-slave approach) but we
didn't take care of message redistribution. Am I right that a cluster of 3
master nodes connected with each other, plus 3 slave nodes, each connected to
a particular master node, would be a more appropriate solution?
Something like that:
!Selection_451.png!
Which should also solve the split-brain problem.
Keep in mind that in our tests we are not scaling the cluster but rather the
number of senders and receivers.
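For reference, a minimal sketch of what each master's {{broker.xml}} might contain in such a 3-pair setup (the connector names and the catch-all address match are assumptions for illustration, not taken from our actual config):

{code:xml}
<core xmlns="urn:activemq:core">
  <!-- each master replicates its journal to its dedicated slave -->
  <ha-policy>
    <replication>
      <master>
        <check-for-live-server>true</check-for-live-server>
      </master>
    </replication>
  </ha-policy>

  <!-- connect the three masters so they form one cluster -->
  <cluster-connections>
    <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
        <connector-ref>master-2-connector</connector-ref>
        <connector-ref>master-3-connector</connector-ref>
      </static-connectors>
    </cluster-connection>
  </cluster-connections>

  <!-- allow messages to move to nodes that actually have consumers -->
  <address-settings>
    <address-setting match="#">
      <redistribution-delay>0</redistribution-delay>
    </address-setting>
  </address-settings>
</core>
{code}

The {{redistribution-delay}} setting is what addresses the message-redistribution concern mentioned above.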
> Huge performance decrease between versions 2.2.0 and 2.13.0
> -----------------------------------------------------------
>
> Key: ARTEMIS-2852
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2852
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Reporter: Kasper Kondzielski
> Priority: Major
> Attachments: Selection_433.png, Selection_434.png, Selection_440.png,
> Selection_441.png, Selection_451.png
>
>
> Hi,
> Recently, we started to prepare a new revision of our blog post in which we
> test various implementations of replicated queues. The previous version can
> be found here: [https://softwaremill.com/mqperf/]
> We updated the Artemis binary to 2.13.0, regenerated the configuration file,
> and applied all the performance tricks you told us about last time. In
> particular, these were:
> * the {{Xmx}} java parameter bumped to {{16G}} (now bumped to 48G)
> * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G}} (this
> one we forgot to set, but we suspect that it is not the issue)
> * {{journal-type}} set to {{MAPPED}}
> * {{journal-datasync}}, {{journal-sync-non-transactional}} and
> {{journal-sync-transactional}} all set to {{false}}
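> For clarity, combined these settings look roughly like this in
> {{broker.xml}} (a sketch of the relevant fragment only, not our full config):
> {code:xml}
> <core xmlns="urn:activemq:core">
>   <!-- cap on total memory used for messages before paging -->
>   <global-max-size>8G</global-max-size>
>   <!-- memory-mapped journal instead of NIO/ASYNCIO -->
>   <journal-type>MAPPED</journal-type>
>   <!-- disable syncing the journal to disk -->
>   <journal-datasync>false</journal-datasync>
>   <journal-sync-non-transactional>false</journal-sync-non-transactional>
>   <journal-sync-transactional>false</journal-sync-transactional>
> </core>
> {code}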
> Apart from that, we changed the machine type we use to r5.2xlarge (8 cores,
> 64 GiB memory, network bandwidth up to 10 Gbps, storage bandwidth up to
> 4,750 Mbps), and we decided to always run twice as many receivers as senders.
> From our tests it looks like version 2.13.0 does not scale as well, with the
> increase of senders and receivers, as version 2.2.0 (previously tested).
> Basically, it is not scaling at all, as the throughput stays at almost the
> same level, while previously it used to grow linearly.
> Here you can find our tests results for both versions:
> [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing]
> We are aware that there is now a dedicated page in the documentation about
> performance tuning, but we are surprised that the same settings as before
> perform much worse.
> Maybe there is an obvious property we overlooked which should be turned
> on?
> All changes between those versions together with the final configuration can
> be found on this merged PR:
> [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a]
>
> Charts showing machine usage are in the attachments. Memory consumed by the
> Artemis process didn't exceed ~16 GB. Bandwidth and CPU weren't bottlenecks
> either.
> p.s. I wanted to ask this question on the mailing list/Nabble forum first,
> but it seems that I don't have permission to do so even though I registered
> & subscribed. Is that intentional?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)