[
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736158#comment-16736158
]
ASF GitHub Bot commented on ARTEMIS-2216:
-----------------------------------------
Github user franz1981 commented on the issue:
https://github.com/apache/activemq-artemis/pull/2484
@michaelandrepearce I would first like to trigger a CI job of some kind;
maybe @clebertsuconic can help with his superbox (just this time) so we get an
answer sooner?
Re the cache: I was already thinking of sending another PR, but I have verified
that it is virtually impossible for the cache to be the reason of the consumer
slow-down. These are the numbers of a benchmark comparing it with the original
version:
```
Benchmark               (size)      (type)   Mode  Cnt          Score          Error  Units
CacheBench.getMessage1      32     chunked  thrpt   10  150039261.251 ± 12504804.694  ops/s
CacheBench.getMessage1      32  linkedlist  thrpt   10   31776755.611 ±  1405838.635  ops/s
CacheBench.getMessage1    1024     chunked  thrpt   10   31433127.788 ±  3902498.303  ops/s
CacheBench.getMessage1    1024  linkedlist  thrpt   10    2638404.341 ±   119171.758  ops/s
CacheBench.getMessage1  102400     chunked  thrpt   10     344799.911 ±    27267.965  ops/s
CacheBench.getMessage1  102400  linkedlist  thrpt   10      20020.925 ±     5392.418  ops/s
CacheBench.getMessage3      32     chunked  thrpt   10  384605640.192 ± 35164543.632  ops/s
CacheBench.getMessage3      32  linkedlist  thrpt   10   14124979.510 ±  2875341.272  ops/s
CacheBench.getMessage3    1024     chunked  thrpt   10   90418506.375 ±  4593688.556  ops/s
CacheBench.getMessage3    1024  linkedlist  thrpt   10    1562687.000 ±    91433.926  ops/s
CacheBench.getMessage3  102400     chunked  thrpt   10     978575.016 ±    44800.161  ops/s
CacheBench.getMessage3  102400  linkedlist  thrpt   10      21614.717 ±     5344.742  ops/s
```
Here `getMessage1` is `LivePageCacheImpl::getMessage` called at random
positions by 1 thread, and `getMessage3` is the same call at random positions
by 3 threads.
`chunked` is the new version and `linkedlist` the original one: the
difference is quite large, and the new version scales linearly...
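To make the comparison concrete, here is a minimal sketch of the "chunked" idea under the assumption that it stores messages in fixed-size array chunks: `getMessage(i)` becomes two array lookups instead of an O(n) walk of a linked list. The class and field names below are illustrative, not the actual Artemis types.

```java
// Hypothetical sketch: append-only chunked storage with O(1) random access.
final class ChunkedCache<T> {
    private static final int CHUNK_SIZE = 32;      // fixed chunk capacity
    private Object[][] chunks = new Object[4][];   // directory of chunks
    private int size;

    void add(T element) {
        int chunkIndex = size / CHUNK_SIZE;
        if (chunkIndex == chunks.length) {         // grow the chunk directory
            Object[][] bigger = new Object[chunks.length * 2][];
            System.arraycopy(chunks, 0, bigger, 0, chunks.length);
            chunks = bigger;
        }
        if (chunks[chunkIndex] == null) {
            chunks[chunkIndex] = new Object[CHUNK_SIZE];
        }
        chunks[chunkIndex][size % CHUNK_SIZE] = element;
        size++;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        // two index operations, no traversal: chunk, then slot within it
        return (T) chunks[index / CHUNK_SIZE][index % CHUNK_SIZE];
    }

    int size() {
        return size;
    }
}
```

Because a lookup never traverses earlier elements, the cost of `get` is independent of the position, which is consistent with the large gap the benchmark shows at random positions.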
> Use a specific executor for pageSyncTimer
> -----------------------------------------
>
> Key: ARTEMIS-2216
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2216
> Project: ActiveMQ Artemis
> Issue Type: Improvement
> Affects Versions: 2.6.3
> Reporter: Qihong Xu
> Priority: Major
> Attachments: contention_MASTER_global.svg, contention_PR_global.svg,
> contention_PR_single.svg
>
>
> Improving throughput in paging mode is one of our concerns, since our cluster
> uses paging a lot.
> We found that pageSyncTimer in PagingStoreImpl shares the same executor with
> pageCursorProvider, taken from the thread pool. In heavy-load scenarios, such
> as hundreds of consumers receiving messages simultaneously, it becomes
> difficult for pageSyncTimer to get the executor due to contention. Page sync
> is therefore delayed and producers suffer low throughput.
>
> To achieve higher performance we assign a dedicated executor to pageSyncTimer
> to avoid this contention. We then ran a small-scale test on a single modified
> broker.
>
> Broker: 4C/8G/500G SSD
> Producer: 200 threads, non-transactional send
> Consumer: 200 threads, transactional receive
> Message text size: 100-200 bytes randomly
> AddressFullPolicy: PAGE
>
> Test result:
> | |Only Send TPS|Only Receive TPS|Send&Receive TPS (send/receive)|
> |Original ver.|38k|33k|3k/30k|
> |Modified ver.|38k|34k|30k/12.5k|
>
> The table above shows that on the modified broker send TPS improves from
> “poor” to “extremely fast”, while receive TPS drops from “extremely fast” to
> “not bad” under heavy load. Considering that consumer systems usually have a
> long processing chain after receiving messages, we don’t need extremely fast
> receive TPS. Instead, we want to guarantee send TPS to cope with traffic
> peaks and lower the producers’ delay. Moreover, the combined send and receive
> TPS rises from 33k to about 43k. Given all of the above, this trade-off seems
> beneficial and acceptable.
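The dedicated-executor change described in the issue can be sketched as follows. This is a minimal illustration using `java.util.concurrent` directly; the class and method names are illustrative assumptions, not the actual Artemis API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: the page sync timer gets its own single-threaded executor, so a
// sync task is never queued behind cursor work submitted by many consumers.
final class PagingStoreSketch {
    // shared pool standing in for the executor used by pageCursorProvider
    private final ExecutorService cursorExecutor = Executors.newFixedThreadPool(4);
    // dedicated executor standing in for the one given to pageSyncTimer
    private final ExecutorService syncExecutor = Executors.newSingleThreadExecutor();

    void scheduleCursorScan(Runnable scan) {
        cursorExecutor.execute(scan);    // competes only with other cursor work
    }

    void schedulePageSync(Runnable sync) {
        syncExecutor.execute(sync);      // runs promptly, independent of cursor load
    }

    void shutdown() {
        cursorExecutor.shutdown();
        syncExecutor.shutdown();
    }
}
```

The design trade-off matches the test result: syncs (which gate producer acknowledgements) stop waiting behind consumer-driven cursor tasks, at the cost of one extra thread and somewhat less executor capacity for receives.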
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)