[
https://issues.apache.org/jira/browse/ARTEMIS-3163?focusedWorklogId=562000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-562000
]
ASF GitHub Bot logged work on ARTEMIS-3163:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 07/Mar/21 17:04
Start Date: 07/Mar/21 17:04
Worklog Time Spent: 10m
Work Description: franz1981 edited a comment on pull request #3479:
URL: https://github.com/apache/activemq-artemis/pull/3479#issuecomment-792311156
These are my results using a single, single-threaded acceptor for both
clients and replication (on the live broker), to fairly compare epoll vs
io_uring under load.
The test is similar to the one on
https://issues.apache.org/jira/browse/ARTEMIS-2852, with 32 JMS core clients
and 100-byte persistent messages; the IO_URING transport has been used *only*
on the live server, leaving the rest (i.e. backup + clients) as is.
NOTE: These are just preliminary results, so I won't share the HW
configuration or anything needed to make this reproducible, but they should
give the magnitude of the improvement offered by io_uring.
`master`:
```
**************
EndToEnd Throughput: 22582 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean 1410.83
min 333.82
50.00% 1368.06
90.00% 1679.36
99.00% 2293.76
99.90% 3489.79
99.99% 13107.20
max 16187.39
count 320000
```
`this pr`:
```
**************
EndToEnd Throughput: 30540 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean 1052.52
min 329.73
50.00% 1007.62
90.00% 1286.14
99.00% 1736.70
99.90% 4653.06
99.99% 13893.63
max 16711.68
count 320000
```
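For reference, here is a quick sketch (plain Python, numbers taken straight from the two outputs above) of how the relative deltas work out between the two runs:

```python
# Compare the two benchmark runs above (epoll on master vs io_uring in this PR).
epoll = {"throughput": 22582, "mean_us": 1410.83, "p99_us": 2293.76}
io_uring = {"throughput": 30540, "mean_us": 1052.52, "p99_us": 1736.70}

throughput_gain = (io_uring["throughput"] / epoll["throughput"] - 1) * 100
mean_cut = (1 - io_uring["mean_us"] / epoll["mean_us"]) * 100
p99_cut = (1 - io_uring["p99_us"] / epoll["p99_us"]) * 100

print(f"throughput: +{throughput_gain:.1f}%")  # ~ +35.2%
print(f"mean latency: -{mean_cut:.1f}%")       # ~ -25.4%
print(f"p99 latency: -{p99_cut:.1f}%")         # ~ -24.3%
```

So, roughly a third more throughput with a quarter less end-to-end latency, on the same hardware and load.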
The profile data collected with
https://github.com/jvm-profiling-tools/async-profiler/ is attached to
https://issues.apache.org/jira/browse/ARTEMIS-3163,
but the important bits are:
- Replication event loop thread: 935 samples (epoll) vs 775 samples
(io_uring) -> ~94% CPU usage vs ~78% CPU usage
- SYSCALL samples:
  - `epoll`: ~61% of samples
  - `io_uring`: ~31% of samples

The io_uring version uses resources far more efficiently than epoll, even
though our replication process already tries to batch writes as much as
possible to amortize the syscall cost: it would be interesting to compare
epoll with some OpenOnload kernel-bypass driver vs io_uring :P
*IMPORTANT*:
Why have I chosen to use a single thread for everything?
Don't be tempted to use the default configuration, because it uses 3 *
available cores for the replication/client acceptors: the io_uring version is
so much more efficient than epoll that the Netty event loops tend to go idle
most of the time and need to be woken up, causing application threads to
always pay the cost of waking up the event loop threads... this can make the
io_uring version look worse than epoll, while the opposite is true(!!)
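For context, the single-threaded setup above corresponds roughly to pinning the Netty acceptor to one remoting thread in broker.xml. A minimal sketch, assuming Artemis's standard `remotingThreads` acceptor parameter; `useIoUring` is a *hypothetical* flag name here, so check the PR for the actual parameter it introduces:

```xml
<!-- Sketch only: one Netty acceptor serving both clients and replication,
     forced onto a single event-loop thread. "useIoUring" is a hypothetical
     flag; the real parameter name may differ in the PR. -->
<acceptors>
   <acceptor name="netty-acceptor">tcp://0.0.0.0:61616?remotingThreads=1;useIoUring=true</acceptor>
</acceptors>
```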
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 562000)
Time Spent: 1h 10m (was: 1h)
> Experimental support for Netty IO_URING incubator
> -------------------------------------------------
>
> Key: ARTEMIS-3163
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3163
> Project: ActiveMQ Artemis
> Issue Type: New Feature
> Reporter: Francesco Nigro
> Assignee: Francesco Nigro
> Priority: Major
> Attachments: flamegraphs.zip
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Netty provides incubator support (i.e. not for production usage yet) for IO_URING
> (see https://github.com/netty/netty-incubator-transport-io_uring).
> It would be nice for Artemis to support it and allow devs/users to start
> playing with it.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)