[
https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350921#comment-15350921
]
Anthony Maire commented on LOG4J2-1430:
---------------------------------------
I made some additional tests on a dual Xeon server (with 24 hyperthreads) to
see how both queues behave under heavy contention (running lots of threads on a
4-core machine does not show contention for this kind of queue, since the CAS
operations will never be contended by a sleeping thread):
- I first tried some tests with the original ResponseTimeTest and
AsyncAppender configuration files. It was pretty hard to find differences
between the two queues, because both perform pretty well when the throughput is
way below the AsyncThread max throughput, and the small differences between
them are not significant because their amplitude is orders of magnitude below
the GC pauses caused by the appender itself.
- I tried to run 10 threads at 100K msg/s while, in the background, another
process was spinning on 18 threads to simulate contention from other processes
(without tuning, another JVM will run GC on 18 threads on this machine). As
expected, the test with the Conversant queue performed a little worse than with
the JCTools queue, but the difference is not that impressive. A minimal sketch
of such a background spinner follows this list.
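To be clear about what that background load is: nothing fancy, just a separate
process busy-spinning on a fixed number of threads. A hypothetical version
(class name and argument handling are mine) could look like this:
{code}
// Hypothetical stand-in for the background load generator: burns CPU on N
// threads (18 in the test above) so the queue benchmark runs while the OS
// scheduler is contended by another process.
public class BackgroundSpinner {

    public static void main(String[] args) {
        final int threads = args.length > 0 ? Integer.parseInt(args[0]) : 18;
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                long x = 0;
                // Busy spin forever; checking isInterrupted() keeps the loop
                // observable and gives a clean way to stop a thread if needed.
                while (!Thread.currentThread().isInterrupted()) {
                    x++;
                }
                System.out.println(x);
            });
            t.start(); // non-daemon threads keep the JVM alive after main() returns
        }
    }
}
{code}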
I made additional tests on the queues by themselves, not used in an
AsyncAppender (by writing custom RunXXX classes in the ResponseTimeTest
package):
- 20 producer threads and 1 consumer thread, all with a spinning wait policy,
so I expect to see 2100% CPU usage in top
- the pacer uses a Yield wait strategy (service time was worse with spinning
since there were more threads than physical cores)
- 20-minute test (with 1 minute of warmup before); a rough sketch of the paced
producer loop follows this list
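For reference, the producer side of those RunXXX classes is roughly the
following. This is a sketch in the spirit of the Pacer/ResponseTimeTest
approach, not the actual code: the class name is mine and I use HdrHistogram
here purely for recording.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

import org.HdrHistogram.Histogram;

// Hypothetical paced producer: tries to enqueue at a fixed rate and records two
// distributions. "Service time" is the cost of offer() itself; "response time"
// is measured from the theoretical send time, so queueing delay (and
// coordinated omission) shows up in it.
public class PacedProducer implements Runnable {

    private final BlockingQueue<Long> queue;
    private final long intervalNanos; // 4_000 ns for 250K ops/s per producer
    private final long totalOps;
    final Histogram serviceTime = new Histogram(TimeUnit.MINUTES.toNanos(5), 3);
    final Histogram responseTime = new Histogram(TimeUnit.MINUTES.toNanos(5), 3);

    PacedProducer(BlockingQueue<Long> queue, long opsPerSecond, long totalOps) {
        this.queue = queue;
        this.intervalNanos = TimeUnit.SECONDS.toNanos(1) / opsPerSecond;
        this.totalOps = totalOps;
    }

    @Override
    public void run() {
        final long start = System.nanoTime();
        for (long i = 0; i < totalOps; i++) {
            final long intendedSendTime = start + i * intervalNanos;
            while (System.nanoTime() < intendedSendTime) {
                Thread.yield(); // Yield wait strategy for the pacer
            }
            final long before = System.nanoTime();
            while (!queue.offer(before)) {
                // spinning wait policy: busy-retry while the queue is full
            }
            final long after = System.nanoTime();
            serviceTime.recordValue(after - before);            // offer() only
            responseTime.recordValue(after - intendedSendTime); // includes queueing delay
        }
    }
}
{code}
The single consumer simply spins on poll(), and the per-thread histograms are
merged at the end of the run.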
Since the AsyncThread was able to sustain a throughput of nearly 5 million
msg/s on this machine, I tried to run the producer threads at 250K ops/s each
(so roughly 6 billion operations for the whole test):
- the JCTools queue was able to complete the test, with a *maximum* response
time of 11.8 ms and a *99.99th percentile* service time of 2.8µs
- the Conversant queue failed: only 2.82 billion ops were made, the *median*
response time is 10.4ms and the *median* service time is 3.1µs. Moreover, CPU
usage never reached 2100% but was often between 1600% and 1800%, because of the
LockSupport.parkNanos() call in Condition.progressiveYield() in its offer
method when producers contend with each other (this queue is not lock-free,
although the contention impact is way lighter than in a lock-based
implementation such as ABQ). A toy sketch contrasting the two wait behaviours
follows this list.
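To make that last point concrete, here is a toy contrast between the two
producer-side behaviours. This is not the actual JCTools or Conversant code,
just the shape of the pattern, and the back-off thresholds are arbitrary:
{code}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Toy illustration only, NOT the real JCTools or Conversant code. In both
// cases producers race on a CAS to claim the next slot; the difference is
// what a loser does next.
final class ClaimStrategies {

    // Lock-free style: a failed CAS means some other producer succeeded, so
    // the loser retries immediately and system-wide progress is guaranteed.
    static long claimSpinning(AtomicLong tail) {
        for (;;) {
            long t = tail.get();
            if (tail.compareAndSet(t, t + 1)) {
                return t;
            }
            // lost the race: retry right away, never park
        }
    }

    // Progressive back-off style: after enough failed attempts the thread
    // yields and eventually parks. Under heavy producer contention this is
    // why CPU usage stays below the theoretical maximum and why latency jumps
    // by the minimum pause parkNanos() can actually deliver.
    static long claimWithBackoff(AtomicLong tail) {
        int failures = 0;
        for (;;) {
            long t = tail.get();
            if (tail.compareAndSet(t, t + 1)) {
                return t;
            }
            failures++;
            if (failures > 200) {
                LockSupport.parkNanos(1L); // descheduled: tens of µs once parked
            } else if (failures > 100) {
                Thread.yield();            // give up the time slice first
            }
            // below 100 failures: plain busy retry
        }
    }
}
{code}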
I made another test with 50K ops/s per thread (1M ops/s total, way below the
AsyncThread max throughput). Both queues are able to complete the test, but the
Conversant queue's service time is worse than the JCTools one's (which has
nearly the same service time in the two tests). Look at the vertical line in
the attached screenshot for the Conversant curve (blue): this 50µs gap matches
the minimal pause achievable with LockSupport.parkNanos() on this machine (a
quick way to measure that floor is sketched below). It shows that it is the
same producer contention issue that prevented this queue from succeeding in the
5M ops/s test.
=> The JCTools queue provides more predictable latencies and a higher maximum
throughput because of its lock-free nature.
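Measuring that parkNanos() floor takes only a few lines; a self-contained
sketch (iteration counts are arbitrary):
{code}
import java.util.concurrent.locks.LockSupport;

// Measures the smallest pause LockSupport.parkNanos() actually delivers on a
// given machine: even when asking for 1ns, the thread pays the full timer and
// scheduler granularity once it has really been descheduled.
public class ParkNanosFloor {

    public static void main(String[] args) {
        final int iterations = 10_000;
        // warmup so the JIT and the timer subsystem settle
        for (int i = 0; i < iterations; i++) {
            LockSupport.parkNanos(1L);
        }
        long min = Long.MAX_VALUE;
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            LockSupport.parkNanos(1L);
            long elapsed = System.nanoTime() - start;
            min = Math.min(min, elapsed);
            total += elapsed;
        }
        System.out.printf("parkNanos(1L): min = %d ns, mean = %d ns%n",
                min, total / iterations);
    }
}
{code}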
By the way, I wonder if it really makes sense to provide support for specific
alternative queues in the async appender, since this support implicitly means
"we tested this queue, it rocks hard". Being able to inject a custom factory is
maybe enough (see the sketch below). Currently the ABQ is probably enough for
99% of the users of asynchronous logging, and "performance hardcore users"
should use AsyncLogger since it is garbage-free and more efficient (and using
the original LMAX Disruptor offers some possibilities that will never be
possible with a queue, such as LOG4J2-1424) ...
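What I mean by "inject a custom factory" is just this kind of seam; the names
here are purely illustrative, not a proposal for the actual interface:
{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical shape of a pluggable queue factory for AsyncAppender: the
// appender asks the factory for a BlockingQueue and does not care which
// implementation is behind it.
interface QueueFactory<E> {
    BlockingQueue<E> create(int capacity);
}

// Default factory: ArrayBlockingQueue, probably enough for 99% of users.
final class ArrayBlockingQueueFactory<E> implements QueueFactory<E> {
    @Override
    public BlockingQueue<E> create(int capacity) {
        return new ArrayBlockingQueue<>(capacity);
    }
}
{code}
A user who has benchmarked another BlockingQueue (JCTools, Conversant, ...) on
their own workload can plug in their own factory instead of asking the project
to bless a specific implementation.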
> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
> ---------------------------------------------------------------------------
>
> Key: LOG4J2-1430
> URL: https://issues.apache.org/jira/browse/LOG4J2-1430
> Project: Log4j 2
> Issue Type: New Feature
> Components: Appenders
> Affects Versions: 2.6.1
> Reporter: Matt Sicker
> Assignee: Matt Sicker
> Fix For: 2.7
>
> Attachments: AsyncAppenderPerf01.txt,
> jctools-vs-conversant-service-time.png, log4j2-1430-jctools-tmp-patch.txt
>
>
> [Conversant Disruptor|https://github.com/conversant/disruptor] works as an
> implementation of BlockingQueue that is much faster than ArrayBlockingQueue.
> I did some benchmarks earlier and found it to be a bit faster:
> h3. AsyncAppender/ArrayBlockingQueue
> {code}
> Benchmark                                                     Mode  Samples        Score       Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  1101267.173 ± 17583.204  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  1128269.255 ± 12188.910  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  1525470.805 ± 56515.933  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  1789434.196 ± 42733.475  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  1803276.278 ± 34938.176  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  1468550.776 ± 26402.286  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  1322304.349 ± 22417.997  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  1179756.489 ± 16502.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  1324660.677 ± 18893.944  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  1309365.962 ± 19602.489  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  1422144.180 ± 20815.042  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  1247862.372 ± 18300.764  ops/s
> {code}
> h3. AsyncAppender/DisruptorBlockingQueue
> {code}
> Benchmark                                                     Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  3704735.586 ±  59766.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  3622175.410 ±  31975.353  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  6862480.428 ± 121473.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  6193288.988 ±  93545.144  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  5715621.712 ± 131878.581  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  5745187.005 ± 213854.016  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  5307137.396 ±  88135.709  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  4953015.419 ±  72100.403  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  4833836.418 ±  52919.314  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  4353791.507 ±  79047.812  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  4136761.624 ±  67804.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  6719456.722 ± 187433.301  ops/s
> {code}
> h3. AsyncLogger
> {code}
> Benchmark                                              Mode  Samples         Score        Error  Units
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput10Params  thrpt       20   5075883.371 ± 180465.316  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput11Params  thrpt       20   4867362.030 ± 193909.465  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput1Param    thrpt       20  10294733.024 ± 226536.965  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput2Params   thrpt       20   9021650.667 ± 351102.255  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput3Params   thrpt       20   8079337.905 ± 115824.975  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput4Params   thrpt       20   7347356.788 ±  66598.738  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput5Params   thrpt       20   6930636.174 ± 150072.908  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput6Params   thrpt       20   6309567.300 ± 293709.787  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput7Params   thrpt       20   6051997.196 ± 268405.087  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput8Params   thrpt       20   5273376.623 ±  99168.461  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput9Params   thrpt       20   5091137.594 ± 150617.444  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughputSimple    thrpt       20  11136623.731 ± 400350.272  ops/s
> {code}