[
https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350921#comment-15350921
]
Anthony Maire commented on LOG4J2-1430:
---------------------------------------
I made some additional tests on a dual Xeon server (with 24 hyperthreads) to
see how both queues behave under heavy contention (running lots of threads on a
4-core machine does not show contention for this kind of queue, since the CAS
operations will never be contended by a sleeping thread):
- I first tried some tests with the original ResponseTimeTest and
AsyncAppender configuration files. It was pretty hard to find differences
between the two queues, because both perform pretty well when the throughput is
way below the AsyncThread max throughput, and the small differences between
them are not significant because their amplitude is orders of magnitude below
the GC pauses caused by the appender itself.
- I tried to run 10 threads at 100K msg/s while, in the background, another
process was spinning on 18 threads to simulate contention from other processes
(without tuning, another JVM will run GC on 18 threads on this machine). As
expected, the test with the Conversant queue performed a little worse than with
the JCTools queue, but the difference is not that impressive. A minimal sketch
of such a background spinner follows this list.
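To be clear about what that background load is: nothing fancy, just a separate
process busy-spinning on a fixed number of threads. A hypothetical version
(class name and argument handling are mine) could look like this:
{code}
// Hypothetical stand-in for the background load generator: burns CPU on N
// threads (18 in the test above) so the queue benchmark runs while the OS
// scheduler is contended by another process.
public class BackgroundSpinner {

    public static void main(String[] args) {
        final int threads = args.length > 0 ? Integer.parseInt(args[0]) : 18;
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                long x = 0;
                // Busy spin forever; checking isInterrupted() keeps the loop
                // observable and gives a clean way to stop a thread if needed.
                while (!Thread.currentThread().isInterrupted()) {
                    x++;
                }
                System.out.println(x);
            });
            t.start(); // non-daemon threads keep the JVM alive after main() returns
        }
    }
}
{code}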
I made additional tests on the queues by themselves, not used in an
AsyncAppender (by writing custom RunXXX classes in the ResponseTimeTest
package):
- 20 producer threads and 1 consumer thread, all with a spinning wait policy,
so I expect to see 2100% CPU usage in top
- the pacer uses a Yield wait strategy (service time was worse with spinning
since there were more threads than physical cores)
- 20-minute test (with 1 minute of warmup before); a rough sketch of the paced
producer loop follows this list
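For reference, the producer side of those RunXXX classes is roughly the
following. This is a sketch in the spirit of the Pacer/ResponseTimeTest
approach, not the actual code: the class name is mine and I use HdrHistogram
here purely for recording.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

import org.HdrHistogram.Histogram;

// Hypothetical paced producer: tries to enqueue at a fixed rate and records two
// distributions. "Service time" is the cost of offer() itself; "response time"
// is measured from the theoretical send time, so queueing delay (and
// coordinated omission) shows up in it.
public class PacedProducer implements Runnable {

    private final BlockingQueue<Long> queue;
    private final long intervalNanos; // 4_000 ns for 250K ops/s per producer
    private final long totalOps;
    final Histogram serviceTime = new Histogram(TimeUnit.MINUTES.toNanos(5), 3);
    final Histogram responseTime = new Histogram(TimeUnit.MINUTES.toNanos(5), 3);

    PacedProducer(BlockingQueue<Long> queue, long opsPerSecond, long totalOps) {
        this.queue = queue;
        this.intervalNanos = TimeUnit.SECONDS.toNanos(1) / opsPerSecond;
        this.totalOps = totalOps;
    }

    @Override
    public void run() {
        final long start = System.nanoTime();
        for (long i = 0; i < totalOps; i++) {
            final long intendedSendTime = start + i * intervalNanos;
            while (System.nanoTime() < intendedSendTime) {
                Thread.yield(); // Yield wait strategy for the pacer
            }
            final long before = System.nanoTime();
            while (!queue.offer(before)) {
                // spinning wait policy: busy-retry while the queue is full
            }
            final long after = System.nanoTime();
            serviceTime.recordValue(after - before);            // offer() only
            responseTime.recordValue(after - intendedSendTime); // includes queueing delay
        }
    }
}
{code}
The single consumer simply spins on poll(), and the per-thread histograms are
merged at the end of the run.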
Since the AsyncThread was able to sustain a throughput of nearly 5 million
msg/s on this machine, I tried to run the producer threads at 250K ops/s each
(so roughly 6 billion operations for the whole test):
- the JCTools queue was able to complete the test, with a *maximum* response
time of 11.8 ms and a *99.99th percentile* service time of 2.8µs
- the Conversant queue failed: only 2.82 billion ops were made, the *median*
response time is 10.4ms and the *median* service time is 3.1µs. Moreover, CPU
usage never reached 2100% but was often between 1600% and 1800%, because of the
LockSupport.parkNanos() call in Condition.progressiveYield() in its offer
method when producers contend with each other (this queue is not lock-free,
although the contention impact is way lighter than in a lock-based
implementation such as ABQ). A toy sketch contrasting the two wait behaviours
follows this list.
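To make that last point concrete, here is a toy contrast between the two
producer-side behaviours. This is not the actual JCTools or Conversant code,
just the shape of the pattern, and the back-off thresholds are arbitrary:
{code}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Toy illustration only, NOT the real JCTools or Conversant code. In both
// cases producers race on a CAS to claim the next slot; the difference is
// what a loser does next.
final class ClaimStrategies {

    // Lock-free style: a failed CAS means some other producer succeeded, so
    // the loser retries immediately and system-wide progress is guaranteed.
    static long claimSpinning(AtomicLong tail) {
        for (;;) {
            long t = tail.get();
            if (tail.compareAndSet(t, t + 1)) {
                return t;
            }
            // lost the race: retry right away, never park
        }
    }

    // Progressive back-off style: after enough failed attempts the thread
    // yields and eventually parks. Under heavy producer contention this is
    // why CPU usage stays below the theoretical maximum and why latency jumps
    // by the minimum pause parkNanos() can actually deliver.
    static long claimWithBackoff(AtomicLong tail) {
        int failures = 0;
        for (;;) {
            long t = tail.get();
            if (tail.compareAndSet(t, t + 1)) {
                return t;
            }
            failures++;
            if (failures > 200) {
                LockSupport.parkNanos(1L); // descheduled: tens of µs once parked
            } else if (failures > 100) {
                Thread.yield();            // give up the time slice first
            }
            // below 100 failures: plain busy retry
        }
    }
}
{code}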
I made another test with 50K ops/s per thread (1M ops/s total, way below the
AsyncThread max throughput). Both queues are able to complete the test, but the
Conversant queue's service time is worse than the JCTools one's (which has
nearly the same service time in the two tests). Look at the vertical line in
the attached screenshot for the Conversant curve (blue): this 50µs gap matches
the minimal pause achievable with LockSupport.parkNanos() on this machine (a
quick way to measure that floor is sketched below). It shows that it is the
same producer contention issue that prevented this queue from succeeding in the
5M ops/s test.
=> The JCTools queue provides more predictable latencies and a higher maximum
throughput because of its lock-free nature.
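Measuring that parkNanos() floor takes only a few lines; a self-contained
sketch (iteration counts are arbitrary):
{code}
import java.util.concurrent.locks.LockSupport;

// Measures the smallest pause LockSupport.parkNanos() actually delivers on a
// given machine: even when asking for 1ns, the thread pays the full timer and
// scheduler granularity once it has really been descheduled.
public class ParkNanosFloor {

    public static void main(String[] args) {
        final int iterations = 10_000;
        // warmup so the JIT and the timer subsystem settle
        for (int i = 0; i < iterations; i++) {
            LockSupport.parkNanos(1L);
        }
        long min = Long.MAX_VALUE;
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            LockSupport.parkNanos(1L);
            long elapsed = System.nanoTime() - start;
            min = Math.min(min, elapsed);
            total += elapsed;
        }
        System.out.printf("parkNanos(1L): min = %d ns, mean = %d ns%n",
                min, total / iterations);
    }
}
{code}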
By the way, I wonder if it really makes sense to provide support for specific
alternative queues in the async appender, since this support implicitly means
"we tested this queue, it rocks hard". Being able to inject a custom factory is
maybe enough (see the sketch below). Currently the ABQ is probably enough for
99% of the users of asynchronous logging, and "performance hardcore users"
should use AsyncLogger since it is garbage-free and more efficient (and using
the original LMAX Disruptor offers some possibilities that will never be
possible with a queue, such as LOG4J2-1424) ...
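What I mean by "inject a custom factory" is just this kind of seam; the names
here are purely illustrative, not a proposal for the actual interface:
{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical shape of a pluggable queue factory for AsyncAppender: the
// appender asks the factory for a BlockingQueue and does not care which
// implementation is behind it.
interface QueueFactory<E> {
    BlockingQueue<E> create(int capacity);
}

// Default factory: ArrayBlockingQueue, probably enough for 99% of users.
final class ArrayBlockingQueueFactory<E> implements QueueFactory<E> {
    @Override
    public BlockingQueue<E> create(int capacity) {
        return new ArrayBlockingQueue<>(capacity);
    }
}
{code}
A user who has benchmarked another BlockingQueue (JCTools, Conversant, ...) on
their own workload can plug in their own factory instead of asking the project
to bless a specific implementation.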
> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
> ---------------------------------------------------------------------------
>
> Key: LOG4J2-1430
> URL: https://issues.apache.org/jira/browse/LOG4J2-1430
> Project: Log4j 2
> Issue Type: New Feature
> Components: Appenders
> Affects Versions: 2.6.1
> Reporter: Matt Sicker
> Assignee: Matt Sicker
> Fix For: 2.7
>
> Attachments: AsyncAppenderPerf01.txt,
> jctools-vs-conversant-service-time.png, log4j2-1430-jctools-tmp-patch.txt
>
>
> [Conversant Disruptor|https://github.com/conversant/disruptor] works as an
> implementation of BlockingQueue that is much faster than ArrayBlockingQueue.
> I did some benchmarks earlier and found it to be a bit faster:
> h3. AsyncAppender/ArrayBlockingQueue
> {code}
> Benchmark                                                     Mode  Samples        Score       Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  1101267.173 ± 17583.204  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  1128269.255 ± 12188.910  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  1525470.805 ± 56515.933  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  1789434.196 ± 42733.475  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  1803276.278 ± 34938.176  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  1468550.776 ± 26402.286  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  1322304.349 ± 22417.997  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  1179756.489 ± 16502.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  1324660.677 ± 18893.944  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  1309365.962 ± 19602.489  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  1422144.180 ± 20815.042  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  1247862.372 ± 18300.764  ops/s
> {code}
> h3. AsyncAppender/DisruptorBlockingQueue
> {code}
> Benchmark                                                     Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  3704735.586 ±  59766.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  3622175.410 ±  31975.353  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  6862480.428 ± 121473.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  6193288.988 ±  93545.144  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  5715621.712 ± 131878.581  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  5745187.005 ± 213854.016  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  5307137.396 ±  88135.709  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  4953015.419 ±  72100.403  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  4833836.418 ±  52919.314  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  4353791.507 ±  79047.812  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  4136761.624 ±  67804.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  6719456.722 ± 187433.301  ops/s
> {code}
> h3. AsyncLogger
> {code}
> Benchmark                                              Mode  Samples         Score        Error  Units
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput10Params  thrpt       20   5075883.371 ± 180465.316  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput11Params  thrpt       20   4867362.030 ± 193909.465  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput1Param    thrpt       20  10294733.024 ± 226536.965  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput2Params   thrpt       20   9021650.667 ± 351102.255  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput3Params   thrpt       20   8079337.905 ± 115824.975  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput4Params   thrpt       20   7347356.788 ±  66598.738  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput5Params   thrpt       20   6930636.174 ± 150072.908  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput6Params   thrpt       20   6309567.300 ± 293709.787  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput7Params   thrpt       20   6051997.196 ± 268405.087  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput8Params   thrpt       20   5273376.623 ±  99168.461  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput9Params   thrpt       20   5091137.594 ± 150617.444  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughputSimple    thrpt       20  11136623.731 ± 400350.272  ops/s
> {code}