[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345211#comment-15345211 ]
Anthony Maire commented on LOG4J2-1430:
---------------------------------------

The approach of using a NoOp appender for testing the AsyncAppender isn't a bad idea at all. The "issue" is that it's easy to misinterpret the results: we naturally expect the queue to be the bottleneck (and it is the bottleneck in a lot of the ABQ tests), but with these high-performance queues, the single-threaded code between queue.take() and appender.append() becomes the bottleneck. With 8 producer threads, I had the consumer thread 100% busy but the producers were not. So we are measuring something that doesn't reflect the real world (in a real application we don't expect the queue to always be full), and this benchmark is not a good way to show which queue implementation is the best.

I'm probably biased because I work for a low-latency company, but the best metric to evaluate the "performance" of the AsyncAppender is latency, and latency is really damn hard to measure (cf. this excellent talk from Gil Tene on the subject: https://www.infoq.com/presentations/latency-response-time ). For such cases, I usually measure latency percentiles at a load that is below maximal throughput (since all systems suck from a latency point of view when running near their saturation point) by recording every individual result in an HdrHistogram. I don't think that is something that can be done with JMH, so I usually write a custom load generator to do this; a rough sketch of that approach is shown below.

A major caveat (among many others) with measuring latency is that the production environment may be very different from the idealized environment you have when benchmarking. In particular, the system might be overloaded (i.e. there are more active threads on the whole machine, including kernel threads, than available cores) and/or virtualized. In that situation the kind of concurrency (wait-free, lock-free, ...) matters a lot, since threads can be suspended for several milliseconds.

Showing this effect in a benchmark is hard, because it occurs at a very low frequency under "production-like" load and there are other causes of low-frequency outliers, so it may require a lot of analysis. The usual way to show that an algorithm has poorer concurrency properties is to increase the throughput and/or the number of threads of the load generator until some candidates behave poorly: lock-free algorithms will scale better than blocking ones. But with the current AsyncAppender code this kind of test is not possible, because the async thread isn't performant enough.

To compare queues, the best thing is to compare them without integrating them into the AsyncAppender. Otherwise we would need to optimise the AsyncAppender itself a lot to show differences between the queues, which might be a fairly useless effort since the current code is probably efficient enough with a real underlying appender.
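As a minimal sketch of that approach (assuming HdrHistogram is on the classpath and the logger is wired to the AsyncAppender configuration under test; the target rate, event count, and message shape are illustrative choices, not taken from the comment):

{code}
import java.util.concurrent.TimeUnit;

import org.HdrHistogram.Histogram;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AsyncAppenderLatencyLoadGenerator {

    // Illustrative parameters: pace the load well below the appender's saturation point.
    private static final long TARGET_EVENTS_PER_SEC = 50_000;
    private static final long TOTAL_EVENTS = 10_000_000;

    public static void main(String[] args) {
        Logger logger = LogManager.getLogger(AsyncAppenderLatencyLoadGenerator.class);
        // Record values from 1 ns up to 10 s with 3 significant digits of precision.
        Histogram histogram = new Histogram(TimeUnit.SECONDS.toNanos(10), 3);

        long intervalNanos = TimeUnit.SECONDS.toNanos(1) / TARGET_EVENTS_PER_SEC;
        long intendedSendTime = System.nanoTime();

        for (long i = 0; i < TOTAL_EVENTS; i++) {
            // Busy-spin until the intended send time so the offered rate stays constant
            // even when an individual log call is slow.
            while (System.nanoTime() < intendedSendTime) {
                // spin
            }
            logger.info("event {}", i);
            // Latency is measured from the *intended* send time, not the actual call start,
            // so a stalled consumer cannot hide queueing delay from the histogram.
            histogram.recordValue(System.nanoTime() - intendedSendTime);
            intendedSendTime += intervalNanos;
        }

        // In a real run, warm-up iterations would be discarded before reporting percentiles.
        System.out.printf("p50=%d ns  p99=%d ns  p99.99=%d ns  max=%d ns%n",
                histogram.getValueAtPercentile(50.0),
                histogram.getValueAtPercentile(99.0),
                histogram.getValueAtPercentile(99.99),
                histogram.getMaxValue());
    }
}
{code}

Recording each latency against the intended send time, rather than the moment the call actually started, is what keeps a slow logging call from hiding the delay it imposes on subsequent events (the coordinated-omission problem covered in Gil Tene's talk).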
> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
> ---------------------------------------------------------------------------
>
>                 Key: LOG4J2-1430
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-1430
>             Project: Log4j 2
>          Issue Type: New Feature
>          Components: Appenders
>    Affects Versions: 2.6.1
>            Reporter: Matt Sicker
>            Assignee: Matt Sicker
>             Fix For: 2.7
>
>         Attachments: AsyncAppenderPerf01.txt, log4j2-1430-jctools-tmp-patch.txt
>
>
> [Conversant Disruptor|https://github.com/conversant/disruptor] works as an implementation of BlockingQueue that is much faster than ArrayBlockingQueue.
> I did some benchmarks earlier and found it to be a bit faster:
> h3. AsyncAppender/ArrayBlockingQueue
> {code}
> Benchmark                                                     Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  1101267.173 ±  17583.204  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  1128269.255 ±  12188.910  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  1525470.805 ±  56515.933  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  1789434.196 ±  42733.475  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  1803276.278 ±  34938.176  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  1468550.776 ±  26402.286  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  1322304.349 ±  22417.997  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  1179756.489 ±  16502.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  1324660.677 ±  18893.944  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  1309365.962 ±  19602.489  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  1422144.180 ±  20815.042  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  1247862.372 ±  18300.764  ops/s
> {code}
> h3. AsyncAppender/DisruptorBlockingQueue
> {code}
> Benchmark                                                     Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  3704735.586 ±  59766.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  3622175.410 ±  31975.353  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  6862480.428 ± 121473.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  6193288.988 ±  93545.144  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  5715621.712 ± 131878.581  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  5745187.005 ± 213854.016  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  5307137.396 ±  88135.709  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  4953015.419 ±  72100.403  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  4833836.418 ±  52919.314  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  4353791.507 ±  79047.812  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  4136761.624 ±  67804.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  6719456.722 ± 187433.301  ops/s
> {code}
> h3. AsyncLogger
> {code}
> Benchmark                                              Mode  Samples         Score        Error  Units
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput10Params  thrpt       20   5075883.371 ± 180465.316  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput11Params  thrpt       20   4867362.030 ± 193909.465  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput1Param    thrpt       20  10294733.024 ± 226536.965  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput2Params   thrpt       20   9021650.667 ± 351102.255  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput3Params   thrpt       20   8079337.905 ± 115824.975  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput4Params   thrpt       20   7347356.788 ±  66598.738  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput5Params   thrpt       20   6930636.174 ± 150072.908  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput6Params   thrpt       20   6309567.300 ± 293709.787  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput7Params   thrpt       20   6051997.196 ± 268405.087  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput8Params   thrpt       20   5273376.623 ±  99168.461  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput9Params   thrpt       20   5091137.594 ± 150617.444  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughputSimple    thrpt       20  11136623.731 ± 400350.272  ops/s
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscr...@logging.apache.org
For additional commands, e-mail: log4j-dev-h...@logging.apache.org