This may be of interest: https://www.researchgate.net/publication/333825812_What's_Wrong_With_My_Benchmark_Results_Studying_Bad_Practices_in_JMH_Benchmarks
> On Jul 6, 2019, at 5:49, Raman Gupta <[email protected]> wrote:
>
> I'm no JMH expert... but there is a lot more going on there than just iterations to avoid the extra time for JIT compilation on warmup. Check out https://shipilev.net/talks/devoxx-Nov2013-benchmarking.pdf for an idea of the kinds of problems naive timer-only benchmarks encounter, and the different ways in which JMH solves them. There are other good presentations about JMH at https://shipilev.net/ as well.
>
> Secondly, JMH isn't de-optimizing the JVM (a la client JVM) willy-nilly. It just provides an API that allows you to measure what you (hopefully) think you are measuring, e.g. blackholes to consume results that would otherwise be optimized away by the JVM because the result would otherwise be unused, and so forth. AFAIK, it doesn't apply these "de-optimizations" on its own -- it just provides an API for you to use as needed.
>
> Thirdly, I don't believe JMH does anything (by default) to optimize away GC or anything like that -- it in fact allows you to consider the effect of GC by running lots of iterations, so that the effect of GC *is* taken into account in the benchmark. It does have a (non-default) option to run a GC before each iteration for specific use cases (e.g. benchmarks which generate lots of garbage). It also has a GC profiler to understand the impact of GC on your results (see https://github.com/Valloric/jmh-playground/blob/master/src/jmh/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java).
>
> Regards,
> Raman
>
>> On Fri, Jul 5, 2019 at 4:06 PM Ralph Goers <[email protected]> wrote:
>>
>> Actually, I would disagree. First, I don’t think the first thousand or so iterations running slow would matter much with 1.5 million iterations. Let’s say the first thousand take 10 times longer than the rest of the iterations. That is still only the equivalent of 10,000 iterations. That is a blip in 1.5 million.
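[Editor's note] Raman's blackhole point can be sketched outside JMH. A crude stand-in for Blackhole.consume is a write to a volatile field: that write is a side effect the JIT cannot elide, so the computation feeding it cannot be removed as dead code. This is a minimal sketch under that assumption; NaiveBlackhole and benchmarkedOperation are illustrative names, not JMH API, and JMH's real Blackhole also defends against constant folding and loop optimizations that this does not.

```java
// Sketch only: a hand-rolled result sink, not JMH's Blackhole.
public class NaiveBlackhole {
    // Writing to a volatile field is an observable side effect, so the
    // JIT cannot delete the computation that produces the stored value.
    private static volatile long sink;

    // Stand-in for the code under test (illustrative).
    static long benchmarkedOperation(long i) {
        return i * 31 + 7;
    }

    // Times 'iterations' calls, consuming each result so none are dead code.
    public static long measure(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = benchmarkedOperation(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = measure(1_000_000);
        System.out.println("elapsed ns: " + elapsed);
    }
}
```

Without the sink, a sufficiently smart JIT may legally delete the entire loop, which is one way naive timer benchmarks report implausibly fast numbers.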
>> Second, measuring the impact of garbage collection should be a part of the benchmark if your code is generating garbage.
>>
>> Again, I am not saying JMH doesn’t have its place, but I have to think something weird is going on when JMH is showing the no-location benchmark to be about 4 times faster than the location benchmark, where the simplistic loop test shows the elapsed time when including the location information to be about twice that of not including it. I can tell that the call to StackWalker isn’t being optimized away because it still shows up in YourKit as the most expensive operation. And when I turn on immediateFlush, flushing the file shows up in YourKit.
>>
>> Ralph
>>
>>> On Jul 5, 2019, at 12:18 PM, Matt Sicker <[email protected]> wrote:
>>>
>>> What I mean is that JMH is written to avoid including behind-the-scenes variability of the code execution that changes from startup until the code has been compiled to native code, garbage collection, synchronization points for threads (safepoints), etc. All these things become variable noise that affects benchmarks when the code you're testing executes in an especially short period of time (e.g., appending a log event is pretty fast, relatively speaking).
>>>
>>> Say you wrote a naive benchmark like this:
>>>
>>> long start = System.nanoTime();
>>> for (int i = 0; i < 1_000_000; i++) {
>>>     runBenchmark();
>>> }
>>> long end = System.nanoTime();
>>> double averageExecutionNanos = (end - start) / 1_000_000.0;
>>>
>>> This will give incorrect results because the first thousand or so iterations will be run by the first-gen compiler, then the next iterations will be run via code output from the second-gen compiler, which will be significantly faster. So then you might split that up into a warmup block and a measurement block.
>>> You'll still be including the execution time spent in garbage collection, thread synchronization, JIT compilation, etc., that isn't relevant to the benchmark for executing the code over time.
>>>
>>>> On Fri, 5 Jul 2019 at 01:25, Ralph Goers <[email protected]> wrote:
>>>>
>>>> Well, I can infer it from the examples but it doesn’t come close to explaining some of the things we use it for. I would argue that logging a line whose value changes on every call, even if it is just an incremented integer, that writes to a file is more than a micro-benchmark, and I truly want to know how it is going to perform in real life, not with all the optimizations disabled. When I first started Log4j 2 I noticed right away how much slower it ran when Java ran in client mode, which was much more common back then. That is essentially what I think JMH is doing with some of these benchmarks. So while it may be useful in determining that one algorithm is better than another when the code has not been optimized, I wouldn’t trust that you will see the same behavior when running in a “real” environment.
>>>>
>>>> For example, in testing calculating the location I doubt the JIT compiler can optimize away the call to the native methods involved in grabbing the stack frames. If that is the case then it can’t optimize away the methods to walk the stack frames, although it can certainly “clean them up”. Likewise, it can’t avoid writing log events to files, so I don’t see how it can eliminate the work to call the methods to do it, especially if the log line isn’t constant. That said, it isn’t clear to me even after looking at the examples how much JIT stuff JMH is going to allow or prevent. I do see annotations that can be used to prevent inlining but it isn’t clear to me how JMH calls the benchmark methods.
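[Editor's note] The warmup/measurement split Matt describes can be sketched in plain Java. JMH does this properly via @Warmup and @Measurement annotations, forked JVMs, and statistical aggregation; the class and method names below are illustrative, and this remains a naive harness that still includes GC and safepoint noise in the measured block.

```java
// Sketch only: a naive benchmark split into warmup and measurement phases.
public class WarmupSplitBenchmark {
    private static volatile long sink; // consume results so they aren't dead code

    // Stand-in for the code under test (illustrative).
    static long workload(long i) {
        return Long.rotateLeft(i, 7) ^ i;
    }

    // Runs 'iterations' calls and returns the average nanoseconds per call.
    static double timeIterations(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = workload(i);
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        // Warmup block: give the JIT a chance to compile the hot loop;
        // this timing is deliberately discarded.
        timeIterations(200_000);
        // Measurement block: only these iterations count toward the score.
        double avgNanos = timeIterations(1_000_000);
        System.out.printf("avg ns/op: %.2f%n", avgNanos);
    }
}
```

Even with the split, a single in-process run like this measures one JIT outcome on one JVM instance, which is part of why JMH forks multiple JVMs and reports error bounds.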
>>>> All https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html says is that @State will prevent “certain” optimizations from occurring.
>>>>
>>>> Don’t get me wrong. I think there are many places where JMH benchmarks are going to be useful. I’m just not sold on it being as useful for “semi-macro” benchmarks like we have in a lot of cases.
>>>>
>>>> Ralph
>>>>
>>>>> On Jul 4, 2019, at 9:21 PM, Remko Popma <[email protected]> wrote:
>>>>>
>>>>> The JMH samples (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) are the best place to learn about the pitfalls of java micro benchmarks and how JMH can help you avoid them.
>>>>>
>>>>> Remko.
>>>>>
>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>
>>>>>> On Jul 5, 2019, at 11:25, Ralph Goers <[email protected]> wrote:
>>>>>>
>>>>>> I don’t understand what you mean by your last message.
>>>>>>
>>>>>> I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does, though.
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <[email protected]> wrote:
>>>>>>>
>>>>>>> And the safepoint bias thing is alleviated, amongst other noise from the JVM that would otherwise invalidate the time measurements.
>>>>>>>
>>>>>>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <[email protected]> wrote:
>>>>>>>>
>>>>>>>> I thought JMH ran your code enough for JIT to take effect (though that’s also up to your warmup iteration setting), and the
>>>>>>>>
>>>>>>>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> That said, here are the results of the FileAppenderWithLocationBenchmark from master.
>>>>>>>>>
>>>>>>>>> Benchmark                                       Mode  Cnt       Score      Error  Units
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±  350.465  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ± 2023.826  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ± 1452.810  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ± 6329.118  ops/s
>>>>>>>>>
>>>>>>>>> and here are the results with all the changes I have made.
>>>>>>>>>
>>>>>>>>> Benchmark                                       Mode  Cnt       Score      Error  Units
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ± 1594.591  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ± 2309.118  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ± 3103.615  ops/s
>>>>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ± 2535.387  ops/s
>>>>>>>>>
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Remko,
>>>>>>>>>>
>>>>>>>>>> I ran the existing JMH benchmarks for both no location and with location and there is still a huge difference. The problem is I don’t necessarily believe the results.
>>>>>>>>>> Nor do I think the benchmark I have been using is totally accurate either. The problem is that, as I understand it, JMH limits or eliminates much of the optimization the Java runtime will perform. This is bad for Log4j because in some ways our code is written inefficiently and HotSpot will optimize it. OTOH, since the test is pretty much logging the same record with an incremented value from the same location, it is hard to say whether the code isn’t being over-optimized. I suspect the real performance will be somewhere in the middle. What would be better is something like a RESTful service with plenty of logging, driven by JMeter, once with no location and once with it.
>>>>>>>>>>
>>>>>>>>>> Ralph
>>>>>>>>>>
>>>>>>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> If memory serves correctly, the `includeLocation` attribute was introduced because it was necessary for async loggers (and it seemed not feasible to get this information from the layout pattern).
>>>>>>>>>>>
>>>>>>>>>>> Async loggers need to know upfront whether they need to pass location information to the consumer thread.
>>>>>>>>>>>
>>>>>>>>>>> If obtaining location information becomes so cheap that it’s essentially free, it is fine to ignore the `includeLocation` directive and include location information anyway.
>>>>>>>>>>>
>>>>>>>>>>> About the performance results: what’s the difference between “Log4j 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>>>>>>>>>>>
>>>>>>>>>>> Also I am generally cautious of custom benchmarks; any reason not to use JMH?
>>>>>>>>>>>
>>>>>>>>>>> Overall impressive results, congratulations!
>>>>>>>>>>>
>>>>>>>>>>> Remko.
>>>>>>>>>>>
>>>>>>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>>>>>>>
>>>>>>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> But what do you think about includeLocation? Should it override the result of requiresLocation, or should the fact that we determined location information is required override the includeLocation setting?
>>>>>>>>>>>>
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> These benchmarks are impressive! Moving the point of call site calculation is a great idea.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <[email protected]> wrote:
>>>>>>>>>>>>> I have made changes to the code in the master branch to measure and improve performance. I have committed some of the changes but not the last set. I have not made the latest changes to release-2.x because they will have to be implemented a bit differently to avoid compatibility issues. The table below summarizes what I am seeing, at least on my MacBook Pro. This test just basically writes 1.5 million log statements in a loop using an equivalent format for each logging framework. Note that JUL is faster with location information than without, primarily because a custom Handler had to be created to format the log record and it performs better than the one used to log without location information.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are several things I should note about this:
>>>>>>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation modified to use a faster algorithm. Streams are not particularly fast. This resulted in some gains not shown here but can be seen in https://issues.apache.org/jira/browse/LOG4J2-2644
>>>>>>>>>>>>> • Performance is proportional to the number of stack frames that need to be traversed. JUL only had 11 whereas Log4j had 35 before it called calcLocation. Our stack is very deep because AbstractLogger and LoggerConfig have been overly refactored, along with having to go through several intermediate objects.
>>>>>>>>>>>>> • I introduced a requiresLocation() method throughout the stack. Any component that requires location information sets this value to true. Based on this, the call to calcLocation can be made significantly higher up the stack, just before the log event is created, but only when a configured component will use it - with a few exceptions. This code is not yet checked in.
>>>>>>>>>>>>> • If for some reason the location information isn’t calculated when the event is constructed, it can still be lazily populated later using the current, slower mechanism.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As you can see these changes result in a pretty dramatic improvement, almost to the point that the cost of obtaining location information doesn’t really matter - especially in Java 11. The results in Java 11 amount to 5us per logging call with no location vs 7.5us per logging call with location info.
>>>>>>>>>>>>> These results could probably be improved even more if AbstractLogger was “flattened” a bit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine whether location information is required or not for the RoutingAppender since it dynamically creates Appenders at run time. Instead, I added a “requiresLocation” attribute that can be set to indicate that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, in testing I have encountered a unit test failure in RandomAccessFileAppenderTest. It is failing because the location information is being included even though includeLocation is set to false. I am wondering if that setting needs to be respected, since the pattern specifies including it. It seems odd to me to not include it in that case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Matt Sicker <[email protected]>
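[Editor's note] The calcLocation changes themselves aren't shown in this thread, but the general Java 9+ technique for finding a caller's frame with java.util.StackWalker can be sketched as follows. CallerLocator is an illustrative name, not Log4j's actual StackLocator; the real code walks until it passes the logger's own entry class rather than skipping a fixed frame count.

```java
import java.util.Optional;

public class CallerLocator {
    // RETAIN_CLASS_REFERENCE lets frames expose the declaring Class,
    // which a logging framework typically needs.
    private static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    // Returns the frame of whoever called the method that invoked us:
    // the stream starts at this method (frame 0), so skip it and the
    // immediate caller (frame 1) to reach the caller's caller (frame 2).
    static Optional<StackWalker.StackFrame> callerOf() {
        return WALKER.walk(frames -> frames.skip(2).findFirst());
    }

    // Stand-in for a logger entry point; its caller is what we locate.
    static Optional<StackWalker.StackFrame> loggingMethod() {
        return callerOf();
    }

    public static void main(String[] args) {
        StackWalker.StackFrame frame = loggingMethod().orElseThrow();
        System.out.println(frame.getClassName() + "." + frame.getMethodName()
                + "(" + frame.getFileName() + ":" + frame.getLineNumber() + ")");
    }
}
```

Unlike Throwable.getStackTrace(), the walker materializes only the frames actually consumed, which is why a fixed-depth lookup like this is much cheaper than capturing the full 35-frame stack Ralph mentions.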
