I'm no JMH expert, but there is a lot more going on there than just
running warmup iterations to avoid the extra time for JIT compilation.
Check out https://shipilev.net/talks/devoxx-Nov2013-benchmarking.pdf
for an idea of the kinds of problems naive timer-only benchmarks
encounter, and the different ways in which JMH solves them. There are
other good presentations about JMH at https://shipilev.net/ as well.

Secondly, JMH isn't de-optimizing the JVM (a la the client JVM)
willy-nilly. It just provides an API that lets you measure what you
(hopefully) think you are measuring -- e.g. blackholes to consume
results that the JVM would otherwise optimize away as unused, and so
forth. AFAIK, it doesn't apply these "de-optimizations" on its own; it
just provides an API for you to use as needed.
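To make the blackhole point concrete, here is a plain-Java sketch (not JMH code, and the class and method names are made up for illustration) of the problem a Blackhole solves: if nothing consumes a result, the JIT is free to eliminate the computation that produced it.

```java
// Sketch: illustrates why unused results must be consumed. In JMH,
// Blackhole.consume(result) plays the role of the `sink` write here;
// without consumption, the JIT may drop the whole computation and the
// benchmark would measure an empty loop.
public class DeadCodeSketch {
    static volatile long sink; // volatile write the JIT cannot remove

    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i * 31L;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Without consuming the result, an optimizing JIT could legally
        // eliminate the call to work() entirely.
        sink = work(1_000_000);
        System.out.println("consumed: " + sink);
    }
}
```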

Thirdly, I don't believe JMH does anything (by default) to optimize
away GC or anything like that -- in fact, it allows you to consider the
effect of GC by running lots of iterations, so that the effect of GC
*is* taken into account in the benchmark. It does have a (non-default)
option to run a GC before each iteration for specific use cases (e.g.
benchmarks which generate lots of garbage). It also has a GC profiler
to understand the impact of GC on your results (see
https://github.com/Valloric/jmh-playground/blob/master/src/jmh/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java).

Regards,
Raman


On Fri, Jul 5, 2019 at 4:06 PM Ralph Goers <[email protected]> wrote:
>
> Actually, I would disagree. First, I don’t think the first thousand or so 
> iterations running slow would matter much with 1.5 million iterations. Let’s 
> say the first thousand take 10 times longer than the rest of the iterations. 
> That is still only the equivalent of 10,000 iterations. That is a blip in 1.5 
> million.
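That arithmetic can be sketched numerically (the numbers and the 10x slowdown are the hypothetical ones from the paragraph above):

```java
// Quantifies the claim above: 1,000 warmup iterations running 10x slower
// contribute the equivalent of 9,000 extra iterations -- a tiny share of
// the 1.5 million total.
public class WarmupBlip {
    // Fraction of total elapsed time attributable to the slow warmup phase.
    static double overheadFraction(long totalIters, long slowIters, double slowdown) {
        double extra = slowIters * (slowdown - 1.0); // extra iteration-equivalents
        return extra / (totalIters + extra);
    }

    public static void main(String[] args) {
        double f = overheadFraction(1_500_000, 1_000, 10.0);
        System.out.printf("warmup share of total time: %.2f%%%n", f * 100); // ~0.6%
    }
}
```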
> Second, measuring the impact of garbage collection should be a part of the 
> benchmark if your code is generating garbage.
>
> Again, I am not saying JMH doesn’t have its place, but I have to think 
> something weird is going on when JMH is showing the no location benchmark to 
> be about 4 times faster than the location benchmark where the simplistic loop 
> test shows the elapsed time when including the location information to be 
> about twice that of not including. I can tell that the call to StackWalker 
> isn’t being optimized away because it still shows up in YourKit as the most 
> expensive operation. And when I turn on immediateFlush, flushing the file 
> shows up in YourKit.
>
> Ralph
>
> > On Jul 5, 2019, at 12:18 PM, Matt Sicker <[email protected]> wrote:
> >
> > What I mean is that JMH is written to avoid including
> > behind-the-scenes variability in code execution: the changes from
> > startup until the code has been compiled to native code, garbage
> > collection, synchronization points for threads (safepoints), etc. All
> > these things become variable noise that affects benchmarks when the
> > code you're testing executes in an especially short period of time
> > (e.g., appending a log event is pretty fast, relatively speaking).
> >
> > Say you wrote a naive benchmark like this:
> >
> > long start = System.nanoTime();
> > for (int i = 0; i < 1_000_000; i++) {
> >     runBenchmark();
> > }
> > long end = System.nanoTime();
> > double averageExecutionNanos = (end - start) / 1_000_000.0;
> >
> > This will give incorrect results because the first thousand or so
> > iterations will run in the interpreter or via the first-tier (C1)
> > compiler, while later iterations will run code from the second-tier
> > (C2) compiler, which is significantly faster. So then you might split
> > that up into a warmup block and a measurement block. You'll still be
> > including the execution time spent in garbage collection, thread
> > synchronization, JIT compilation, etc., that isn't relevant to the
> > benchmark for executing the code over time.
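The warmup/measurement split described above could be sketched like this in plain Java, with `runBenchmark()` as a hypothetical stand-in for the code under test; JMH's warmup iterations serve the same purpose, plus much more:

```java
// Sketch of a warmup/measurement split. Only the second loop is timed,
// so the hot path has (hopefully) been JIT-compiled before measurement
// begins. JMH handles this -- and many subtler issues -- for you.
public class SplitBenchmark {
    static long sink; // consumed so the JIT cannot drop the workload

    // Hypothetical stand-in for the code under test.
    static void runBenchmark() {
        sink += System.nanoTime() & 0xFF;
    }

    static double measure(int warmupIters, int measuredIters) {
        for (int i = 0; i < warmupIters; i++) {
            runBenchmark();           // warmup: let the JIT compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIters; i++) {
            runBenchmark();           // timed region
        }
        long end = System.nanoTime();
        return (end - start) / (double) measuredIters;
    }

    public static void main(String[] args) {
        System.out.println("avg ns/op: " + measure(100_000, 1_000_000));
    }
}
```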
> >
> > On Fri, 5 Jul 2019 at 01:25, Ralph Goers <[email protected]> wrote:
> >>
> >> Well, I can infer it from the examples but it doesn’t come close to 
> >> explaining some of the things we use it for. I would argue that logging a 
> >> line whose value changes on every call, even if it is just an incremented 
> >> integer, and that writes to a file, is more than a micro-benchmark, and I 
> >> truly want to know how it is going to perform in real life, not with all 
> >> the optimizations disabled. When I first started Log4j 2 I noticed right 
> >> away how much slower it ran when Java ran in client mode, which was much 
> >> more common back then. That is essentially what I think JMH is doing with 
> >> some of these benchmarks. So while it may be useful in determining that 
> >> one algorithm is better than another when the code has not been optimized, 
> >> I wouldn’t trust that you will see the same behavior when running in a 
> >> “real” environment.
> >>
> >> For example, in testing calculating the location I doubt the JIT compiler 
> >> can optimize away the call to the native methods involved in grabbing the 
> >> stack frames. If that is the case then it can’t optimize away the methods 
> >> to walk the stack frames, although it can certainly “clean them up”. 
> >> Likewise, it can’t avoid writing log events to files so I don’t see how it 
> >> can eliminate the work to call the methods to do it, especially if the log 
> >> line isn’t constant. That said, it isn’t clear to me even after looking at 
> >> the examples how much JIT stuff JMH is going to allow or prevent. I do see 
> >> annotations that can be used to prevent inlining but it isn’t clear to me 
> >> how JMH calls the benchmark methods. All 
> >> https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html
> >> says is that @State will prevent “certain” optimizations from occurring.
> >>
> >> Don’t get me wrong. I think there are many places where JMH benchmarks are 
> >> going to be useful. I’m just not sold on it being as useful for 
> >> “semi-macro” benchmarks like we have in a lot of cases.
> >>
> >> Ralph
> >>
> >>> On Jul 4, 2019, at 9:21 PM, Remko Popma <[email protected]> wrote:
> >>>
> >>> The JMH samples 
> >>> (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/)
> >>>  are the best place to learn about the pitfalls of java micro benchmarks 
> >>> and how JMH can help you avoid them.
> >>>
> >>> Remko.
> >>>
> >>> (Shameless plug) Every java main() method deserves http://picocli.info
> >>>
> >>>> On Jul 5, 2019, at 11:25, Ralph Goers <[email protected]> wrote:
> >>>>
> >>>> I don’t understand what you mean with you last message.
> >>>>
> >>>> I don’t really know the details of how JMH works but my understanding is 
> >>>> it does something to prevent things like dead code elimination. I 
> >>>> haven’t been able to find anything very specific about what JMH actually 
> >>>> does though.
> >>>>
> >>>> Ralph
> >>>>
> >>>>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <[email protected]> wrote:
> >>>>>
> >>>>> And the safepoint bias thing is alleviated, along with other noise from 
> >>>>> the JVM that would otherwise invalidate the time measurements.
> >>>>>
> >>>>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <[email protected]> wrote:
> >>>>>>
> >>>>>> I thought JMH ran your code enough for JIT to take effect (though 
> >>>>>> that’s
> >>>>>> also up to your warmup iteration setting), and the
> >>>>>>
> >>>>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <[email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> That said, here are the results of the 
> >>>>>>> FileAppenderWithLocationBenchmark
> >>>>>>> from master.
> >>>>>>>
> >>>>>>> Benchmark                                       Mode  Cnt       Score       Error  Units
> >>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±  350.465  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ± 2023.826  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ± 1452.810  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ± 6329.118  ops/s
> >>>>>>>
> >>>>>>> and here are the results with all the changes I have made.
> >>>>>>>
> >>>>>>> Benchmark                                       Mode  Cnt       Score       Error  Units
> >>>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ± 1594.591  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ± 2309.118  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ± 3103.615  ops/s
> >>>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ± 2535.387  ops/s
> >>>>>>>
> >>>>>>> Ralph
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <[email protected]>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Remko,
> >>>>>>>>
> >>>>>>>> I ran the existing JMH benchmarks for both no location and with
> >>>>>>> location and there is still a huge difference. The problem is I don’t
> >>>>>>> necessarily believe the results. Nor do I think the benchmark I have 
> >>>>>>> been
> >>>>>>> using is totally accurate either. The problem is that, as I 
> >>>>>>> understand it,
> >>>>>>> JMH limits or eliminates much of the optimization the Java runtime 
> >>>>>>> will
> >>>>>>> perform. This is bad for Log4j because in some ways our code is 
> >>>>>>> written
> >>>>>>> inefficiently and hotspot will optimize it. OTOH, since the test is 
> >>>>>>> pretty
> >>>>>>> much logging the same record with an incremented value from the same
> >>>>>>> location it is hard to say whether the code isn’t being over 
> >>>>>>> optimized. I
> >>>>>>> suspect the real performance will be somewhere in the middle. What 
> >>>>>>> would be
> >>>>>>> better is something like a restful service with plenty of logging 
> >>>>>>> driven by
> >>>>>>> Jmeter once with no location and once with it.
> >>>>>>>>
> >>>>>>>> Ralph
> >>>>>>>>
> >>>>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <[email protected]> 
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> If memory serves correctly, the `includeLocation` was introduced
> >>>>>>> because it was necessary for async loggers (and it seemed not 
> >>>>>>> feasible to
> >>>>>>> get this information from the layout pattern).
> >>>>>>>>>
> >>>>>>>>> Async loggers need to know upfront whether they need to pass 
> >>>>>>>>> location
> >>>>>>> information to the consumer thread.
> >>>>>>>>>
> >>>>>>>>> If obtaining location information becomes so cheap that it’s
> >>>>>>> essentially free, it is fine to ignore the `includeLocation` 
> >>>>>>> directive and
> >>>>>>> include location information anyway.
> >>>>>>>>>
> >>>>>>>>> About the performance results: what’s the difference between “Log4j
> >>>>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
> >>>>>>>>>
> >>>>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
> >>>>>>> use JMH?
> >>>>>>>>>
> >>>>>>>>> Overall impressive results, congratulations!
> >>>>>>>>>
> >>>>>>>>> Remko.
> >>>>>>>>>
> >>>>>>>>> (Shameless plug) Every java main() method deserves 
> >>>>>>>>> http://picocli.info
> >>>>>>>>>
> >>>>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <[email protected]>
> >>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> But what do you think about includeLocation. Should it override the
> >>>>>>> result of requiresLocation or should the fact that we determined 
> >>>>>>> location
> >>>>>>> information is required override the includeLocation setting?
> >>>>>>>>>>
> >>>>>>>>>> Ralph
> >>>>>>>>>>
> >>>>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> These benchmarks are impressive! Moving the point of call site
> >>>>>>> calculation is a great idea.
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <[email protected]> wrote:
> >>>>>>>>>>> I have made changes to the code the master branch to measure and
> >>>>>>> improve performance. I have committed some of the changes but not the 
> >>>>>>> last
> >>>>>>> set. I have not made the latest changes to release-2.x because they 
> >>>>>>> will
> >>>>>>> have to be implemented a bit differently to avoid compatibility 
> >>>>>>> issues. The
> >>>>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  
> >>>>>>> This
> >>>>>>> test just basically writes 1.5 million log statements in a loop using 
> >>>>>>> an
> >>>>>>> equivalent format for each logging framework. Note that JUL is faster 
> >>>>>>> with
> >>>>>>> location information than without primarily because a custom Handler 
> >>>>>>> had to
> >>>>>>> be created to format the log record and it performs better than the 
> >>>>>>> one
> >>>>>>> used to log without location information.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> There are several things I should note about this:
> >>>>>>>>>>> • The first step, which is in the PR, has 
> >>>>>>>>>>> StackLocator.calcLocation
> >>>>>>> modified to use a faster algorithm. Streams are not particularly 
> >>>>>>> fast. This
> >>>>>>> resulted in some gains not shown here but can be seen in
> >>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
> >>>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
> >>>>>>>>>>> • Performance is proportional to the number of stack frames that
> >>>>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it
> >>>>>>> called calcLocation. Our stack is very deep because AbstractLogger and
> >>>>>>> LoggerConfig have been overly refactored, along with having to go
> >>>>>>> through several intermediate objects.
> >>>>>>>>>>> • I introduced a requiresLocation() method throughout the stack.
> >>>>>>> Any component that requires location information sets this value to 
> >>>>>>> true.
> >>>>>>> Based on this, the call to calcLocation can be made significantly 
> >>>>>>> higher up
> >>>>>>> the stack just before the log event is created but only when a 
> >>>>>>> configured
> >>>>>>> component will use it - with a few exceptions. This code is not yet 
> >>>>>>> checked
> >>>>>>> in.
> >>>>>>>>>>> • If for some reason the location information isn’t calculated 
> >>>>>>>>>>> when
> >>>>>>> the event is constructed it can still be lazily populated later using 
> >>>>>>> the
> >>>>>>> current, slower mechanism.
> >>>>>>>>>>>
> >>>>>>>>>>> As you can see these changes result in a pretty dramatic
> >>>>>>> improvement, almost to the point that the cost of obtaining location
> >>>>>>> information doesn’t really matter - especially in Java 11. The 
> >>>>>>> results in
> >>>>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
> >>>>>>> logging call with location info. These results could probably be 
> >>>>>>> improved
> >>>>>>> even more if AbstractLogger was “flattened” a bit.
> >>>>>>>>>>>
> >>>>>>>>>>> I should point out that the logic isn’t perfect. I cannot 
> >>>>>>>>>>> determine
> >>>>>>> whether location information is required or not for the 
> >>>>>>> RoutingAppender
> >>>>>>> since it dynamically creates Appenders at run time. Instead, I added a
> >>>>>>> “requiresLocation” attribute that can be set to indicate that.
> >>>>>>>>>>>
> >>>>>>>>>>> However, in testing I have encountered a unit test failure in
> >>>>>>> RandomAccessFileAppenderTest. It is failing because the location
> >>>>>>> information is being included even though includeLocation is set to 
> >>>>>>> false.
> >>>>>>> I am wondering if that setting needs to be respected since the pattern
> >>>>>>> specifies including it. It seems odd to me to not include it in that 
> >>>>>>> case.
> >>>>>>>>>>>
> >>>>>>>>>>> Thoughts?
> >>>>>>>>>>>
> >>>>>>>>>>> Ralph
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Matt Sicker <[email protected]>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>> Matt Sicker <[email protected]>
> >>>>>>
> >>>>> --
> >>>>> Matt Sicker <[email protected]>
> >>>>
> >>>>
> >>
> >
> >
> > --
> > Matt Sicker <[email protected]>
>
