Re: Location performance

Ralph Goers Thu, 04 Jul 2019 23:26:35 -0700

Well, I can infer it from the examples but it doesn’t come close to explaining 
some of the things we use it for. I would argue that logging a line whose value 
changes on every call, even if it is just an incremented integer, that writes 
to a file is more than a micro-benchmark, and I truly want to know how it is 
going to perform in real life, not with all the optimizations disabled.  When I 
first started Log4j 2 I noticed right away how much slower it ran when Java ran 
in client mode, which was much more common back then. That is essentially what 
I think JMH is doing with some of these benchmarks. So while it may be useful 
in determining that one algorithm is better than another when the code has not 
been optimized I wouldn’t trust that you will see the same behavior when 
running in a “real” environment.


For example, in testing calculating the location I doubt the JIT compiler can 
optimize away the call to the native methods involved in grabbing the stack 
frames. If that is the case then it can’t optimize away the methods to walk the 
stack frames, although it can certainly “clean them up”. Likewise, it can’t 
avoid writing log events to files so I don’t see how it can eliminate the work 
to call the methods to do it, especially if the log line isn’t constant. That 
said, it isn’t clear to me even after looking at the examples how much JIT 
stuff JMH is going to allow or prevent. I do see annotations that can be used 
to prevent inlining but it isn’t clear to me how JMH calls the benchmark 
methods. All 
https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html
 
<https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html>
 says is that @State will prevent “certain” optimizations from occurring.

Don’t get me wrong. I think there are many places where JHM benchmarks are 
going to be useful. I’m just not sold on it being as useful for “semi-macro” 
benchmarks like we have in a lot of cases.

Ralph

> On Jul 4, 2019, at 9:21 PM, Remko Popma <[email protected]> wrote:
> 
> The JMH samples 
> (https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/)
>  are the best place to learn about the pitfalls of java micro benchmarks and 
> how JMH can help you avoid them. 
> 
> Remko.
> 
> (Shameless plug) Every java main() method deserves http://picocli.info
> 
>> On Jul 5, 2019, at 11:25, Ralph Goers <[email protected]> wrote:
>> 
>> I don’t understand what you mean with you last message.
>> 
>> I don’t really know the details of how JMH works but my understanding is it 
>> does something to prevent things like dead code elimination. I haven’t been 
>> able to find anything very specific about what JMH actually does though.
>> 
>> Ralph
>> 
>>> On Jul 4, 2019, at 11:15 AM, Matt Sicker <[email protected]> wrote:
>>> 
>>> And the checkpoint bias thing is alleviated amongst other noise from the
>>> JVM that would otherwise invalidate the time measurements.
>>> 
>>>> On Thu, Jul 4, 2019 at 13:14, Matt Sicker <[email protected]> wrote:
>>>> 
>>>> I thought JMH ran your code enough for JIT to take effect (though that’s
>>>> also up to your warmup iteration setting), and the
>>>> 
>>>> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <[email protected]>
>>>> wrote:
>>>> 
>>>>> That said, here are the results of the FileAppenderWithLocationBenchmark
>>>>> from master.
>>>>> 
>>>>> Benchmark                                       Mode  Cnt       Score
>>>>> Error  Units
>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   27597.034 ±
>>>>> 350.465  ops/s
>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20   82966.450 ±
>>>>> 2023.826  ops/s
>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20   78820.010 ±
>>>>> 1452.810  ops/s
>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  138656.655 ±
>>>>> 6329.118  ops/s
>>>>> 
>>>>> and here are the results with all the changes I have made.
>>>>> 
>>>>> Benchmark                                       Mode  Cnt       Score
>>>>> Error  Units
>>>>> FileAppenderWithLocationBenchmark.log4j1File   thrpt   20   47003.919 ±
>>>>> 1594.591  ops/s
>>>>> FileAppenderWithLocationBenchmark.log4j2File   thrpt   20  185129.606 ±
>>>>> 2309.118  ops/s
>>>>> FileAppenderWithLocationBenchmark.log4j2RAF    thrpt   20  183346.456 ±
>>>>> 3103.615  ops/s
>>>>> FileAppenderWithLocationBenchmark.logbackFile  thrpt   20  130121.612 ±
>>>>> 2535.387  ops/s
>>>>> 
>>>>> Ralph
>>>>> 
>>>>> 
>>>>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <[email protected]>
>>>>> wrote:
>>>>>> 
>>>>>> Remko,
>>>>>> 
>>>>>> I ran the existing JHM benchmarks for both no location and with
>>>>> location and there is still a huge difference. The problem is I don’t
>>>>> necessarily believe the results. Nor do I think the benchmark I have been
>>>>> using is totally accurate either. The problem is that, as I understand it,
>>>>> JMH limits or eliminates much of the optimization the Java runtime will
>>>>> perform. This is bad for Log4j because in some ways our code is written
>>>>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty
>>>>> much logging the same record with an incremented value from the same
>>>>> location it is hard to say whether the code isn’t being over optimized. I
>>>>> suspect the real performance will be somewhere in the middle. What would 
>>>>> be
>>>>> better is something like a restful service with plenty of logging driven 
>>>>> by
>>>>> Jmeter once with no location and once with it.
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <[email protected]> wrote:
>>>>>>> 
>>>>>>> If memory serves correctly, the `includeLocation` was introduced
>>>>> because it was necessary for async loggers (and it seemed not feasible to
>>>>> get this information from the layout pattern).
>>>>>>> 
>>>>>>> Async loggers need to know upfront whether they need to pass location
>>>>> information to the consumer thread.
>>>>>>> 
>>>>>>> If obtaining location information becomes so cheap that it’s
>>>>> essentially free, it is fine to ignore the `includeLocation` directive and
>>>>> include location information anyway.
>>>>>>> 
>>>>>>> About the performance results: what’s the difference between “Log4j
>>>>> 3.0 with class and method” and “Log4j 3.0 withLocation()”?
>>>>>>> 
>>>>>>> Also I am generally cautious of custom benchmarks; any reason not to
>>>>> use JMH?
>>>>>>> 
>>>>>>> Overall impressive results, congratulations!
>>>>>>> 
>>>>>>> Remko.
>>>>>>> 
>>>>>>> (Shameless plug) Every java main() method deserves http://picocli.info
>>>>>>> 
>>>>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <[email protected]>
>>>>> wrote:
>>>>>>>> 
>>>>>>>> But what do you think about includeLocation. Should it override the
>>>>> result of requiresLocation or should the fact that we determined location
>>>>> information is required override the includeLocation setting?
>>>>>>>> 
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> These benchmarks are impressive! Moving the point of call site
>>>>> calculation is a great idea.
>>>>>>>>> 
>>>>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>> I have made changes to the code the master branch to measure and
>>>>> improve performance. I have committed some of the changes but not the last
>>>>> set. I have not made the latest changes to release-2.x because they will
>>>>> have to be implemented a bit differently to avoid compatibility issues. 
>>>>> The
>>>>> table below summarizes what I am seeing, at least on my MacBook Pro.  This
>>>>> test just basically writes 1.5 million log statements in a loop using an
>>>>> equivalent format for each logging framework. Note that JUL is faster with
>>>>> location information than without primarily because a custom Handler had 
>>>>> to
>>>>> be created to format the log record and it performs better than the one
>>>>> used to log without location information.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> There are several things I should note about this:
>>>>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation
>>>>> modified to use a faster algorithm. Streams are not particularly fast. 
>>>>> This
>>>>> resulted in some gains not shown here but can be seen in
>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644 <
>>>>> https://issues.apache.org/jira/browse/LOG4J2-2644>
>>>>>>>>> • Performance is proportional to the number of stack traces that
>>>>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it 
>>>>> called
>>>>> calcLocation. Our stack is very deep because AbstractLogger and
>>>>> LoggerConfig have been overly refactored along with having to go though
>>>>> several intermediate objects.
>>>>>>>>> • I introduced a requiredLocation() method throughout the stack.
>>>>> Any component that requires location information sets this value to true.
>>>>> Based on this, the call to calcLocation can be made significantly higher 
>>>>> up
>>>>> the stack just before the log event is created but only when a configured
>>>>> component will use it - with a few exceptions. This code is not yet 
>>>>> checked
>>>>> in.
>>>>>>>>> • If for some reason the location information isn’t calculated when
>>>>> the event is constructed it can still be lazily populated later using the
>>>>> current, slower mechanism.
>>>>>>>>> 
>>>>>>>>> As you can see these changes result in a pretty dramatic
>>>>> improvement, almost to the point that the cost of obtaining location
>>>>> information doesn’t really matter - especially in Java 11. The results in
>>>>> Java 11 amount to 5us per logging call with no location vs 7.5us per
>>>>> logging call with location info. These results could probably be improved
>>>>> even more if AbstractLogger was “flattened” a bit.
>>>>>>>>> 
>>>>>>>>> I should point out that the logic isn’t perfect. I cannot determine
>>>>> whether location information is required or not for the RoutingAppender
>>>>> since it dynamically creates Appenders at run time. Instead, I added a
>>>>> “requiresLocation” attribute that can be set to indicate that.
>>>>>>>>> 
>>>>>>>>> However, in testing I have encountered a unit test failure in
>>>>> RandomAccessFileAppenderTest. It is failing because the location
>>>>> information is being included even though includeLocation is set to false.
>>>>> I am wondering if that setting needs to be respected since the pattern
>>>>> specifies including it. It seems odd to me to not include it in that case.
>>>>>>>>> 
>>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> Ralph
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Matt Sicker <[email protected] <mailto:[email protected]>>
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>> Matt Sicker <[email protected]>
>>>> 
>>> -- 
>>> Matt Sicker <[email protected]>
>> 
>>

Re: Location performance

Reply via email to