I don’t understand what you mean with you last message. I don’t really know the details of how JMH works but my understanding is it does something to prevent things like dead code elimination. I haven’t been able to find anything very specific about what JMH actually does though.
Ralph > On Jul 4, 2019, at 11:15 AM, Matt Sicker <[email protected]> wrote: > > And the checkpoint bias thing is alleviated amongst other noise from the > JVM that would otherwise invalidate the time measurements. > > On Thu, Jul 4, 2019 at 13:14, Matt Sicker <[email protected]> wrote: > >> I thought JMH ran your code enough for JIT to take effect (though that’s >> also up to your warmup iteration setting), and the >> >> On Thu, Jul 4, 2019 at 00:21, Ralph Goers <[email protected]> >> wrote: >> >>> That said, here are the results of the FileAppenderWithLocationBenchmark >>> from master. >>> >>> Benchmark Mode Cnt Score >>> Error Units >>> FileAppenderWithLocationBenchmark.log4j1File thrpt 20 27597.034 ± >>> 350.465 ops/s >>> FileAppenderWithLocationBenchmark.log4j2File thrpt 20 82966.450 ± >>> 2023.826 ops/s >>> FileAppenderWithLocationBenchmark.log4j2RAF thrpt 20 78820.010 ± >>> 1452.810 ops/s >>> FileAppenderWithLocationBenchmark.logbackFile thrpt 20 138656.655 ± >>> 6329.118 ops/s >>> >>> and here are the results with all the changes I have made. >>> >>> Benchmark Mode Cnt Score >>> Error Units >>> FileAppenderWithLocationBenchmark.log4j1File thrpt 20 47003.919 ± >>> 1594.591 ops/s >>> FileAppenderWithLocationBenchmark.log4j2File thrpt 20 185129.606 ± >>> 2309.118 ops/s >>> FileAppenderWithLocationBenchmark.log4j2RAF thrpt 20 183346.456 ± >>> 3103.615 ops/s >>> FileAppenderWithLocationBenchmark.logbackFile thrpt 20 130121.612 ± >>> 2535.387 ops/s >>> >>> Ralph >>> >>> >>>> On Jul 3, 2019, at 8:35 PM, Ralph Goers <[email protected]> >>> wrote: >>>> >>>> Remko, >>>> >>>> I ran the existing JHM benchmarks for both no location and with >>> location and there is still a huge difference. The problem is I don’t >>> necessarily believe the results. Nor do I think the benchmark I have been >>> using is totally accurate either. The problem is that, as I understand it, >>> JMH limits or eliminates much of the optimization the Java runtime will >>> perform. This is bad for Log4j because in some ways our code is written >>> inefficiently and hotspot will optimize it. OTOH, since the test is pretty >>> much logging the same record with an incremented value from the same >>> location it is hard to say whether the code isn’t being over optimized. I >>> suspect the real performance will be somewhere in the middle. What would be >>> better is something like a restful service with plenty of logging driven by >>> Jmeter once with no location and once with it. >>>> >>>> Ralph >>>> >>>>> On Jul 2, 2019, at 3:47 PM, Remko Popma <[email protected]> wrote: >>>>> >>>>> If memory serves correctly, the `includeLocation` was introduced >>> because it was necessary for async loggers (and it seemed not feasible to >>> get this information from the layout pattern). >>>>> >>>>> Async loggers need to know upfront whether they need to pass location >>> information to the consumer thread. >>>>> >>>>> If obtaining location information becomes so cheap that it’s >>> essentially free, it is fine to ignore the `includeLocation` directive and >>> include location information anyway. >>>>> >>>>> About the performance results: what’s the difference between “Log4j >>> 3.0 with class and method” and “Log4j 3.0 withLocation()”? >>>>> >>>>> Also I am generally cautious of custom benchmarks; any reason not to >>> use JMH? >>>>> >>>>> Overall impressive results, congratulations! >>>>> >>>>> Remko. >>>>> >>>>> (Shameless plug) Every java main() method deserves http://picocli.info >>>>> >>>>>> On Jul 3, 2019, at 7:26, Ralph Goers <[email protected]> >>> wrote: >>>>>> >>>>>> But what do you think about includeLocation. Should it override the >>> result of requiresLocation or should the fact that we determined location >>> information is required override the includeLocation setting? >>>>>> >>>>>> Ralph >>>>>> >>>>>>> On Jul 2, 2019, at 2:34 PM, Matt Sicker <[email protected]> wrote: >>>>>>> >>>>>>> These benchmarks are impressive! Moving the point of call site >>> calculation is a great idea. >>>>>>> >>>>>>> On Tue, 2 Jul 2019 at 16:22, Ralph Goers <[email protected] >>> <mailto:[email protected]>> wrote: >>>>>>> I have made changes to the code the master branch to measure and >>> improve performance. I have committed some of the changes but not the last >>> set. I have not made the latest changes to release-2.x because they will >>> have to be implemented a bit differently to avoid compatibility issues. The >>> table below summarizes what I am seeing, at least on my MacBook Pro. This >>> test just basically writes 1.5 million log statements in a loop using an >>> equivalent format for each logging framework. Note that JUL is faster with >>> location information than without primarily because a custom Handler had to >>> be created to format the log record and it performs better than the one >>> used to log without location information. >>>>>>> >>>>>>> >>>>>>> >>>>>>> There are several things I should note about this: >>>>>>> • The first step, which is in the PR, has StackLocator.calcLocation >>> modified to use a faster algorithm. Streams are not particularly fast. This >>> resulted in some gains not shown here but can be seen in >>> https://issues.apache.org/jira/browse/LOG4J2-2644 < >>> https://issues.apache.org/jira/browse/LOG4J2-2644> >>>>>>> • Performance is proportional to the number of stack traces that >>> need to be traversed. JUL only had 11 whereas Log4j had 35 before it called >>> calcLocation. Our stack is very deep because AbstractLogger and >>> LoggerConfig have been overly refactored along with having to go though >>> several intermediate objects. >>>>>>> • I introduced a requiredLocation() method throughout the stack. >>> Any component that requires location information sets this value to true. >>> Based on this, the call to calcLocation can be made significantly higher up >>> the stack just before the log event is created but only when a configured >>> component will use it - with a few exceptions. This code is not yet checked >>> in. >>>>>>> • If for some reason the location information isn’t calculated when >>> the event is constructed it can still be lazily populated later using the >>> current, slower mechanism. >>>>>>> >>>>>>> As you can see these changes result in a pretty dramatic >>> improvement, almost to the point that the cost of obtaining location >>> information doesn’t really matter - especially in Java 11. The results in >>> Java 11 amount to 5us per logging call with no location vs 7.5us per >>> logging call with location info. These results could probably be improved >>> even more if AbstractLogger was “flattened” a bit. >>>>>>> >>>>>>> I should point out that the logic isn’t perfect. I cannot determine >>> whether location information is required or not for the RoutingAppender >>> since it dynamically creates Appenders at run time. Instead, I added a >>> “requiresLocation” attribute that can be set to indicate that. >>>>>>> >>>>>>> However, in testing I have encountered a unit test failure in >>> RandomAccessFileAppenderTest. It is failing because the location >>> information is being included even though includeLocation is set to false. >>> I am wondering if that setting needs to be respected since the pattern >>> specifies including it. It seems odd to me to not include it in that case. >>>>>>> >>>>>>> Thoughts? >>>>>>> >>>>>>> Ralph >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Matt Sicker <[email protected] <mailto:[email protected]>> >>>>>> >>>>> >>>> >>>> >>>> >>> >>> -- >> Matt Sicker <[email protected]> >> > -- > Matt Sicker <[email protected]>
