Re: 8010309 : PlatformLogger: isLoggable performance / waste due to HashMap leads to Integer allocations (boxing)

Peter Levart Mon, 25 Mar 2013 09:33:21 -0700

Well, Laurent, Mandy,

It turns out that the dispatch speed-up (or lack of slow-down to beprecise) is possible without MethodHandles too. Maybe some VM guru couldshed some light on this, but the following is currently the fastest variant:


http://dl.dropbox.com/u/101777488/jdk8-tl/PlatformLogger/webrev.08/index.html

What I did is a simple "if" deciding between two call-sites, making surethat each is only dispatching to single class. This only works, if bothclasses (LoggerProxy and JavaLoggerProxy) are loaded in advance, before1st invocation on any of them is made (might be that using MethodHandlesforced initialization of both classes beforehand and hence thespeed-up). If I don't load JavaLoggerProxy before warming-up withLoggerProxy, then when j.u.logging is enabled, speed drops for a factorof almost 4 and never catches up even after very long time.This pre-loading also helps for a normal single call site dispatch thatdispatches to two distinct classes, but the speed 1st drops when theclass changes, and only catches-up after several billions of iterations(30s tight loop on i7).Now, because JavaLoggerProxy is initialized early, I had to move theinitialization of j.u.logging.Level objects and mappings to a separateclass "JavaLevel".


Here are the benchmark results for this last iteration...

When j.u.logging is enabled from the beginning:

** Patched code (webrev.08)

>> JVM START

>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.62 ns/op (? =   0.00 ns/op) [     1.62]
           1 threads, Tavg =      1.64 ns/op (? =   0.00 ns/op) [     1.64]
# Measure:
           1 threads, Tavg =      1.62 ns/op (? =   0.00 ns/op) [     1.62]

2 threads, Tavg = 1.60 ns/op (? = 0.00 ns/op) [1.60, 1.60]3 threads, Tavg = 1.60 ns/op (? = 0.00 ns/op) [1.60, 1.60, 1.60]4 threads, Tavg = 1.61 ns/op (? = 0.01 ns/op) [1.60, 1.62, 1.60, 1.61]


<< JVM END

And when it first runs with j.u.logging disabled and is enabled in themiddle:



>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.33 ns/op (? =   0.00 ns/op) [     1.33]
           1 threads, Tavg =      1.36 ns/op (? =   0.00 ns/op) [     1.36]
# Measure:
           1 threads, Tavg =      1.36 ns/op (? =   0.00 ns/op) [     1.36]

2 threads, Tavg = 1.30 ns/op (? = 0.04 ns/op) [1.34, 1.25]3 threads, Tavg = 1.29 ns/op (? = 0.06 ns/op) [1.37, 1.25, 1.27]4 threads, Tavg = 1.32 ns/op (? = 0.10 ns/op) [1.24, 1.25, 1.31, 1.49]



>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.74 ns/op (? =   0.00 ns/op) [     1.74]
           1 threads, Tavg =      1.73 ns/op (? =   0.00 ns/op) [     1.73]
# Measure:
           1 threads, Tavg =      1.57 ns/op (? =   0.00 ns/op) [     1.57]

2 threads, Tavg = 1.57 ns/op (? = 0.00 ns/op) [1.58, 1.57]3 threads, Tavg = 1.58 ns/op (? = 0.00 ns/op) [1.57, 1.58, 1.57]4 threads, Tavg = 1.59 ns/op (? = 0.01 ns/op) [1.61, 1.58, 1.58, 1.59]


<< JVM END

It can be seen that performance is stable as it does not change a bit ifthe "switch" occurs at start or in the middle.

With this patch, we have a 2.9x - 5.7x speed-up compared to originalcode (depending on test).

Laurent, can you try this one on your machine? Would be fine to see alsothe webrev.07 tested on your machine (I'm curious if thisslow-down-due-to-deoptimization also happens on your environment andwhat the impact is).


Regards, Peter


On 03/25/2013 12:46 PM, Peter Levart wrote:

Hi Laurent, Mandy,

MethodHandles.guardWithCheck() rocks!
Out of curiosity, I put the MethodHandles in latest JDK8 build to thetest. It turns out that they are surprisingly good. I optimized thedispatch logic of the most critical method:PlatformLogger.isLoggable(int) which I have shown in previous mail, isperformance-dependent on whether java.util.logging is enabled from thebeginning so that the dispatch is compiled by JIT into an in-linedmonomorphic call or it is enabled later-on when the code has alreadybeen warmed-up using LoggerProxy. The method's performance drops byfactor of 3 in the later situation.
Here's what I did:

http://dl.dropbox.com/u/101777488/jdk8-tl/PlatformLogger/webrev.mh.01/index.html
The benchmark results are as follows (only showing the patchedPlatformLogger - for original code results, refer to previous mail orsee below quoted message):
** Patched code (webrev.mh.01)

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.40 ns/op (? = 0.00 ns/op) [     1.40]
           1 threads, Tavg =      1.39 ns/op (? = 0.00 ns/op) [     1.39]
# Measure:
           1 threads, Tavg =      1.42 ns/op (? = 0.00 ns/op) [     1.42]
2 threads, Tavg = 1.31 ns/op (? = 0.00 ns/op) [1.31, 1.31]3 threads, Tavg = 1.43 ns/op (? = 0.20 ns/op) [1.74, 1.32, 1.32]4 threads, Tavg = 1.34 ns/op (? = 0.01 ns/op) [1.35, 1.35, 1.33, 1.35]
<< JVM END

>> JVM START

>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.69 ns/op (? = 0.00 ns/op) [     1.69]
           1 threads, Tavg =      1.77 ns/op (? = 0.00 ns/op) [     1.77]
# Measure:
           1 threads, Tavg =      1.73 ns/op (? = 0.00 ns/op) [     1.73]
2 threads, Tavg = 1.73 ns/op (? = 0.00 ns/op) [1.72, 1.73]3 threads, Tavg = 1.97 ns/op (? = 0.49 ns/op) [2.74, 1.72, 1.73]4 threads, Tavg = 1.80 ns/op (? = 0.05 ns/op) [1.83, 1.73, 1.79, 1.87]
<< JVM END

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.35 ns/op (? = 0.00 ns/op) [     1.35]
           1 threads, Tavg =      1.74 ns/op (? = 0.00 ns/op) [     1.74]
# Measure:
           1 threads, Tavg =      1.33 ns/op (? = 0.00 ns/op) [     1.33]
2 threads, Tavg = 1.44 ns/op (? = 0.00 ns/op) [1.44, 1.44]3 threads, Tavg = 1.28 ns/op (? = 0.01 ns/op) [1.30, 1.27, 1.29]4 threads, Tavg = 1.31 ns/op (? = 0.03 ns/op) [1.34, 1.27, 1.29, 1.33]
>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      2.04 ns/op (? = 0.00 ns/op) [     2.04]
           1 threads, Tavg =      2.04 ns/op (? = 0.00 ns/op) [     2.04]
# Measure:
           1 threads, Tavg =      2.04 ns/op (? = 0.00 ns/op) [     2.04]
2 threads, Tavg = 2.04 ns/op (? = 0.00 ns/op) [2.04, 2.04]3 threads, Tavg = 2.04 ns/op (? = 0.00 ns/op) [2.04, 2.03, 2.04]4 threads, Tavg = 2.04 ns/op (? = 0.00 ns/op) [2.04, 2.04, 2.04, 2.05]
<< JVM END
There's still a small drop in speed when "the switch tojava.util.logging" happens, but it's negligible. So compared tooriginal code, with both optimizations combined, the speed-up is 2.6x- 4.5x (depending on test).
I don't know how this performs on JDK7 though, so perhaps a backportshould not use MethodHandles.
Regards, Peter

On 03/22/2013 05:23 PM, Peter Levart wrote:
Hi Laurent,
I have run your performance test on my machine (i7 CPU, Linux) andwith original code on JDK8 64bit (CompressedOOPS enabled by default,no TLAB options), i get:
...
INFO: testPerf[100000000 iterations]: duration = 1767.6329779999999 ms.
INFO: testPerf[100000000 iterations]: duration = 1758.7931449999999 ms.
INFO: testPerf[100000000 iterations]: duration = 1763.386362 ms.
...

While with latest patch (webrev.07), the results are:

...
INFO: testPerf[100000000 iterations]: duration = 660.940466 ms.
INFO: testPerf[100000000 iterations]: duration = 662.4854379999999 ms.
INFO: testPerf[100000000 iterations]: duration = 656.497199 ms.
...


So it's apparently a 2.5x improvement in speed.
This test is designed to test the PlatformLogger.isLoggable() forlevels that don't result in log messages being actually written out,but there are some caveats:
- the test is using a loop with local variables and counters, whichis very prone to JIT optimizations such as loop-unrolling. This mightor might not be happening and even if it is, the impact might be thesame on unpatched vs. patched PlatformLogger code.- the test is testing the performance when the PlatformLogger isdirected to java.util.logging from the beginning. That means JIT iscompiling calls to virtual methods of JavaLoggerProxy intomonomorphic inlined calls. The results are different if the test is1st warmed-up while no java.util.logging is initialized yet (callsdirected to LoggerProxy) and later java.util.logging is enabled(PlatformLoggers are redirected to JavaLoggerProxys) and the sametest repeated.
I have created a special test that demonstrates this (all tests runon recent JDK8 build, i7 (4 cores) Linux, 64bit, no TLAB option,CompressedOOPS enabled by default):
** Original code

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.78 ns/op (? = 0.00 ns/op) [     1.78]
           1 threads, Tavg =      1.67 ns/op (? = 0.00 ns/op) [     1.67]
# Measure:
           1 threads, Tavg =      1.44 ns/op (? = 0.00 ns/op) [     1.44]
2 threads, Tavg = 1.37 ns/op (? = 0.01 ns/op) [1.37, 1.38]3 threads, Tavg = 1.53 ns/op (? = 0.13 ns/op) [1.41, 1.49, 1.71]4 threads, Tavg = 1.43 ns/op (? = 0.10 ns/op) [1.36, 1.62, 1.39, 1.38]
<< JVM END

>> JVM START

>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      4.81 ns/op (? = 0.00 ns/op) [     4.81]
           1 threads, Tavg =      4.79 ns/op (? = 0.00 ns/op) [     4.79]
# Measure:
           1 threads, Tavg =      4.67 ns/op (? = 0.00 ns/op) [     4.67]
2 threads, Tavg = 4.67 ns/op (? = 0.00 ns/op) [4.67, 4.67]3 threads, Tavg = 4.87 ns/op (? = 0.31 ns/op) [4.67, 4.68, 5.32]4 threads, Tavg = 4.68 ns/op (? = 0.01 ns/op) [4.68, 4.67, 4.68, 4.69]
<< JVM END

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.39 ns/op (? = 0.00 ns/op) [     1.39]
           1 threads, Tavg =      1.80 ns/op (? = 0.00 ns/op) [     1.80]
# Measure:
           1 threads, Tavg =      1.38 ns/op (? = 0.00 ns/op) [     1.38]
2 threads, Tavg = 1.38 ns/op (? = 0.00 ns/op) [1.38, 1.38]3 threads, Tavg = 1.38 ns/op (? = 0.01 ns/op) [1.39, 1.38, 1.37]4 threads, Tavg = 1.38 ns/op (? = 0.02 ns/op) [1.42, 1.37, 1.37, 1.37]
>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =     11.87 ns/op (? = 0.00 ns/op) [    11.87]
           1 threads, Tavg =      9.08 ns/op (? = 0.00 ns/op) [     9.08]
# Measure:
           1 threads, Tavg =      9.12 ns/op (? = 0.00 ns/op) [     9.12]
2 threads, Tavg = 9.02 ns/op (? = 0.02 ns/op) [9.05, 9.00]3 threads, Tavg = 9.20 ns/op (? = 0.04 ns/op) [9.26, 9.19, 9.17]4 threads, Tavg = 9.33 ns/op (? = 0.07 ns/op) [9.44, 9.34, 9.26, 9.28]
<< JVM END


** Patched code (webrev.07)

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.38 ns/op (? = 0.00 ns/op) [     1.38]
           1 threads, Tavg =      1.68 ns/op (? = 0.00 ns/op) [     1.68]
# Measure:
           1 threads, Tavg =      1.38 ns/op (? = 0.00 ns/op) [     1.38]
2 threads, Tavg = 1.40 ns/op (? = 0.02 ns/op) [1.37, 1.42]3 threads, Tavg = 1.49 ns/op (? = 0.21 ns/op) [1.80, 1.38, 1.36]4 threads, Tavg = 1.50 ns/op (? = 0.13 ns/op) [1.47, 1.73, 1.41, 1.42]
<< JVM END

>> JVM START

>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.80 ns/op (? = 0.00 ns/op) [     1.80]
           1 threads, Tavg =      1.74 ns/op (? = 0.00 ns/op) [     1.74]
# Measure:
           1 threads, Tavg =      1.78 ns/op (? = 0.00 ns/op) [     1.78]
2 threads, Tavg = 1.80 ns/op (? = 0.00 ns/op) [1.80, 1.80]3 threads, Tavg = 1.84 ns/op (? = 0.03 ns/op) [1.80, 1.88, 1.86]4 threads, Tavg = 1.78 ns/op (? = 0.00 ns/op) [1.78, 1.77, 1.77, 1.78]
<< JVM END

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.73 ns/op (? = 0.00 ns/op) [     1.73]
           1 threads, Tavg =      1.72 ns/op (? = 0.00 ns/op) [     1.72]
# Measure:
           1 threads, Tavg =      1.38 ns/op (? = 0.00 ns/op) [     1.38]
2 threads, Tavg = 1.37 ns/op (? = 0.00 ns/op) [1.37, 1.37]3 threads, Tavg = 1.52 ns/op (? = 0.19 ns/op) [1.37, 1.45, 1.79]4 threads, Tavg = 1.58 ns/op (? = 0.19 ns/op) [1.42, 1.79, 1.81, 1.41]
>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      5.43 ns/op (? = 0.00 ns/op) [     5.43]
           1 threads, Tavg =      5.65 ns/op (? = 0.00 ns/op) [     5.65]
# Measure:
           1 threads, Tavg =      5.66 ns/op (? = 0.00 ns/op) [     5.66]
2 threads, Tavg = 5.67 ns/op (? = 0.01 ns/op) [5.68, 5.66]3 threads, Tavg = 5.65 ns/op (? = 0.00 ns/op) [5.65, 5.65, 5.65]4 threads, Tavg = 5.66 ns/op (? = 0.01 ns/op) [5.65, 5.65, 5.67, 5.67]
<< JVM END
It appears the speed-up is visible and varies from 1.6x to 2.6x,depending on other factors that also play role in the overall speed(like JIT optimizations).
I only presented the results for PlatformLogger.isLoggable(FINEST),because this is the most sensitive call in tight loops. Other methodsperform similarly.
This is the test code I used:


package test;

import si.pele.microbench.TestRunner;
import sun.util.logging.PlatformLogger;

import java.util.logging.LogManager;

public class PlatformLoggerBenchmark extends TestRunner {
static final PlatformLogger log =PlatformLogger.getLogger(PlatformLoggerBenchmark.class.getName());
    static {
        log.setLevel(PlatformLogger.SEVERE); // almost OFF
    }

    public static class isLoggableFinest extends Test {
        @Override
protected void doLoop(Loop loop, DevNull devNull1, DevNulldevNull2, DevNull devNull3, DevNull devNull4, DevNull devNull5) {
            while (loop.nextIteration()) {
devNull1.yield(log.isLoggable(PlatformLogger.FINEST));
            }
        }
    }

    public static void main(String[] args) throws Exception {

        System.out.println(">> JVM START");

        doTest(isLoggableFinest.class, 5000L, 1, 4, 1);

        // enable java.util.logging
LogManager.getLogManager().getLogger(log.getName());
        System.out.println("\n>> java.util.logging enabled\n");

        doTest(isLoggableFinest.class, 5000L, 1, 4, 1);

        System.out.println("<< JVM END");
    }
}
The test uses a simple microbenchmark runner that I coded togetherusing some ideas from the sessions with Alexey Shipilev and hisbenchmark suite.
Here's the source to my dirty implementation:

https://github.com/plevart/micro-bench/tree/v2


Regards, Peter

On 03/22/2013 02:28 PM, Peter Levart wrote:
Ok, Lauret, Mandy,

Here are the final touches:

https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/PlatformLogger/webrev.07/index.html

Changes from webrev.06:

- renamed back Level -> LevelEnum
- renamed JavaLogger -> JavaLoggerProxy
- moved javaLevel(int) static method from LevelEnum toJavaLoggerProxy and made it private (only used there)- consistent use of variable name 'level' to refer toPlatformLogger's int level value- consistent use of variable name 'levelEnum' to refer to LevelEnummember- consistent use of variable name 'javaLevel' to refer tojava.util.logging.Level instance- consistent use of variable name 'javaLogger' to refer tojava.util.logging.Logger instance- renamed PlatformLogger.newJavaLogger() private method ->PlatformLogger.redirectToJavaLoggerProxy()- renamed PlatformLogger.logger private field ->PlatformLogger.loggerProxy
- some additional comments
I think that these changes make code more consistent andself-explanatory.
What remains is a possible rename from: javaLogger, javaLevel,JavaLoggerProxy -> julLogger, julLevel, JulLoggerProxy if that's thefinal decision.
Regards, Peter


On 03/22/2013 01:26 PM, Peter Levart wrote:
On 03/22/2013 11:34 AM, Laurent Bourgès wrote:
Peter,

    You've beaten me! I have been preparing them too ;-) ...


Ok I definitely stop working on the code and let you do it.
Ok. I'll make the final touches, incorporating also comments andchanges from your code...
    I also did some some renames, that I think make the code more
    consistent:
    - LevelEnum -> Level (the code is not dependent on
    java.util.logging.Level, so the name can be reused, its
    private anyway)
    - julLevel -> javaLevel (javaLevel / JavaLogger)
    - LevelEnum.forValue -> Level.valueOf (Mandy)
    - JavaLogger.julLevelToEnum -> JavaLogger.javaLevelToLevel
For consistency and clarity, I would prefer having followingconventions:- int levelValue (= PlatformLevel as int) and not int level(conflict with Level enum ...)
I think that PlatformLogger public API should stay as is. Publicmethod's parameter names are just called 'level' and values ofpublic constants are expected to be passed to them. There are onlytwo places where 'level' is the name of a local variable of typeLevel (and not int) and at both places there is not conflict, sincethere's no 'int level' variable in scope.
I renamed LevelEnum to Level because the following most frequentlyused pattern looks strange:
    LevelEnum.javaLevel(int)
Neither parameter nor the result has anything to do with LevelEnumdirectly.
But if we move the javaLevel(int) method out of the Level enum intothe JavaLogger, then Level can be renamed back to LevelEnum (oranything else?).
- julLevel / julLogger: more explicit than javaLevel / javaLogger(java means everything ... but jul means java.util.logging andjavaLogger is in conflict with JavaLogger class)
But javaLogger is not in conflict with javaLevel. I suggestrenaming JavaLoger class to JavaLoggerProxy, so we would have:
Object javaLogger // holding java.util.logging.Logger instance
Object javaLevel    // holding java.util.logging.Level instance
class JavaLoggerProxy // a specialization of LoggerProxy,delegating to javaLogger
If 'jul' is a prefered prefix to 'java', I suggest renaming all 3:julLogger, julLevel, JulLoggerProxy. I don't have a preference foreither, so perhaps Mandy, Laurent or anybody else can express them...
Regards, Peter
    Other changes (to webrev.05):
    - removed the occurrence counts in switch comments (as per
    Mandy's suggestion)
    - made LoggerProxy and JavaLogger private
    - fixed double-read of volatile LoggerProxy.levelValue in
    LoggerProxy.isLoggable()
    - added static Level.javaLevel(int value) shortcut (Mandy)

    I also updated the test to exercise the correctness of mappings.


Well done.

cheers,
Laurent

Re: 8010309 : PlatformLogger: isLoggable performance / waste due to HashMap leads to Integer allocations (boxing)

Reply via email to