Well, Laurent, Mandy,

It turns out that the dispatch speed-up (or lack of slow-down to be precise) is possible without MethodHandles too. Maybe some VM guru could shed some light on this, but the following is currently the fastest variant:

http://dl.dropbox.com/u/101777488/jdk8-tl/PlatformLogger/webrev.08/index.html

What I did is a simple "if" deciding between two call-sites, making sure that each is only dispatching to single class. This only works, if both classes (LoggerProxy and JavaLoggerProxy) are loaded in advance, before 1st invocation on any of them is made (might be that using MethodHandles forced initialization of both classes beforehand and hence the speed-up). If I don't load JavaLoggerProxy before warming-up with LoggerProxy, then when j.u.logging is enabled, speed drops for a factor of almost 4 and never catches up even after very long time. This pre-loading also helps for a normal single call site dispatch that dispatches to two distinct classes, but the speed 1st drops when the class changes, and only catches-up after several billions of iterations (30s tight loop on i7). Now, because JavaLoggerProxy is initialized early, I had to move the initialization of j.u.logging.Level objects and mappings to a separate class "JavaLevel".

Here are the benchmark results for this last iteration...

When j.u.logging is enabled from the beginning:

** Patched code (webrev.08)

>> JVM START

>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.62 ns/op (? =   0.00 ns/op) [     1.62]
           1 threads, Tavg =      1.64 ns/op (? =   0.00 ns/op) [     1.64]
# Measure:
           1 threads, Tavg =      1.62 ns/op (? =   0.00 ns/op) [     1.62]
2 threads, Tavg = 1.60 ns/op (? = 0.00 ns/op) [ 1.60, 1.60] 3 threads, Tavg = 1.60 ns/op (? = 0.00 ns/op) [ 1.60, 1.60, 1.60] 4 threads, Tavg = 1.61 ns/op (? = 0.01 ns/op) [ 1.60, 1.62, 1.60, 1.61]

<< JVM END


And when it first runs with j.u.logging disabled and is enabled in the middle:


>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.33 ns/op (? =   0.00 ns/op) [     1.33]
           1 threads, Tavg =      1.36 ns/op (? =   0.00 ns/op) [     1.36]
# Measure:
           1 threads, Tavg =      1.36 ns/op (? =   0.00 ns/op) [     1.36]
2 threads, Tavg = 1.30 ns/op (? = 0.04 ns/op) [ 1.34, 1.25] 3 threads, Tavg = 1.29 ns/op (? = 0.06 ns/op) [ 1.37, 1.25, 1.27] 4 threads, Tavg = 1.32 ns/op (? = 0.10 ns/op) [ 1.24, 1.25, 1.31, 1.49]


>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.74 ns/op (? =   0.00 ns/op) [     1.74]
           1 threads, Tavg =      1.73 ns/op (? =   0.00 ns/op) [     1.73]
# Measure:
           1 threads, Tavg =      1.57 ns/op (? =   0.00 ns/op) [     1.57]
2 threads, Tavg = 1.57 ns/op (? = 0.00 ns/op) [ 1.58, 1.57] 3 threads, Tavg = 1.58 ns/op (? = 0.00 ns/op) [ 1.57, 1.58, 1.57] 4 threads, Tavg = 1.59 ns/op (? = 0.01 ns/op) [ 1.61, 1.58, 1.58, 1.59]

<< JVM END


It can be seen that performance is stable as it does not change a bit if the "switch" occurs at start or in the middle.

With this patch, we have a 2.9x - 5.7x speed-up compared to original code (depending on test).

Laurent, can you try this one on your machine? Would be fine to see also the webrev.07 tested on your machine (I'm curious if this slow-down-due-to-deoptimization also happens on your environment and what the impact is).

Regards, Peter


On 03/25/2013 12:46 PM, Peter Levart wrote:
Hi Laurent, Mandy,

MethodHandles.guardWithCheck() rocks!

Out of curiosity, I put the MethodHandles in latest JDK8 build to the test. It turns out that they are surprisingly good. I optimized the dispatch logic of the most critical method: PlatformLogger.isLoggable(int) which I have shown in previous mail, is performance-dependent on whether java.util.logging is enabled from the beginning so that the dispatch is compiled by JIT into an in-lined monomorphic call or it is enabled later-on when the code has already been warmed-up using LoggerProxy. The method's performance drops by factor of 3 in the later situation.

Here's what I did:

http://dl.dropbox.com/u/101777488/jdk8-tl/PlatformLogger/webrev.mh.01/index.html

The benchmark results are as follows (only showing the patched PlatformLogger - for original code results, refer to previous mail or see below quoted message):

** Patched code (webrev.mh.01)

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.40 ns/op (? = 0.00 ns/op) [     1.40]
           1 threads, Tavg =      1.39 ns/op (? = 0.00 ns/op) [     1.39]
# Measure:
           1 threads, Tavg =      1.42 ns/op (? = 0.00 ns/op) [     1.42]
2 threads, Tavg = 1.31 ns/op (? = 0.00 ns/op) [ 1.31, 1.31] 3 threads, Tavg = 1.43 ns/op (? = 0.20 ns/op) [ 1.74, 1.32, 1.32] 4 threads, Tavg = 1.34 ns/op (? = 0.01 ns/op) [ 1.35, 1.35, 1.33, 1.35]

<< JVM END

>> JVM START

>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.69 ns/op (? = 0.00 ns/op) [     1.69]
           1 threads, Tavg =      1.77 ns/op (? = 0.00 ns/op) [     1.77]
# Measure:
           1 threads, Tavg =      1.73 ns/op (? = 0.00 ns/op) [     1.73]
2 threads, Tavg = 1.73 ns/op (? = 0.00 ns/op) [ 1.72, 1.73] 3 threads, Tavg = 1.97 ns/op (? = 0.49 ns/op) [ 2.74, 1.72, 1.73] 4 threads, Tavg = 1.80 ns/op (? = 0.05 ns/op) [ 1.83, 1.73, 1.79, 1.87]

<< JVM END

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.35 ns/op (? = 0.00 ns/op) [     1.35]
           1 threads, Tavg =      1.74 ns/op (? = 0.00 ns/op) [     1.74]
# Measure:
           1 threads, Tavg =      1.33 ns/op (? = 0.00 ns/op) [     1.33]
2 threads, Tavg = 1.44 ns/op (? = 0.00 ns/op) [ 1.44, 1.44] 3 threads, Tavg = 1.28 ns/op (? = 0.01 ns/op) [ 1.30, 1.27, 1.29] 4 threads, Tavg = 1.31 ns/op (? = 0.03 ns/op) [ 1.34, 1.27, 1.29, 1.33]


>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      2.04 ns/op (? = 0.00 ns/op) [     2.04]
           1 threads, Tavg =      2.04 ns/op (? = 0.00 ns/op) [     2.04]
# Measure:
           1 threads, Tavg =      2.04 ns/op (? = 0.00 ns/op) [     2.04]
2 threads, Tavg = 2.04 ns/op (? = 0.00 ns/op) [ 2.04, 2.04] 3 threads, Tavg = 2.04 ns/op (? = 0.00 ns/op) [ 2.04, 2.03, 2.04] 4 threads, Tavg = 2.04 ns/op (? = 0.00 ns/op) [ 2.04, 2.04, 2.04, 2.05]

<< JVM END


There's still a small drop in speed when "the switch to java.util.logging" happens, but it's negligible. So compared to original code, with both optimizations combined, the speed-up is 2.6x - 4.5x (depending on test).

I don't know how this performs on JDK7 though, so perhaps a backport should not use MethodHandles.

Regards, Peter

On 03/22/2013 05:23 PM, Peter Levart wrote:
Hi Laurent,

I have run your performance test on my machine (i7 CPU, Linux) and with original code on JDK8 64bit (CompressedOOPS enabled by default, no TLAB options), i get:

...
INFO: testPerf[100000000 iterations]: duration = 1767.6329779999999 ms.
INFO: testPerf[100000000 iterations]: duration = 1758.7931449999999 ms.
INFO: testPerf[100000000 iterations]: duration = 1763.386362 ms.
...

While with latest patch (webrev.07), the results are:

...
INFO: testPerf[100000000 iterations]: duration = 660.940466 ms.
INFO: testPerf[100000000 iterations]: duration = 662.4854379999999 ms.
INFO: testPerf[100000000 iterations]: duration = 656.497199 ms.
...


So it's apparently a 2.5x improvement in speed.

This test is designed to test the PlatformLogger.isLoggable() for levels that don't result in log messages being actually written out, but there are some caveats:

- the test is using a loop with local variables and counters, which is very prone to JIT optimizations such as loop-unrolling. This might or might not be happening and even if it is, the impact might be the same on unpatched vs. patched PlatformLogger code. - the test is testing the performance when the PlatformLogger is directed to java.util.logging from the beginning. That means JIT is compiling calls to virtual methods of JavaLoggerProxy into monomorphic inlined calls. The results are different if the test is 1st warmed-up while no java.util.logging is initialized yet (calls directed to LoggerProxy) and later java.util.logging is enabled (PlatformLoggers are redirected to JavaLoggerProxys) and the same test repeated.

I have created a special test that demonstrates this (all tests run on recent JDK8 build, i7 (4 cores) Linux, 64bit, no TLAB option, CompressedOOPS enabled by default):

** Original code

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.78 ns/op (? = 0.00 ns/op) [     1.78]
           1 threads, Tavg =      1.67 ns/op (? = 0.00 ns/op) [     1.67]
# Measure:
           1 threads, Tavg =      1.44 ns/op (? = 0.00 ns/op) [     1.44]
2 threads, Tavg = 1.37 ns/op (? = 0.01 ns/op) [ 1.37, 1.38] 3 threads, Tavg = 1.53 ns/op (? = 0.13 ns/op) [ 1.41, 1.49, 1.71] 4 threads, Tavg = 1.43 ns/op (? = 0.10 ns/op) [ 1.36, 1.62, 1.39, 1.38]

<< JVM END

>> JVM START

>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      4.81 ns/op (? = 0.00 ns/op) [     4.81]
           1 threads, Tavg =      4.79 ns/op (? = 0.00 ns/op) [     4.79]
# Measure:
           1 threads, Tavg =      4.67 ns/op (? = 0.00 ns/op) [     4.67]
2 threads, Tavg = 4.67 ns/op (? = 0.00 ns/op) [ 4.67, 4.67] 3 threads, Tavg = 4.87 ns/op (? = 0.31 ns/op) [ 4.67, 4.68, 5.32] 4 threads, Tavg = 4.68 ns/op (? = 0.01 ns/op) [ 4.68, 4.67, 4.68, 4.69]

<< JVM END

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.39 ns/op (? = 0.00 ns/op) [     1.39]
           1 threads, Tavg =      1.80 ns/op (? = 0.00 ns/op) [     1.80]
# Measure:
           1 threads, Tavg =      1.38 ns/op (? = 0.00 ns/op) [     1.38]
2 threads, Tavg = 1.38 ns/op (? = 0.00 ns/op) [ 1.38, 1.38] 3 threads, Tavg = 1.38 ns/op (? = 0.01 ns/op) [ 1.39, 1.38, 1.37] 4 threads, Tavg = 1.38 ns/op (? = 0.02 ns/op) [ 1.42, 1.37, 1.37, 1.37]


>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =     11.87 ns/op (? = 0.00 ns/op) [    11.87]
           1 threads, Tavg =      9.08 ns/op (? = 0.00 ns/op) [     9.08]
# Measure:
           1 threads, Tavg =      9.12 ns/op (? = 0.00 ns/op) [     9.12]
2 threads, Tavg = 9.02 ns/op (? = 0.02 ns/op) [ 9.05, 9.00] 3 threads, Tavg = 9.20 ns/op (? = 0.04 ns/op) [ 9.26, 9.19, 9.17] 4 threads, Tavg = 9.33 ns/op (? = 0.07 ns/op) [ 9.44, 9.34, 9.26, 9.28]

<< JVM END


** Patched code (webrev.07)

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.38 ns/op (? = 0.00 ns/op) [     1.38]
           1 threads, Tavg =      1.68 ns/op (? = 0.00 ns/op) [     1.68]
# Measure:
           1 threads, Tavg =      1.38 ns/op (? = 0.00 ns/op) [     1.38]
2 threads, Tavg = 1.40 ns/op (? = 0.02 ns/op) [ 1.37, 1.42] 3 threads, Tavg = 1.49 ns/op (? = 0.21 ns/op) [ 1.80, 1.38, 1.36] 4 threads, Tavg = 1.50 ns/op (? = 0.13 ns/op) [ 1.47, 1.73, 1.41, 1.42]

<< JVM END

>> JVM START

>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.80 ns/op (? = 0.00 ns/op) [     1.80]
           1 threads, Tavg =      1.74 ns/op (? = 0.00 ns/op) [     1.74]
# Measure:
           1 threads, Tavg =      1.78 ns/op (? = 0.00 ns/op) [     1.78]
2 threads, Tavg = 1.80 ns/op (? = 0.00 ns/op) [ 1.80, 1.80] 3 threads, Tavg = 1.84 ns/op (? = 0.03 ns/op) [ 1.80, 1.88, 1.86] 4 threads, Tavg = 1.78 ns/op (? = 0.00 ns/op) [ 1.78, 1.77, 1.77, 1.78]

<< JVM END

>> JVM START
#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      1.73 ns/op (? = 0.00 ns/op) [     1.73]
           1 threads, Tavg =      1.72 ns/op (? = 0.00 ns/op) [     1.72]
# Measure:
           1 threads, Tavg =      1.38 ns/op (? = 0.00 ns/op) [     1.38]
2 threads, Tavg = 1.37 ns/op (? = 0.00 ns/op) [ 1.37, 1.37] 3 threads, Tavg = 1.52 ns/op (? = 0.19 ns/op) [ 1.37, 1.45, 1.79] 4 threads, Tavg = 1.58 ns/op (? = 0.19 ns/op) [ 1.42, 1.79, 1.81, 1.41]


>> java.util.logging enabled

#
# isLoggableFinest: run duration:  5,000 ms, #of logical CPUS: 8
#
# Warm up:
           1 threads, Tavg =      5.43 ns/op (? = 0.00 ns/op) [     5.43]
           1 threads, Tavg =      5.65 ns/op (? = 0.00 ns/op) [     5.65]
# Measure:
           1 threads, Tavg =      5.66 ns/op (? = 0.00 ns/op) [     5.66]
2 threads, Tavg = 5.67 ns/op (? = 0.01 ns/op) [ 5.68, 5.66] 3 threads, Tavg = 5.65 ns/op (? = 0.00 ns/op) [ 5.65, 5.65, 5.65] 4 threads, Tavg = 5.66 ns/op (? = 0.01 ns/op) [ 5.65, 5.65, 5.67, 5.67]

<< JVM END


It appears the speed-up is visible and varies from 1.6x to 2.6x, depending on other factors that also play role in the overall speed (like JIT optimizations).

I only presented the results for PlatformLogger.isLoggable(FINEST), because this is the most sensitive call in tight loops. Other methods perform similarly.

This is the test code I used:


package test;

import si.pele.microbench.TestRunner;
import sun.util.logging.PlatformLogger;

import java.util.logging.LogManager;

public class PlatformLoggerBenchmark extends TestRunner {

static final PlatformLogger log = PlatformLogger.getLogger(PlatformLoggerBenchmark.class.getName());

    static {
        log.setLevel(PlatformLogger.SEVERE); // almost OFF
    }

    public static class isLoggableFinest extends Test {
        @Override
protected void doLoop(Loop loop, DevNull devNull1, DevNull devNull2, DevNull devNull3, DevNull devNull4, DevNull devNull5) {
            while (loop.nextIteration()) {
devNull1.yield(log.isLoggable(PlatformLogger.FINEST));
            }
        }
    }

    public static void main(String[] args) throws Exception {

        System.out.println(">> JVM START");

        doTest(isLoggableFinest.class, 5000L, 1, 4, 1);

        // enable java.util.logging
LogManager.getLogManager().getLogger(log.getName());
        System.out.println("\n>> java.util.logging enabled\n");

        doTest(isLoggableFinest.class, 5000L, 1, 4, 1);

        System.out.println("<< JVM END");
    }
}


The test uses a simple microbenchmark runner that I coded together using some ideas from the sessions with Alexey Shipilev and his benchmark suite.

Here's the source to my dirty implementation:

https://github.com/plevart/micro-bench/tree/v2


Regards, Peter

On 03/22/2013 02:28 PM, Peter Levart wrote:
Ok, Lauret, Mandy,

Here are the final touches:

https://dl.dropboxusercontent.com/u/101777488/jdk8-tl/PlatformLogger/webrev.07/index.html

Changes from webrev.06:

- renamed back Level -> LevelEnum
- renamed JavaLogger -> JavaLoggerProxy
- moved javaLevel(int) static method from LevelEnum to JavaLoggerProxy and made it private (only used there) - consistent use of variable name 'level' to refer to PlatformLogger's int level value - consistent use of variable name 'levelEnum' to refer to LevelEnum member - consistent use of variable name 'javaLevel' to refer to java.util.logging.Level instance - consistent use of variable name 'javaLogger' to refer to java.util.logging.Logger instance - renamed PlatformLogger.newJavaLogger() private method -> PlatformLogger.redirectToJavaLoggerProxy() - renamed PlatformLogger.logger private field -> PlatformLogger.loggerProxy
- some additional comments

I think that these changes make code more consistent and self-explanatory.

What remains is a possible rename from: javaLogger, javaLevel, JavaLoggerProxy -> julLogger, julLevel, JulLoggerProxy if that's the final decision.

Regards, Peter


On 03/22/2013 01:26 PM, Peter Levart wrote:
On 03/22/2013 11:34 AM, Laurent Bourgès wrote:
Peter,

    You've beaten me! I have been preparing them too ;-) ...


Ok I definitely stop working on the code and let you do it.

Ok. I'll make the final touches, incorporating also comments and changes from your code...


    I also did some some renames, that I think make the code more
    consistent:
    - LevelEnum -> Level (the code is not dependent on
    java.util.logging.Level, so the name can be reused, its
    private anyway)
    - julLevel -> javaLevel (javaLevel / JavaLogger)
    - LevelEnum.forValue -> Level.valueOf (Mandy)
    - JavaLogger.julLevelToEnum -> JavaLogger.javaLevelToLevel


For consistency and clarity, I would prefer having following conventions: - int levelValue (= PlatformLevel as int) and not int level (conflict with Level enum ...)

I think that PlatformLogger public API should stay as is. Public method's parameter names are just called 'level' and values of public constants are expected to be passed to them. There are only two places where 'level' is the name of a local variable of type Level (and not int) and at both places there is not conflict, since there's no 'int level' variable in scope.

I renamed LevelEnum to Level because the following most frequently used pattern looks strange:

    LevelEnum.javaLevel(int)

Neither parameter nor the result has anything to do with LevelEnum directly.

But if we move the javaLevel(int) method out of the Level enum into the JavaLogger, then Level can be renamed back to LevelEnum (or anything else?).


- julLevel / julLogger: more explicit than javaLevel / javaLogger (java means everything ... but jul means java.util.logging and javaLogger is in conflict with JavaLogger class)


But javaLogger is not in conflict with javaLevel. I suggest renaming JavaLoger class to JavaLoggerProxy, so we would have:

Object javaLogger // holding java.util.logging.Logger instance
Object javaLevel    // holding java.util.logging.Level instance

class JavaLoggerProxy // a specialization of LoggerProxy, delegating to javaLogger

If 'jul' is a prefered prefix to 'java', I suggest renaming all 3: julLogger, julLevel, JulLoggerProxy. I don't have a preference for either, so perhaps Mandy, Laurent or anybody else can express them...


Regards, Peter


    Other changes (to webrev.05):
    - removed the occurrence counts in switch comments (as per
    Mandy's suggestion)
    - made LoggerProxy and JavaLogger private
    - fixed double-read of volatile LoggerProxy.levelValue in
    LoggerProxy.isLoggable()
    - added static Level.javaLevel(int value) shortcut (Mandy)

    I also updated the test to exercise the correctness of mappings.


Well done.

cheers,
Laurent






Reply via email to