Re: Request for review: 6896617: Optimize sun.nio.cs.ISO_8859_1$Encode.encodeArrayLoop() on x86

Vladimir Kozlov Tue, 22 Jan 2013 12:25:49 -0800

Thank you, Ulf

I will rename the method to encodeISOArray and add a comment that it could be replaced by an intrinsic in the JVM.

The same arrays were used intentionally in the test, to measure the performance of the code without the effect of the CPU's cache/memory subsystem.

The method encodeArrayLoop() is also compiled, because it is invoked more than 10,000 times. And I don't see an effect on performance from changing its code, so I will leave it as it is.


Thanks,
Vladimir

On 1/21/13 3:30 PM, Ulf Zibis wrote:

On 17.01.2013 05:27, Vladimir Kozlov wrote:

On 1/12/13 12:37 AM, Ulf Zibis wrote:

3) bugs.sun.com/bugdatabase/view_bug.do?bug_id=6896617 ==> This bug is
not available.


I opened it; it should show up in a few days.


Thanks!

4) What specific operation should be done by the intrinsic, i.e. is
there a fixed API for that method?


When C2 (the server JIT compiler in the JVM) compiles the encode methods,
it will replace the new method encodeArray() (matched by signature) with
hand-optimized assembly code that uses the latest processor instructions.
I will send the Hotspot changes soon. So it has nothing to do with the
interpreter or the bytecode sequence.
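To make the mechanism concrete: the Java body of an intrinsic candidate remains the reference implementation and runs whenever the intrinsic is not applied (for example in the interpreter), while C2 may substitute its own code for calls it matches by signature. The sketch below is only an illustration; the class name IsoArrayEncoder and the (char[] sa, int sp, byte[] da, int dp, int len) signature returning the number of encoded chars are assumptions, not the signature HotSpot actually matches.

```java
// Sketch only: class name, method name and signature are assumptions for illustration.
public final class IsoArrayEncoder {

    private IsoArrayEncoder() {}

    /**
     * Encodes up to len chars from sa[sp..] into da[dp..] as ISO-8859-1 bytes.
     * Stops at the first char above U+00FF and returns the number of chars
     * actually encoded. A JIT compiler could replace this method with an
     * intrinsic; the Java body defines the semantics the intrinsic must keep.
     */
    public static int encodeISOArray(char[] sa, int sp, byte[] da, int dp, int len) {
        int i = 0;
        for (; i < len; i++) {
            char c = sa[sp + i];
            if (c > '\u00FF') {
                break; // unmappable in ISO-8859-1; the caller reports the error
            }
            da[dp + i] = (byte) c;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] src = "caf\u00E9\u20AC!".toCharArray(); // the euro sign (U+20AC) is not in ISO-8859-1
        byte[] dst = new byte[src.length];
        int n = encodeISOArray(src, 0, dst, 0, src.length);
        System.out.println(n); // 4: stops before the euro sign
    }
}
```

Because one char always becomes one byte, the single returned count is enough for the caller to advance both its source and destination positions.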

5) Can an intrinsic write back more than one value (see my hack via
int[] p)?
6) Vladimir's webrev shows an integer as the return type for that
method. I've added a variant with a boolean return type, and the code
from my last approach could be transformed into a method with an Object
return type.
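Regarding question 5: since every encoded char produces exactly one byte, the source and destination positions advance in lockstep, so one int (the count) already lets the caller recompute both positions. The sketch below compares an int[] write-back variant, modeled on the int[] p idea, with a count-returning variant; the class and method names (EncodeApiVariants, encodeWithPositions, encodeCount) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical comparison of two intrinsic-candidate APIs; names are illustrative only.
public final class EncodeApiVariants {

    private EncodeApiVariants() {}

    /** Variant 1: writes both updated positions back through p = {sp, dp}. */
    public static boolean encodeWithPositions(char[] sa, byte[] da, int[] p, int sl) {
        int sp = p[0], dp = p[1];
        for (; sp < sl; sp++, dp++) {
            char c = sa[sp];
            if (c > '\u00FF') break;   // stop at the first unmappable char
            da[dp] = (byte) c;
        }
        p[0] = sp;                     // publish both positions to the caller
        p[1] = dp;
        return sp == sl;               // true if everything up to sl was encoded
    }

    /** Variant 2: returns how many chars were encoded; caller adds it to sp and dp. */
    public static int encodeCount(char[] sa, int sp, byte[] da, int dp, int len) {
        int i = 0;
        while (i < len && sa[sp + i] <= '\u00FF') {
            da[dp + i] = (byte) sa[sp + i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] src = "ISO\u2013text".toCharArray(); // U+2013 (en dash) is unmappable
        byte[] d1 = new byte[src.length], d2 = new byte[src.length];

        int[] p = {0, 0};
        boolean done = encodeWithPositions(src, d1, p, src.length);
        int n = encodeCount(src, 0, d2, 0, src.length);

        // Both variants stop at the same place and produce the same bytes.
        System.out.println(done + " " + Arrays.toString(p) + " " + n); // false [3, 3] 3
        System.out.println(Arrays.equals(d1, d2));                     // true
    }
}
```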


What I meant is that the surrounding code might perform differently
depending on the API we choose for the intrinsic method call. If the
surrounding code is *as short as possible*, there is a better chance
that it too will be JIT-compiled after fewer invocations of
encodeArrayLoop(). I guess the fastest would be:

// while inlining, JIT will erase the surrounding int[] p
private static CoderResult copyISOs(CoderResult cr,
         char[] sa, byte[] da, final int[] p, int sl) {
     int sp = p[0], dp = p[1];
     for (; sp < sl; sp++) {
         char c = sa[sp];
         // if ((c & '\uFF00') != 0) // needs bug 6935994 to be fixed, would be fastest
         if ((byte)(c >> 8) != 0) { // temporary replacement, fast byte-length operation on x86
             cr = null;             // unmappable char: caller builds the error result
             break;
         }
         da[dp++] = (byte)c;
     }
     p[0] = sp;                     // write both positions back for the caller
     p[1] = dp;
     return cr;
}

// No longer needs a try...finally block
private CoderResult encodeArrayLoop(
         CharBuffer src, ByteBuffer dst) {
     char[] sa = src.array();
     int soff = src.arrayOffset();
     int sp = soff + src.position();
     int sr = src.remaining();
     byte[] da = dst.array();
     int doff = dst.arrayOffset();
     int dp = doff + dst.position();
     int dr = dst.remaining();
     CoderResult cr;
     if (dr < sr) {
         sr = dr;
         cr = CoderResult.OVERFLOW;
     } else
         cr = CoderResult.UNDERFLOW;
     int sl = sp + sr;
     final int[] p = { sp, dp };
     cr = copyISOs(cr, sa, da, p, sl);
     src.position(p[0] - soff);
     dst.position(p[1] - doff);
     return result(cr, sa, p[0], sl);

}

// if adapted, this could perhaps also be reused in encodeBufferLoop()
// (not static: sgp is the encoder's Surrogate.Parser instance field)
private CoderResult result(CoderResult cr, char[] sa, int sp, int sl) {
     return cr != null ? cr :
         sgp.parse(sa[sp], sa, sp, sl) < 0
             ? sgp.error()
             : sgp.unmappableResult();
}
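The comment in copyISOs weighs three ways of detecting a char outside ISO-8859-1. Note that the mask test needs parentheses, because != binds tighter than &. Whatever each compiles to on x86 (the byte-length-operation argument is Ulf's and is not verified here), the three tests agree on every char value, which a small self-contained check (class name IsoRangeChecks is illustrative) can confirm over all 65,536 values:

```java
// Exhaustively checks that three "is this char unmappable in ISO-8859-1?" tests agree.
public final class IsoRangeChecks {

    private IsoRangeChecks() {}

    /** Mask test: any bit set in the high byte. The parentheses are required. */
    public static boolean maskTest(char c) {
        return (c & '\uFF00') != 0;
    }

    /** Shift-and-narrow test, the "temporary replacement" in copyISOs. */
    public static boolean shiftByteTest(char c) {
        return (byte) (c >> 8) != 0;
    }

    /** Plain comparison against the highest ISO-8859-1 code point. */
    public static boolean compareTest(char c) {
        return c > '\u00FF';
    }

    /** Returns the first char value on which the tests disagree, or -1 if none. */
    public static int firstMismatch() {
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            boolean a = maskTest(c), b = shiftByteTest(c), d = compareTest(c);
            if (a != b || b != d) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMismatch()); // -1: all three tests are equivalent
    }
}
```

Since they are semantically identical, the choice between them is purely about which one the JIT turns into the cheapest instruction sequence.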


... so waiting for Vladimir's feedback :-[
(especially on performance/hsdis results)


Performance on x86 was tested with the following code (the whole test
will be in the Hotspot changes):

        ba = CharBuffer.wrap(a);
        bb = ByteBuffer.wrap(b);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            ba.clear(); bb.clear();
            enc_res = enc_res && enc.encode(ba, bb, true).isUnderflow();
        }
        long end = System.currentTimeMillis();


1.) Wouldn't System.nanoTime() give more accurate results?
2.) I want to point out that encoding the same data 1,000,000 times is
not a real-world scenario. If the same data is reused, it is likely to
stay in the inner CPU cache and so have very fast access times, which
would not be the case with real-world data.
3.) It would also be interesting to see the results for fewer than
1,000,000 iterations, considering whether or not the surrounding code
would be JIT-compiled. Performance with C1 should also be tested.
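Points 1 to 3 could be folded into a harness roughly like the following. It is a rough timing sketch, not a rigorous benchmark and not the HotSpot test: it goes through the public CharsetEncoder API rather than the internal encoder, and the class name, array count, array length, seed and iteration counts are all arbitrary assumptions. It uses System.nanoTime(), rotates through several input arrays so that a single hot array is not the whole picture, and reports several iteration counts.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Rough timing sketch; all sizes, the seed and the iteration counts are arbitrary.
public final class IsoEncodeTiming {

    private IsoEncodeTiming() {}

    /** Encodes one input per iteration, rotating through inputs; returns elapsed ns. */
    public static long time(CharsetEncoder enc, char[][] inputs, byte[] out, int iterations) {
        ByteBuffer bb = ByteBuffer.wrap(out);
        boolean ok = true;
        long start = System.nanoTime();          // monotonic, finer grained than currentTimeMillis
        for (int i = 0; i < iterations; i++) {
            CharBuffer cb = CharBuffer.wrap(inputs[i % inputs.length]);
            bb.clear();
            enc.reset();                         // start each encoding from a clean state
            CoderResult r = enc.encode(cb, bb, true);
            ok &= r.isUnderflow();
        }
        long elapsed = System.nanoTime() - start;
        if (!ok) throw new IllegalStateException("unexpected non-underflow result");
        return elapsed;
    }

    /** Builds n random arrays of len chars, all within ISO-8859-1 (U+0000..U+00FF). */
    public static char[][] randomLatin1(int n, int len, long seed) {
        Random rnd = new Random(seed);
        char[][] result = new char[n][len];
        for (char[] a : result)
            for (int j = 0; j < len; j++)
                a[j] = (char) rnd.nextInt(0x100);
        return result;
    }

    public static void main(String[] args) {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        char[][] inputs = randomLatin1(64, 1000, 42L);
        byte[] out = new byte[1000];
        // Small counts are more likely to run before the hot code is fully compiled.
        for (int iters : new int[] {1_000, 10_000, 100_000, 1_000_000}) {
            long ns = time(enc, inputs, out, iters);
            System.out.printf("%,d iterations: %.1f ns/iteration%n", iters, (double) ns / iters);
        }
    }
}
```

The number of input arrays controls the working set, so raising it is a crude way to push the data out of the inner caches. To look at C1 alone, the same class could be run with -XX:TieredStopAtLevel=1 on a VM with tiered compilation.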

I also worry about the name of the method encodeArray(...). I think it
should reflect the fact that it only encodes characters of the ISO-8859-1 charset.

Please add a comment noting that the method encodeArray(...) is
intended to be intrinsified.

-Ulf
