I am recanting my previous support for any change to isSupplementaryCodePoint.
I think my brain (or maybe Ulf's brain) tricked me into thinking that the considerations for isValidCodePoint and isBMPCodePoint also apply to isSupplementaryCodePoint. Sorry. I renamed my patch file from isSupplementaryCodePoint to isValidCodePoint. 6934268: Better implementation of Character.isValidCodePoint http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isValidCodePoint 6934265: Add public method Character.isBMPCodePoint http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint 6934270: Remove javac warnings from Character.java http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings 6934271: Better handling of longer utf-8 sequences http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling 6935172: Optimize bit-twiddling in Bits.java http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java Martin On Tue, Mar 16, 2010 at 15:35, Xueming Shen <[email protected]> wrote: > Martin Buchholz wrote: >> >> On Tue, Mar 16, 2010 at 13:06, Xueming Shen <[email protected]> wrote: >> >>> >>> Martin Buchholz wrote: >>> >>>> >>>> Therefore the existing implementation >>>> >>>>>> >>>>>> return codePoint>= MIN_SUPPLEMENTARY_CODE_POINT >>>>>> && codePoint<= MAX_CODE_POINT; >>>>>> >>>>>> will almost always perform just one comparison against a constant, >>>>>> which is hard to beat. >>>>>> >>>>>> >>>>>> >>>>> >>>>> 1. Wondering: I think there are TWO comparisons. >>>>> 2. Those comparisons need to load 32 bit values from machine code, >>>>> against >>>>> only 8 bit values in my case. >>>>> >>>>> >>>> >>>> It's a good point. In the machine code, shifts are likely to use >>>> immediate values, and so will be a small win. >>>> >>>> int x = codePoint >>> 16; >>>> return x != 0 && x < 0x11; >>>> >>>> (On modern hardware, these optimizations >>>> are less valuable than they used to be; >>>> ordinary integer arithmetic is almost free) >>>> >>>> >>>> >>> >>> I'm not convinced if the proposed code is really better...a "small win". >>> >> >> The primary theory here is that branches are expensive, >> and we are reducing them by one. >> >> > > There are still two branches in new impl, if you count the "ifeq" and > "if_icmpge"(?) > > We are trying to "optimize" this piece of code with the assumption that the > new impl MIGHT help certain vm (hotspot?) > to optimize certain use scenario (some consecutive usages), if the compiler > and/or the vm are both smart enough at certain > point, with no supporting benchmark data? > > My concern is that the reality might be that this optimization might even > hurt the BMP use > case (the majority of the possible real world use scenarios) with a 10% > bigger bytecode size. > > -Sherman > > > > public class Character extends java.lang.Object { > public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536; > > public static final int MAX_CODE_POINT = 1114111; > > public Character(); > Code: > 0: aload_0 1: invokespecial #1 // Method > java/lang/Object."<init>":()V > 4: return > public static boolean isSupplementaryCodePoint(int); > Code: > 0: iload_0 1: ldc #2 // int > 65536 > 3: if_icmplt 16 > 6: iload_0 7: ldc #3 // int > 1114111 > 9: if_icmpgt 16 > 12: iconst_1 13: goto 17 > 16: iconst_0 17: ireturn > public static boolean isSupplementaryCodePoint_new(int); > Code: > 0: iload_0 1: bipush 16 > 3: iushr 4: istore_1 > 5: iload_1 6: ifeq 19 > 9: iload_1 10: bipush 17 > 12: if_icmpge 19 > 15: iconst_1 16: goto 20 > 19: iconst_0 20: ireturn } > > > > > >
