On 16.03.2010 23:35, Xueming Shen wrote:
Martin Buchholz wrote:
On Tue, Mar 16, 2010 at 13:06, Xueming Shen <[email protected]>
wrote:
Martin Buchholz wrote:
Therefore the existing implementation
    return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
        && codePoint <= MAX_CODE_POINT;
will almost always perform just one comparison against a constant,
which is hard to beat.
1. Wondering: I think there are TWO comparisons.
2. Those comparisons need to load 32-bit values from the machine code,
against only 8-bit values in my case.
It's a good point. In the machine code, shifts are likely to use
immediate values, and so will be a small win.
int x = codePoint >>> 16;
return x != 0 && x < 0x11;
(On modern hardware, these optimizations
are less valuable than they used to be;
ordinary integer arithmetic is almost free)
I'm not convinced that the proposed code is really better... a "small win".
The primary theory here is that branches are expensive,
and we are reducing them by one.
There are still two branches in the new impl, if you count the "ifeq"
and "if_icmpge"(?)
True.
But

    for (int i = offset; i < offset + count; i++) {
        int c = codePoints[i];
        byte plane = (byte)(c >>> 16);
        if (plane == 0)
            n += 1;
        else if (plane >= 0 && plane <= (byte)0x11)
            n += 2;
        else throw new IllegalArgumentException(Integer.toString(c));
    }
also has only 2 branches if 6932837 were fixed, 3 otherwise, and could
additionally benefit from tiny 8-bit comparisons.
The shift could additionally be omitted on CPUs which would benefit
from 6933327.
Instead:
    for (int i = offset; i < offset + count; i++) {
        int c = codePoints[i];
        if (c >= Character.MIN_VALUE &&
            c <= Character.MAX_VALUE)
            n += 1;
        else if (c >= Character.MIN_SUPPLEMENTARY_CODE_POINT &&
                 c <= Character.MAX_CODE_POINT)
            n += 2;
        else throw new IllegalArgumentException(Integer.toString(c));
    }
needs 4 branches and 4 32-bit comparisons.
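Another aside, not from the thread: a self-contained, runnable version of the byte-plane loop above. The method name is hypothetical, and the upper bound is written as 0x10 rather than 0x11, since MAX_CODE_POINT is 0x10FFFF and the highest valid plane is therefore 0x10:

```java
public class CodeUnitCount {
    // Count the UTF-16 chars needed for codePoints[offset .. offset+count),
    // using the byte-sized plane comparison discussed above.
    // Hypothetical helper for illustration, not the actual Character code.
    static int toCodeUnitCount(int[] codePoints, int offset, int count) {
        int n = 0;
        for (int i = offset; i < offset + count; i++) {
            int c = codePoints[i];
            byte plane = (byte) (c >>> 16); // signed: junk planes go negative
            if (plane == 0)
                n += 1;                     // BMP: a single char
            else if (plane >= 0 && plane <= 0x10)
                n += 2;                     // supplementary: surrogate pair
            else
                throw new IllegalArgumentException(Integer.toString(c));
        }
        return n;
    }

    public static void main(String[] args) {
        int[] cps = { 'A', 0x1F600, 0x10FFFF, 'z' };
        System.out.println(toCodeUnitCount(cps, 0, cps.length)); // prints 6
    }
}
```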
We are trying to "optimize" this piece of code on the assumption that
the new impl MIGHT help a certain VM (HotSpot?) to optimize certain
use scenarios (some consecutive usages), if the compiler and/or the VM
are smart enough at some point, with no supporting benchmark data?
My concern is that in reality this optimization might even hurt the
BMP use case (the majority of possible real-world use scenarios) with
a 10% bigger bytecode size.
-Sherman
public class Character extends java.lang.Object {
    public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536;
    public static final int MAX_CODE_POINT = 1114111;

    public Character();
      Code:
         0: aload_0
         1: invokespecial #1          // Method java/lang/Object."<init>":()V
         4: return

    public static boolean isSupplementaryCodePoint(int);
      Code:
         0: iload_0
         1: ldc           #2          // int 65536
         3: if_icmplt     16
         6: iload_0
         7: ldc           #3          // int 1114111
         9: if_icmpgt     16
        12: iconst_1
        13: goto          17
        16: iconst_0
        17: ireturn

    public static boolean isSupplementaryCodePoint_new(int);
      Code:
         0: iload_0
         1: bipush        16
         3: iushr
         4: istore_1
         5: iload_1
         6: ifeq          19
         9: iload_1
        10: bipush        17
        12: if_icmpge     19
        15: iconst_1
        16: goto          20
        19: iconst_0
        20: ireturn
}