On 03.03.2010 09:00, Martin Buchholz wrote:
On Tue, Mar 2, 2010 at 15:34, Ulf Zibis<[email protected]>  wrote:
On 26.08.2009 20:02, Xueming Shen wrote:
For example, isBMP(int) might be convenient, but it can easily be
achieved by the one-line code

(int)(char)codePoint == codePoint;

or more readable form

   codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT;

In class sun.nio.cs.Surrogate we have:
    public static boolean isBMP(int uc) {
        return (int) (char) uc == uc;
    }

1.) It's enough to have:
        return (char)uc == uc;
    better:
        assert MIN_VALUE == 0 && MAX_VALUE == 0xFFFF;
        return (char)uc == uc;
        // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE

2.) Above code is compiled to (needs 16 bytes of machine code):
  0x00b87ad8: mov    %ebx,%ebp
  0x00b87ada: and    $0xffff,%ebp
  0x00b87ae0: cmp    %ebx,%ebp
  0x00b87ae2: jne    0x00b87c52
  0x00b87ae8:

    We could code:
        assert MIN_VALUE == 0 && (MAX_VALUE + 1) == (1 << 16);
        return (uc >> 16) == 0;
        // Optimized form of: uc >= MIN_VALUE && uc <= MAX_VALUE
I agree that
return (uc >> 16) == 0;
is marginally better than my
return (int) (char) uc == uc;
(although I think the redundant cast to int
makes the code more readable).

That seems to be a matter of personal taste. I always stumble over superfluous casts, wondering what they are supposed to do.

I approve such a change to isBMPCodePoint()
and inclusion of such a method in Character.

Glad to hear it! Who could file the bug? I would provide the patch.


    is compiled to (needs only 9 bytes of machine code):
  0x00b87aac: mov    %ebx,%ecx
  0x00b87aae: sar    $0x10,%ecx
  0x00b87ab1: test   %ecx,%ecx
  0x00b87ab3: je     0x00b87acb
  0x00b87ab5:
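
To make the equivalence discussed above concrete, here is a minimal sketch that checks the range form, the cast form and the shift form against each other. The class and method names are hypothetical and exist only for illustration; only the three expressions come from this thread.

```java
// Sketch: three ways to ask "is this code point in the BMP?".
// BmpCheckSketch and its method names are hypothetical, not JDK API.
final class BmpCheckSketch {

    // Range form: the readable reference definition.
    static boolean isBmpRange(int uc) {
        return uc >= Character.MIN_VALUE && uc <= Character.MAX_VALUE;
    }

    // Cast form: truncating to 16 bits and widening back must reproduce uc.
    static boolean isBmpCast(int uc) {
        return (char) uc == uc;
    }

    // Shift form: every bit above the low 16 must be zero.
    static boolean isBmpShift(int uc) {
        return (uc >> 16) == 0;
    }

    public static void main(String[] args) {
        int[] samples = {
            Integer.MIN_VALUE, -1, 0, 'A', 0xD800, 0xFFFF,
            0x10000, 0x10FFFF, 0x110000, Integer.MAX_VALUE
        };
        for (int uc : samples) {
            boolean expected = isBmpRange(uc);
            if (isBmpCast(uc) != expected || isBmpShift(uc) != expected) {
                throw new AssertionError("forms disagree for 0x" + Integer.toHexString(uc));
            }
        }
        System.out.println("all three forms agree on the sample values");
    }
}
```

For this particular test the signed shift is safe: a negative uc shifts to a negative, non-zero value, so it is rejected just as the range form rejects it.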

1.) If we have:
    public static boolean isSupplementaryCodePoint(int codePoint) {
        assert MIN_SUPPLEMENTARY_CODE_POINT == (1 << 16) &&
                (MAX_SUPPLEMENTARY_CODE_POINT + 1) % (1 << 16) == 0;
        return (codePoint >> 16) != 0
                && (codePoint >> 16) < (MAX_SUPPLEMENTARY_CODE_POINT + 1 >> 16);
        // Optimized form of: codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
        //                 && codePoint <= MAX_SUPPLEMENTARY_CODE_POINT;
    }
Keep in mind that supplementary characters are extremely rare.

Yes, but many APIs in the JDK are rarely used.
Why should they waste memory footprint or perform badly, particularly if avoiding that doesn't cost anything?

Therefore the existing implementation

  return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
             && codePoint <= MAX_CODE_POINT;

will almost always perform just one comparison against a constant,
which is hard to beat.

1. I am puzzled: I think there are TWO comparisons.
2. Those comparisons need to load 32-bit values from the machine code, against only 8-bit values in my case.
3. The first of the two comparisons is shared with isBMPCodePoint() when the two are compiled together (see below).


I'm not sure whether your code above gets the right answer for negative input.
Perhaps you need to do
(codePoint >>> 16) < ...

Oops, I'm afraid you are right, but fortunately it doesn't cost anything: sar is simply replaced by shr.
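
A minimal sketch of the negative-input problem, assuming the upper bound is taken from Character.MAX_CODE_POINT (the constant the existing implementation uses). The class name, method names and PLANE_LIMIT are hypothetical. It compares the signed-shift and unsigned-shift variants with the JDK's Character.isSupplementaryCodePoint().

```java
// Sketch: why the shift must be unsigned. Names are hypothetical, not JDK API.
final class SupplementaryCheckSketch {

    // Number of 64K planes covered by valid code points: 0x110000 >>> 16 == 0x11.
    private static final int PLANE_LIMIT = (Character.MAX_CODE_POINT + 1) >>> 16;

    // Signed shift: a negative input keeps its sign, so the plane is
    // negative, non-zero and below the limit, and is wrongly accepted.
    static boolean signedShift(int codePoint) {
        int plane = codePoint >> 16;
        return plane != 0 && plane < PLANE_LIMIT;
    }

    // Unsigned shift: a negative input yields a plane of at least 0x8000,
    // which is far above the limit and therefore rejected.
    static boolean unsignedShift(int codePoint) {
        int plane = codePoint >>> 16;
        return plane != 0 && plane < PLANE_LIMIT;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 0xFFFF, 0x10000, 0x10FFFF, 0x110000};
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp)
                    + " signed=" + signedShift(cp)
                    + " unsigned=" + unsignedShift(cp)
                    + " jdk=" + Character.isSupplementaryCodePoint(cp));
        }
    }
}
```

The unsigned variant should agree with the JDK method on every input, while the signed variant differs only for negative values.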

Martin

and:
        if (Surrogate.isBMP(uc))
            ...;
        else if (Character.isSupplementaryCodePoint(uc))
            ...;
        else
            ...;

    we get (needs only 18 bytes of machine code):
  0x00b87aac: mov    %ebx,%ecx
  0x00b87aae: sar    $0x10,%ecx
  0x00b87ab1: test   %ecx,%ecx
  0x00b87ab3: je     0x00b87acb
  0x00b87ab5: cmp    $0x11,%ecx
  0x00b87ab8: jge    0x00b87ce6
  0x00b87abe:
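
The three-way branch above can be sketched at source level so that the shift is visibly computed once and reused by both tests. This is only an illustration under assumptions: the class, the enum and the method name are hypothetical, and it uses the unsigned shift discussed earlier in the thread rather than the sar shown in the listing.

```java
// Sketch: classify a code point with a single shared shift.
// CodePointClassSketch, Kind and classify are hypothetical names.
final class CodePointClassSketch {

    enum Kind { BMP, SUPPLEMENTARY, INVALID }

    static Kind classify(int uc) {
        int plane = uc >>> 16;                       // computed once
        if (plane == 0) {
            return Kind.BMP;                         // 0x0000..0xFFFF
        }
        if (plane < ((Character.MAX_CODE_POINT + 1) >>> 16)) {
            return Kind.SUPPLEMENTARY;               // 0x10000..0x10FFFF
        }
        return Kind.INVALID;                         // negative or above 0x10FFFF
    }

    public static void main(String[] args) {
        int[] samples = {-1, 'A', 0xFFFF, 0x10000, 0x10FFFF, 0x110000};
        for (int uc : samples) {
            System.out.println(Integer.toHexString(uc) + " -> " + classify(uc));
        }
    }
}
```

Whether the JIT emits exactly the short sequence shown above for this source form is not verified here; the sketch only shows that both tests can work on the same 8-bit plane value.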


BTW the compiled code of the existing code (needs *36* bytes of machine code):
  0x00b87a5c: test   %ebp,%ebp
  0x00b87a5e: jl     0x00b87a68
  0x00b87a60: cmp    $0x10000,%ebp
  0x00b87a66: jl     0x00b87a8d
  0x00b87a68: cmp    $0x10000,%ebp
  0x00b87a6e: jl     0x00b87c63
  0x00b87a74: cmp    $0x10ffff,%ebp
  0x00b87a7a: jg     0x00b87c63
  0x00b87a80:

BTW 2: The code example is taken from one of the String constructors, where the 2-comparison code is manually inlined instead of using Surrogate.isBMP().

-Ulf


