Re: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint

Xueming Shen Mon, 15 Mar 2010 23:27:45 -0700

CR 6935172 created (P4, java/classes_io): Optimize bit-twiddling in Bits.java


Can I assume the webrev is

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Bits.java

-Sherman

Martin Buchholz wrote:

OK, next round of review.

I changed my UTF-8 changes to be behavior-preserving,
removing any hint of controversy, and renamed the patch
to "utf8-twiddling".

I got Ulf in my head, and can't stop micro-optimizing.
I added a new micro-optimizing patch for Bits.java.
Please file a bug.

6934268: Better implementation of Character.isValidCodePoint and isSupplementaryCodePoint()
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint
6934265: Add public method Character.isBMPCodePoint
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint
6934270: Remove javac warnings from Character.java
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings
6934271: Better handling of longer utf-8 sequences
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling
6666666: Optimize bit-twiddling in Bits.java
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/qtip tip Bits.java

Now I need to go off to my micro-optimizers-anonymous meeting.

Martin

On Thu, Mar 11, 2010 at 13:24, Xueming Shen wrote:

Martin, Ulf

The following bugs/RFEs have been filed:

6934265: Add public method Character.isBMPCodePoint
6934268: Better implementation of Character.isValidCodePoint and isSupplementaryCodePoint()
6934270: Remove javac warnings from Character.java
6934271: Better handling of longer utf-8 sequences
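6934268 concerns Character.isValidCodePoint. As a hedged illustration only (not the patch under review), validity can be phrased as a plane check: Character.MAX_CODE_POINT is 0x10FFFF, so every valid code point lies in planes 0 through 16, and an unsigned shift by 16 yields the plane number. The class name ValidCodePointSketch and the constant PLANE_COUNT are hypothetical.

```java
// Hypothetical sketch: code point validity expressed as a single plane comparison.
// Illustrative only; not necessarily the code in the 6934268 webrev.
public class ValidCodePointSketch {

    // Number of Unicode planes: (0x10FFFF + 1) >>> 16 == 0x11 == 17.
    static final int PLANE_COUNT = (Character.MAX_CODE_POINT + 1) >>> 16;

    static boolean isValidCodePoint(int codePoint) {
        // >>> is an unsigned shift, so a negative input yields a plane >= 0x8000
        // and is rejected by the same single comparison.
        int plane = codePoint >>> 16;
        return plane < PLANE_COUNT;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 0xFFFF, 0x10000,
                         0x10FFFF, 0x110000, Integer.MAX_VALUE};
        for (int cp : samples) {
            // Print the sketch's answer next to the JDK's own Character.isValidCodePoint.
            System.out.printf("%08x sketch=%b jdk=%b%n", cp,
                    isValidCodePoint(cp), Character.isValidCodePoint(cp));
        }
    }
}
```

The point of the shift form is that negative inputs need no separate test: the unsigned shift already pushes them far above plane 16.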

Masayoshi, Alan, would you please help review the corresponding CCC for
6934265 at
http://ccc.sfbay.sun.com/6934265

Martin, please don't touch the malformed UTF-8 issue for now; an incompatible
change in UTF-8
is a real issue.

sherman

Martin Buchholz wrote:

Ulf, your changes would be easier to get in
if they were organized as mq patch files that
could be qimported into an existing mq repo.

I've done that below, which includes a subset of
your own proposed changes:


http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/

Sherman (or Alan),

please review and/or file bugs for the above changes.

isBMPCodePoint is a spec addition, requiring additional paperwork.

Sherman, you owe me a response to my now-moldy proposed changes to
the UTF-8 charset.

The only controversial change would be the change in behavior in
malformed-utf8, which I can take out.

Martin

On Thu, Mar 11, 2010 at 10:32, Ulf Zibis wrote:

Sherman,

I know your time is limited ...

... but a sponsor may be needed here:
https://bugs.openjdk.java.net/show_bug.cgi?id=100132

Could you do this?

Many thanks,

-Ulf


On 10.03.2010 19:23, Xueming Shen wrote:

approved.

I don't have a spare workspace right now, so please just push it; it's almost
there :-)

sherman

Martin Buchholz wrote:

Here's the proposed fix for
6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int)

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/

I changed the name to isBMPCodePoint in preparation for moving
it to Character.java.
(Sherman, perhaps you would like to take on that follow-on task?)
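For reference, a BMP code point is one in U+0000..U+FFFF, that is, one whose upper 16 bits are all zero, so a single unsigned shift can stand in for a two-sided range check. The sketch below assumes exactly that definition; it is not the webrev's code, and the class and method names are hypothetical (Java 7 did later add a public Character.isBMPCodePoint).

```java
// Hypothetical sketch of a shift-based BMP test (illustrative, not the webrev's code).
public class BmpSketch {

    // BMP = U+0000..U+FFFF: true iff the upper 16 bits are all zero.
    // The unsigned shift leaves negative inputs nonzero, so they are rejected too.
    static boolean isBMPCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
    }

    // Straightforward two-comparison formulation, kept as a reference.
    static boolean isBMPCodePointByRange(int codePoint) {
        return codePoint >= Character.MIN_VALUE && codePoint <= Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 'A', 0xFFFF, 0x10000, 0x10FFFF};
        for (int cp : samples) {
            System.out.printf("%08x shift=%b range=%b%n",
                    cp, isBMPCodePoint(cp), isBMPCodePointByRange(cp));
        }
    }
}
```

Both methods should agree on every int; the shift version just expresses the test as "plane number is zero".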

Sherman, please approve.

Martin

On Sat, Mar 6, 2010 at 13:00, Ulf Zibis wrote:

That was very fast, Sherman; many thanks.

Could you set the bug to accepted and evaluated, so my patch will have a
chance to get into the code base?

-Ulf


On 03.03.2010 20:11, Xueming Shen wrote:

#6931812

Martin Buchholz wrote:

Sherman, would you like to file bugs for Ulf's improvements?

On Wed, Mar 3, 2010 at 02:44, Ulf Zibis wrote:

On 03.03.2010 09:00, Martin Buchholz wrote:

Keep in mind that supplementary characters are extremely rare.

Yes, but many APIs in the JDK are rarely used.
Why should they waste memory footprint or perform badly, particularly if
avoiding that doesn't cost anything?

I admire your perfectionism.

Therefore the existing implementation

    return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
        && codePoint <= MAX_CODE_POINT;

will almost always perform just one comparison against a constant,
which is hard to beat.
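The "just one comparison" claim rests on Java's short-circuit &&: for the usual BMP input the first test (codePoint >= MIN_SUPPLEMENTARY_CODE_POINT) is already false, so the second is never evaluated. The sketch below makes that visible by routing each comparison through hypothetical counting helpers ge and le; it only demonstrates evaluation order and says nothing about the machine code a JIT would actually emit.

```java
// Hypothetical instrumentation of the range check quoted above, counting comparisons.
public class ShortCircuitSketch {

    static final int MIN_SUPPLEMENTARY_CODE_POINT = Character.MIN_SUPPLEMENTARY_CODE_POINT; // 0x10000
    static final int MAX_CODE_POINT = Character.MAX_CODE_POINT;                             // 0x10FFFF

    static int comparisons = 0;

    // Hypothetical helpers: perform the comparison and record that it happened.
    static boolean ge(int a, int b) { comparisons++; return a >= b; }
    static boolean le(int a, int b) { comparisons++; return a <= b; }

    // Same logic as the existing implementation, with each comparison counted.
    static boolean isSupplementaryCodePoint(int codePoint) {
        return ge(codePoint, MIN_SUPPLEMENTARY_CODE_POINT)
            && le(codePoint, MAX_CODE_POINT);
    }

    public static void main(String[] args) {
        comparisons = 0;
        isSupplementaryCodePoint('A');      // BMP input: first test fails, && short-circuits
        System.out.println(comparisons);    // prints 1

        comparisons = 0;
        isSupplementaryCodePoint(0x1F600);  // supplementary input: both tests run
        System.out.println(comparisons);    // prints 2
    }
}
```

Negative inputs also stop after one comparison; only inputs at or above 0x10000 pay for the second test, which is Ulf's "TWO comparisons" case.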

1. I wonder about that: I think there are TWO comparisons.
2. Those comparisons need to load 32-bit constants from the machine code,
versus only 8-bit values in my case.

It's a good point.  In the machine code, shifts are likely to use
immediate values, so this will be a small win.

int x = codePoint >>> 16;
return x != 0 && x < 0x11;

(On modern hardware, these optimizations
are less valuable than they used to be;
ordinary integer arithmetic is almost free)
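The shift-based variant quoted above reads as "the plane number codePoint >>> 16 is between 1 and 16". One way to gain confidence that it matches the existing range check is to cross-check it against Character.isSupplementaryCodePoint on the boundary values and on a dense sweep around the valid code space. This is a hedged test sketch with a hypothetical class name, not an exhaustive proof; sweeping all 2^32 ints would also work but takes noticeably longer.

```java
// Hypothetical cross-check of the shift-based isSupplementaryCodePoint variant.
public class ShiftSketch {

    // Supplementary = planes 1..16 (U+10000..U+10FFFF).
    // The unsigned shift maps negative inputs to planes >= 0x8000, so x < 0x11 rejects them.
    static boolean isSupplementaryCodePoint(int codePoint) {
        int x = codePoint >>> 16;
        return x != 0 && x < 0x11;
    }

    // Throws if the sketch disagrees with the JDK for the given input.
    static void check(int cp) {
        if (isSupplementaryCodePoint(cp) != Character.isSupplementaryCodePoint(cp)) {
            throw new AssertionError("mismatch at 0x" + Integer.toHexString(cp));
        }
    }

    public static void main(String[] args) {
        int[] boundaries = {Integer.MIN_VALUE, -1, 0, 0xFFFF, 0x10000,
                            0x10FFFF, 0x110000, Integer.MAX_VALUE};
        for (int cp : boundaries) {
            check(cp);
        }
        // Dense sweep over the BMP, all supplementary planes, and a margin on each side.
        for (int cp = -0x20000; cp <= 0x130000; cp++) {
            check(cp);
        }
        System.out.println("shift variant agrees with Character.isSupplementaryCodePoint");
    }
}
```

Note that the shift form also costs two tests (x != 0, then x < 0x11) for supplementary inputs; its claimed advantage is the small immediate constants, not fewer branches.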

Martin
