OK, next round of review. I reworked my UTF-8 changes to be behavior-preserving, removing any hint of controversy, and renamed the patch to "utf8-twiddling".
I got Ulf in my head, and can't stop micro-optimizing.
I added a new micro-optimizing patch for Bits.java.
Please file a bug.

6934268: Better implementation of Character.isValidCodePoint and isSupplementaryCodePoint()
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint

6934265: Add public method Character.isBMPCodePoint
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint

6934270: Remove javac warnings from Character.java
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings

6934271: Better handling of longer utf-8 sequences
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/utf8-twiddling

6666666: Optimize bit-twiddling in Bits.java
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/qtip tip Bits.java

Now I need to go off to my micro-optimizers-anonymous meeting.

Martin

On Thu, Mar 11, 2010 at 13:24, Xueming Shen <[email protected]> wrote:
> Martin, Ulf
>
> Following bug/rfs have been filed.
>
> 6934265 Add public method Character.isBMPCodePoint
> 6934268 Better implementation of Character.isValidCodePoint and
> isSupplementaryCodePoint()
> 6934270: Remove javac warnings from Character.java
> 6934271: Better handling of longer utf-8 sequences
>
> Masayoshi, Alan would you please help review the corresponding CCC for
> 6934265 at
> http://ccc.sfbay.sun.com/6934265
>
> Martin, don't touch the utf-8 malformed issue for now, and incompatible
> change in UTF-8
> is A issue.
>
> sherman
>
> Martin Buchholz wrote:
>>
>> Ulf, your changes would be easier to get in
>> if they were organized as mq patch files that
>> could be qimported into an existing mq repo.
>>
>> I've done that below, which includes a subset of
>> your own proposed changes:
>>
>>
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/
>>
>> Sherman (or Alan),
>>
>> please review and/or file bugs for the above changes.
>>
>> isBMPCodePoint is a spec addition, requiring additional paperwork.
>>
>> Sherman, you owe me a response to my now-moldy proposed changes to
>> the UTF-8 charset.
>>
>> The only controversial change would be the change in behavior in
>> malformed-utf8, which I can take out.
>>
>> Martin
>>
>> On Thu, Mar 11, 2010 at 10:32, Ulf Zibis <[email protected]> wrote:
>>
>>>
>>> Sherman,
>>>
>>> I know, your time ...
>>>
>>> ... but maybe someone is needed for sponsor here:
>>> https://bugs.openjdk.java.net/show_bug.cgi?id=100132
>>>
>>> Could you do this?
>>>
>>> Much thanks,
>>>
>>> -Ulf
>>>
>>>
>>> On 10.03.2010 19:23, Xueming Shen wrote:
>>>
>>>>
>>>> approved.
>>>>
>>>> I don't have a spare ws right now, so please just push, it's almost
>>>> there:-)
>>>>
>>>> sherman
>>>>
>>>> Martin Buchholz wrote:
>>>>
>>>>>
>>>>> Here's the proposed fix for
>>>>> 6931812: A better implementation of sun.nio.cs.Surrogate.isBMP(int)
>>>>>
>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isBMPCodePoint/
>>>>>
>>>>> I changed the name to isBMPCodePoint in preparation for moving
>>>>> it to Character.java.
>>>>> (Sherman, perhaps you would like to take on that followon task?)
>>>>>
>>>>> Sherman, please approve.
>>>>>
>>>>> Martin
>>>>>
>>>>> On Sat, Mar 6, 2010 at 13:00, Ulf Zibis <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> Very fast Sherman, much thanks.
>>>>>>
>>>>>> Could you set the bug to accepted and evaluated, so my patch will have a
>>>>>> chance to get into the code base?
>>>>>>
>>>>>> -Ulf
>>>>>>
>>>>>>
>>>>>> On 03.03.2010 20:11, Xueming Shen wrote:
>>>>>>
>>>>>>>
>>>>>>> #6931812
>>>>>>>
>>>>>>> Martin Buchholz wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Sherman, would you like to file bugs for Ulf's improvements?
>>>>>>>>
>>>>>>>> On Wed, Mar 3, 2010 at 02:44, Ulf Zibis <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03.03.2010 09:00, Martin Buchholz wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Keep in mind that supplementary characters are extremely rare.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, but many API's in the JDK are used rarely.
>>>>>>>>> Why should they waste memory footprint / perform bad, particularly if it
>>>>>>>>> doesn't cost anything.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I admire your perfectionism.
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Therefore the existing implementation
>>>>>>>>>>
>>>>>>>>>>     return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
>>>>>>>>>>         && codePoint <= MAX_CODE_POINT;
>>>>>>>>>>
>>>>>>>>>> will almost always perform just one comparison against a constant,
>>>>>>>>>> which is hard to beat.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1. Wondering: I think there are TWO comparisons.
>>>>>>>>> 2. Those comparisons need to load 32 bit values from machine code,
>>>>>>>>> against only 8 bit values in my case.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It's a good point. In the machine code, shifts are likely to use
>>>>>>>> immediate values, and so will be a small win.
>>>>>>>>
>>>>>>>>     int x = codePoint >>> 16;
>>>>>>>>     return x != 0 && x < 0x11;
>>>>>>>>
>>>>>>>> (On modern hardware, these optimizations
>>>>>>>> are less valuable than they used to be;
>>>>>>>> ordinary integer arithmetic is almost free)
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>
>>>>
>>>
>>>
>
>
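For anyone following the thread, the two forms of isSupplementaryCodePoint discussed above can be put side by side. This is only a minimal sketch, not the code in the webrev: the class name SupplementaryCheck and the method names are made up for illustration, and the two constants are copied from the public values in java.lang.Character. It checks that the range form and the shift form agree on the boundary values, including negative ints.

```java
class SupplementaryCheck {
    // Same values as Character.MIN_SUPPLEMENTARY_CODE_POINT and
    // Character.MAX_CODE_POINT; repeated here to keep the sketch standalone.
    static final int MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;
    static final int MAX_CODE_POINT = 0x10FFFF;

    // Range form quoted in the thread: two comparisons against
    // 32-bit constants.
    static boolean isSupplementaryByRange(int codePoint) {
        return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT
            && codePoint <= MAX_CODE_POINT;
    }

    // Shift form quoted in the thread: codePoint >>> 16 is the plane
    // number, which must lie in 1..0x10 for a supplementary code point.
    // The unsigned shift maps negative inputs to values >= 0x8000,
    // so they are rejected by the upper bound.
    static boolean isSupplementaryByShift(int codePoint) {
        int x = codePoint >>> 16;
        return x != 0 && x < 0x11;
    }

    // Hypothetical helper: compares both forms on the boundary values.
    static boolean formsAgree() {
        int[] samples = {
            Integer.MIN_VALUE, -1, 0, 0xFFFF, 0x10000,
            0x10FFFF, 0x110000, Integer.MAX_VALUE
        };
        for (int cp : samples) {
            if (isSupplementaryByRange(cp) != isSupplementaryByShift(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("forms agree: " + formsAgree());
    }
}
```

Whether the shift form is actually faster depends on the JIT and the hardware, as noted in the thread; the sketch only shows that the two forms accept the same inputs at the boundaries.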
