Re: Sponsor for 6666666: A better implementation of Character.isSupplementaryCodePoint

Ulf Zibis Thu, 11 Mar 2010 13:14:44 -0800

On 11.03.2010 20:38, Martin Buchholz wrote:

Ulf, your changes would be easier to get in
if they were organized as mq patch files that
could be qimported into an existing mq repo.


To be honest, I have never heard of mq. Can you point me to some docs, please?

I've done that below, which includes a subset of
your own proposed changes:

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/


- Maybe better:  "... using a single {...@code char}".

- Why don't you like using the new isBMPCodePoint() for isSupplementaryCodePoint() and toUpperCaseCharArray()?

- The same shift magic would enhance isISOControl(), isHighSurrogate(), and isLowSurrogate(), in particular if the latter are called consecutively.
  An 8-bit shift plus compare would allow HotSpot to compile to compact 1-byte immediate opcodes.

- Don't you think my notes on validity are worth adding (or filing as a separate bug)?

- Changing ch <= MAX_SURROGATE to ch < MAX_SURROGATE + 1 would allow the HotSpot compiler to optimize away one branch if those methods are used consecutively.

- And finally, I would like to make the constants complete (i.e. add
MAX_SUPPLEMENTARY_CODE_POINT).
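
To make the shift-based range checks discussed above concrete, here is a minimal sketch. The class name CodePointChecks is hypothetical and the method bodies are an illustration, not the code in the webrevs. It assumes 16-bit and 10-bit shifts, which need not be the exact shift widths proposed in this thread.

```java
// Hypothetical helper class: names mirror java.lang.Character,
// but this is an illustration, not the JDK source or the webrev.
final class CodePointChecks {
    static final int MAX_CODE_POINT = 0x10FFFF;
    static final char MIN_SURROGATE = '\uD800';
    static final char MAX_SURROGATE = '\uDFFF';

    // A BMP code point has no bits set above bit 15.
    static boolean isBMPCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
    }

    // Supplementary code points lie in planes 1..16.
    // Negative inputs yield a huge plane number and are rejected.
    static boolean isSupplementaryCodePoint(int codePoint) {
        int plane = codePoint >>> 16;
        return plane != 0 && plane < ((MAX_CODE_POINT + 1) >>> 16);
    }

    // High surrogates U+D800..U+DBFF share their top six bits.
    static boolean isHighSurrogate(char ch) {
        return ch >>> 10 == 0xD800 >>> 10;
    }

    // Low surrogates U+DC00..U+DFFF share their top six bits.
    static boolean isLowSurrogate(char ch) {
        return ch >>> 10 == 0xDC00 >>> 10;
    }

    // Range check written with an exclusive upper bound,
    // i.e. ch < MAX_SURROGATE + 1 instead of ch <= MAX_SURROGATE.
    static boolean isSurrogate(char ch) {
        return ch >= MIN_SURROGATE && ch < MAX_SURROGATE + 1;
    }
}
```

Whether HotSpot actually emits shorter or fewer instructions for these forms is not shown by the sketch; it only demonstrates that the rewritten predicates agree with the comparison-based ones.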

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/

This reminds me that some months ago I prepared a beautified version of Character's source (things like the above, replacing <code> with {...@code}, fixing indentation inconsistencies, etc.). Would there be interest in my providing such a patch?

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/


In encodeBufferLoop() you could use putChar() and putInt() instead of put(). That should
perform better.
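
As a rough illustration of that suggestion, the following sketch writes the same big-endian values once byte by byte with put() and once with a single putChar() or putInt() call. The class PutDemo and its method names are hypothetical, and the sketch makes no claim about the actual encoder in the webrev.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, not taken from the webrev.
final class PutDemo {
    // Writes a 32-bit value as four single-byte put() calls, big-endian.
    static void putIntBytewise(ByteBuffer dst, int value) {
        dst.put((byte) (value >>> 24));
        dst.put((byte) (value >>> 16));
        dst.put((byte) (value >>> 8));
        dst.put((byte) value);
    }

    // Writes the same bytes with one call; a new ByteBuffer is big-endian by default.
    static void putIntWhole(ByteBuffer dst, int value) {
        dst.putInt(value);
    }

    // Writes a 16-bit char as two single-byte put() calls, big-endian.
    static void putCharBytewise(ByteBuffer dst, char value) {
        dst.put((byte) (value >>> 8));
        dst.put((byte) value);
    }

    // Writes the same two bytes with one putChar() call.
    static void putCharWhole(ByteBuffer dst, char value) {
        dst.putChar(value);
    }
}
```

The sketch only shows that the two forms produce identical bytes. Whether the single call is faster depends on the buffer type and the JIT, and would need to be measured.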

Sherman (or Alan),

please review and/or file bugs for the above changes.

isBMPCodePoint is a spec addition, requiring additional paperwork.

Sherman, you owe me a response to my now-moldy proposed changes to
the UTF-8 charset.

The only controversial change would be the change in behavior in
malformed-utf8, which I can take out.

This reminds me of some thoughts. To be *exact*, I think malformed should be returned for all codes that are invalid in the character set in question. So validate first for unmappable and second for invalid (= malformed). This costs no performance while looping over mappable and valid characters, only a little more effort after the loop is interrupted to form the right CoderResult.
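
One possible reading of this idea, sketched for a toy ASCII-only encoder: the hot loop performs a single mappability test, and only after the loop is interrupted is the offending input classified as malformed or unmappable. The class AsciiEncodeSketch and its methods are hypothetical and are not the UTF-8 code under review.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

// Hypothetical toy encoder, not the JDK's charset implementation.
final class AsciiEncodeSketch {
    static CoderResult encode(CharBuffer src, ByteBuffer dst) {
        while (src.hasRemaining()) {
            char c = src.get(src.position());   // peek without consuming
            if (c >= 0x80) {
                // Slow path: runs only once the fast loop is interrupted.
                return classify(src, c);
            }
            if (!dst.hasRemaining()) {
                return CoderResult.OVERFLOW;
            }
            dst.put((byte) c);
            src.position(src.position() + 1);   // consume the char
        }
        return CoderResult.UNDERFLOW;
    }

    // Decides between malformed (invalid UTF-16 input) and unmappable
    // (valid input that this charset cannot represent).
    private static CoderResult classify(CharBuffer src, char c) {
        if (Character.isHighSurrogate(c)) {
            int next = src.position() + 1;
            // Simplification: a real encoder would report UNDERFLOW for a
            // trailing high surrogate when more input may still arrive.
            if (next < src.limit() && Character.isLowSurrogate(src.get(next))) {
                return CoderResult.unmappableForLength(2);
            }
            return CoderResult.malformedForLength(1);
        }
        if (Character.isLowSurrogate(c)) {
            return CoderResult.malformedForLength(1);
        }
        return CoderResult.unmappableForLength(1);
    }
}
```

With this shape, mappable input never pays for the extra classification; the cost is confined to the error path.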



-Ulf
