On 11.03.2010 20:38, Martin Buchholz wrote:
Ulf, your changes would be easier to get in
if they were organized as mq patch files that
could be qimported into an existing mq repo.


To be honest, I have never heard of mq. Can you point me to some docs, please?

I've done that below, which includes a subset of
your own proposed changes:

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/


- Maybe better: "... using a single {@code char}".
- Why don't you like using the new isBMPCodePoint() for isSupplementaryCodePoint() and toUpperCaseCharArray()?
- The same shift magic would enhance isISOControl(), isHighSurrogate() and isLowSurrogate(), in particular if the latter are called consecutively. An 8-bit shift + compare would allow HotSpot to compile to smart 1-byte immediate op-codes.
- Don't you think my notes on validity are worth adding (or filing as a separate bug)?
- Changing ch <= MAX_SURROGATE to ch < MAX_SURROGATE + 1 would allow the HotSpot compiler to optimize away 1 branch if those methods are used consecutively.
- And finally, I would like to make the constants complete (= adding MAX_SUPPLEMENTARY_CODE_POINT).
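
A minimal sketch of the shift-based checks discussed in the list above. The class ShiftChecks and its method names are hypothetical and are not the code in the webrevs. The sketch only shows that the shifted comparisons select the same ranges as the usual two-sided range checks; whether HotSpot really emits better code for them is not demonstrated here.

```java
final class ShiftChecks {
    // BMP code points are U+0000..U+FFFF, so every bit above bit 15 is zero.
    static boolean isBmpCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
    }

    // Supplementary code points are U+10000..U+10FFFF, i.e. planes 1..16.
    // Negative inputs shift to a large positive plane and are rejected.
    static boolean isSupplementaryCodePoint(int codePoint) {
        int plane = codePoint >>> 16;
        return plane != 0 && plane <= 0x10;
    }

    // High surrogates U+D800..U+DBFF share their top 6 bits.
    static boolean isHighSurrogate(char ch) {
        return ch >>> 10 == 0xD800 >>> 10;
    }

    // Low surrogates U+DC00..U+DFFF share their top 6 bits.
    static boolean isLowSurrogate(char ch) {
        return ch >>> 10 == 0xDC00 >>> 10;
    }
}
```

Each check is a single shift followed by a compare against a small constant, instead of two compares against 16-bit or larger bounds.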

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/


This reminds me that some months ago I prepared a beautified version of Character's source (things like the above, replacing <code> with {@code}, fixing indentation inconsistencies etc.). Would there be interest in such a patch?

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/


In encodeBufferLoop() you could use putChar() or putInt() instead of put(). That should
perform better.
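
A small sketch of what this suggestion amounts to, assuming a 16-bit big-endian encoding; the class PutCharSketch and its methods are hypothetical and not taken from the webrev. It shows that one putChar() call writes the same bytes as two put() calls. The performance claim itself is not measured here.

```java
import java.nio.ByteBuffer;

final class PutCharSketch {
    // Byte-wise variant: two put() calls per char, high byte first.
    static byte[] encodeBytewise(char[] src) {
        ByteBuffer dst = ByteBuffer.allocate(src.length * 2);
        for (char c : src) {
            dst.put((byte) (c >> 8));
            dst.put((byte) c);
        }
        return dst.array();
    }

    // Whole-value variant: one putChar() call per char.
    // A freshly allocated ByteBuffer uses big-endian byte order.
    static byte[] encodeWithPutChar(char[] src) {
        ByteBuffer dst = ByteBuffer.allocate(src.length * 2);
        for (char c : src) {
            dst.putChar(c);
        }
        return dst.array();
    }
}
```

For a little-endian encoding the buffer's order would have to be switched with order(ByteOrder.LITTLE_ENDIAN) before putChar() gives the matching result.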

Sherman (or Alan),

please review and/or file bugs for the above changes.

isBMPCodePoint is a spec addition, requiring additional paperwork.

Sherman, you owe me a response to my now-moldy proposed changes to
the UTF-8 charset.

The only controversial change would be the change in behavior in
malformed-utf8, which I can take out.


This reminds me of some thoughts. To be *exact*, I think malformed should be returned for all codes that are invalid in the character set in question. So validate first for unmappable and second for invalid (= malformed). This costs no performance while looping over mappable and valid characters, only a little more effort after the loop is interrupted to form the right CoderResult.
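
One possible reading of this idea, sketched for a hypothetical Latin-1-style encoder working on plain arrays (ClassifySketch and encodeLatin1 are illustrative names, not JDK code). The hot loop tests only for mappability; the distinction between unmappable and malformed is made once, after the loop has stopped.

```java
import java.nio.charset.CoderResult;

final class ClassifySketch {
    static CoderResult encodeLatin1(char[] src, byte[] dst) {
        int i = 0;
        // Fast path: a single mappability test per char, no validity checks.
        while (i < src.length && i < dst.length && src[i] <= 0xFF) {
            dst[i] = (byte) src[i];
            i++;
        }
        if (i == src.length) {
            return CoderResult.UNDERFLOW;
        }
        char c = src[i];
        if (c <= 0xFF) {
            // The loop stopped only because dst is full.
            return CoderResult.OVERFLOW;
        }
        // Slow path: only now decide between unmappable and malformed.
        if (Character.isHighSurrogate(c)) {
            // Simplification: a high surrogate at the very end of src is
            // reported as malformed instead of waiting for more input.
            boolean paired = i + 1 < src.length
                    && Character.isLowSurrogate(src[i + 1]);
            return paired ? CoderResult.unmappableForLength(2)
                          : CoderResult.malformedForLength(1);
        }
        if (Character.isLowSurrogate(c)) {
            return CoderResult.malformedForLength(1);
        }
        return CoderResult.unmappableForLength(1);
    }
}
```

Valid, mappable input never reaches the surrogate checks, so the extra classification cost is paid only when the loop is interrupted.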


-Ulf

