On 11.03.2010 20:38, Martin Buchholz wrote:
Ulf, your changes would be easier to get in
if they were organized as mq patch files that
could be qimported into an existing mq repo.
To be honest, I have never heard of mq. Can you point me to some docs, please?
I've done that below, which includes a subset of
your own proposed changes:
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/isSupplementaryCodePoint/
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/public-isBMPCodePoint/
- Maybe better: "... using a single {@code char}".
- Why don't you like using the new isBMPCodePoint() for isSupplementaryCodePoint() and
toUpperCaseCharArray() ?
- The same shift magic would improve isISOControl(), isHighSurrogate() and isLowSurrogate(), in particular
if the latter two are called consecutively.
8-bit shift + compare would allow HotSpot to compile to smart 1-byte
immediate op-codes.
- Don't you think my notes on validity are worth adding (or filing as a separate bug)?
- Changing ch <= MAX_SURROGATE to ch < MAX_SURROGATE + 1 would allow the HotSpot compiler to optimize away 1
branch if those methods are used consecutively.
- And lastly, I would like to make the constants complete (= adding
MAX_SUPPLEMENTARY_CODE_POINT).
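
To make the "shift magic" from the notes above concrete, here is a minimal sketch. The class name CodePointChecks is hypothetical, the method bodies are my assumption of what shift-based checks could look like and not the JDK source, and the shift widths (16 and 10 bits) are chosen for these particular ranges:

```java
// Hypothetical helper class; a sketch, not the java.lang.Character implementation.
class CodePointChecks {

    // BMP code points are exactly those whose upper 16 bits are zero.
    static boolean isBMPCodePoint(int codePoint) {
        return codePoint >>> 16 == 0;
    }

    // Supplementary code points live in planes 1..16.
    static boolean isSupplementaryCodePoint(int codePoint) {
        int plane = codePoint >>> 16;
        return plane != 0 && plane < ((Character.MAX_CODE_POINT + 1) >>> 16);
    }

    // U+D800..U+DBFF all share the same upper 6 bits, so one shift and
    // one compare replace the usual two range comparisons.
    static boolean isHighSurrogate(char ch) {
        return ch >>> 10 == Character.MIN_HIGH_SURROGATE >>> 10;
    }

    // U+DC00..U+DFFF likewise share their upper 6 bits.
    static boolean isLowSurrogate(char ch) {
        return ch >>> 10 == Character.MIN_LOW_SURROGATE >>> 10;
    }
}
```

Whether HotSpot really emits better code for these forms would have to be measured; the sketch only shows that the results match the range-based definitions.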
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Character-warnings/
This reminds me that some months ago I prepared a beautified version of Character's source (things like
the above, replacing <code> with {@code}, fixing indentation inconsistencies etc.). Would there be interest
in such a patch?
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/malformed-utf8/
In encodeBufferLoop() you could use putChar() or putInt() instead of put(). That should
perform better.
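
As an illustration of what I mean, here is a minimal sketch for the 2-byte case only. The class and method names are hypothetical and this is not the code from the webrev. It assumes the buffer uses the default big-endian byte order and that the caller has already checked that 2 bytes remain:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; covers only chars in U+0080..U+07FF.
class TwoByteUtf8Put {

    // Baseline: the 2-byte UTF-8 form written with two put() calls.
    static void putTwoBytes(ByteBuffer dst, char c) {
        dst.put((byte) (0xC0 | (c >> 6)));
        dst.put((byte) (0x80 | (c & 0x3F)));
    }

    // Same two bytes combined into one 16-bit value and written with a
    // single putChar(). Only correct for ByteOrder.BIG_ENDIAN buffers.
    static void putAsChar(ByteBuffer dst, char c) {
        dst.putChar((char) (0xC080 | ((c >> 6) << 8) | (c & 0x3F)));
    }
}
```

Both methods produce identical output; whether the single call is actually faster would need a benchmark.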
Sherman (or Alan),
please review and/or file bugs for the above changes.
isBMPCodePoint is a spec addition, requiring additional paperwork.
Sherman, you owe me a response to my now-moldy proposed changes to
the UTF-8 charset.
The only controversial change would be the change in behavior in
malformed-utf8, which I can take out.
This reminds me of some thoughts. To be *exact*, I think malformed should be returned for all
codes which are invalid in the character set in question. So first check for unmappable and second
for invalid (= malformed). This doesn't cost any performance while looping over mappable and valid characters, only
a little more effort after the loop is interrupted, to form the right CoderResult.
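
A minimal sketch of one possible reading of this idea, using an ISO-8859-1 style encoder as the example. The class and method names are hypothetical, and it is simplified: a real encoder would report UNDERFLOW for a high surrogate at the end of the input and would check the output space. The hot loop tests only for mappable chars; the distinction between malformed and unmappable is made after the loop stops:

```java
import java.nio.charset.CoderResult;

// Hypothetical sketch; dst is assumed to be at least as long as src.
class Latin1ResultSketch {

    // Hot loop: a single comparison per char, no validity checks.
    static CoderResult encode(char[] src, byte[] dst) {
        int i = 0;
        while (i < src.length && src[i] <= 0xFF) {
            dst[i] = (byte) src[i];
            i++;
        }
        return i == src.length ? CoderResult.UNDERFLOW : classify(src, i);
    }

    // Runs only after the loop was interrupted at src[i].
    static CoderResult classify(char[] src, int i) {
        char c = src[i];
        if (Character.isHighSurrogate(c)) {
            if (i + 1 < src.length && Character.isLowSurrogate(src[i + 1])) {
                // Valid surrogate pair, but not representable in this charset.
                return CoderResult.unmappableForLength(2);
            }
            // Unpaired high surrogate: invalid input.
            return CoderResult.malformedForLength(1);
        }
        if (Character.isLowSurrogate(c)) {
            // Low surrogate without a preceding high surrogate: invalid input.
            return CoderResult.malformedForLength(1);
        }
        // Valid BMP char above U+00FF.
        return CoderResult.unmappableForLength(1);
    }
}
```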
-Ulf