Hi,

The API proposals for Unicode script support listed below have been approved.

6945564: Unicode script support in Character class
6948903: Make Unicode scripts available for use in regular expressions
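For reference, here is a minimal sketch of how the two approved pieces can be used from code. It assumes the API as it later shipped in JDK 7: Character.UnicodeScript.of(int) for lookup, and the \p{IsXxx} and \p{script=Xxx} properties in java.util.regex. The class and helper names (ScriptDemo, scriptOf, isAllHan) are illustrative only.

```java
import java.util.regex.Pattern;

public class ScriptDemo {
    // Returns the Unicode script of a code point (Character.UnicodeScript, JDK 7+).
    public static Character.UnicodeScript scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint);
    }

    // True if the whole input consists of Han characters, using the
    // \p{IsHan} script property supported by java.util.regex (JDK 7+).
    public static boolean isAllHan(String s) {
        return Pattern.matches("\\p{IsHan}+", s);
    }

    public static void main(String[] args) {
        System.out.println(scriptOf('A'));                 // LATIN
        System.out.println(scriptOf(0x4E00));              // HAN
        System.out.println(isAllHan("\u4E2D\u6587"));      // true
        // The long form "script=" (or "sc=") is also accepted.
        System.out.println(Pattern.matches("\\p{script=Greek}+", "\u03B1\u03B2")); // true
    }
}
```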

Here is the final webrev ready for push.

http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev

(1) It has been suggested that access to the range data of UnicodeScript and UnicodeBlock might be useful in certain scenarios; for example, our regex engine could use such access to avoid a runtime binary search on every matching operation. I'm considering adding a pair of methods, UnicodeScript.is(codePoint) and UnicodeBlock.is(codePoint), to address this, but I would prefer to handle it in a separate RFE. It seems like a no-brainer for UnicodeBlock, but tricky for UnicodeScript, given how widely scattered the ranges of many scripts are. Any suggestions, or alternatives?
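To illustrate why is(codePoint) looks trivial for blocks but harder for scripts, here is a minimal sketch. The Block and Script classes and their data are hypothetical, not the JDK's internal tables. A block is one contiguous range, so membership takes two comparisons; a script is a set of disjoint runs, so even a per-script test still needs a (smaller) binary search over that script's own runs.

```java
import java.util.Arrays;

public class RangeMembership {
    // Hypothetical stand-in for a block: one contiguous, inclusive range.
    static final class Block {
        final int first, last;
        Block(int first, int last) { this.first = first; this.last = last; }
        boolean is(int cp) { return cp >= first && cp <= last; }
    }

    // Hypothetical stand-in for a script: sorted, disjoint, inclusive runs
    // [starts[i], ends[i]]. Membership is a binary search over these runs only.
    static final class Script {
        final int[] starts, ends;
        Script(int[] starts, int[] ends) { this.starts = starts; this.ends = ends; }
        boolean is(int cp) {
            int i = Arrays.binarySearch(starts, cp);
            if (i >= 0) return true;   // cp is exactly the start of a run
            int run = -i - 2;          // index of the last run starting before cp
            return run >= 0 && cp <= ends[run];
        }
    }

    // Illustrative data: the Basic Latin block, and the Latin letters within ASCII.
    static final Block BASIC_LATIN = new Block(0x0000, 0x007F);
    static final Script LATIN_ASCII = new Script(new int[] {0x0041, 0x0061},
                                                 new int[] {0x005A, 0x007A});

    public static void main(String[] args) {
        System.out.println(BASIC_LATIN.is('~'));   // true
        System.out.println(LATIN_ASCII.is('q'));   // true
        System.out.println(LATIN_ASCII.is('['));   // false
    }
}
```

For a script with only a few runs the search is short; for a widely scattered script such as Common it can approach the cost of the global lookup, which is presumably where the difficulty lies.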

(2) Testing suggests there is not much runtime benefit in keeping a huge string data pool plus a lookup HashMap for the getName() implementation. The latest implementation therefore takes Ulf's suggestion: it keeps a relatively small byte[] pool and generates the names at runtime. (There is an "even smaller" implementation, which consumes about 300K of memory at runtime,
http://cr.openjdk.java.net/~sherman/script/webrev.00/
but it has a scalability problem that needs to be addressed when the string pool grows beyond 64K, and it
is a little slow.)
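A minimal sketch of the general "small byte[] pool, names generated at runtime" idea, assuming a flat ASCII pool with int offsets. NamePool and its two-entry table are hypothetical and much simpler than the actual webrev; in JDK 7 the public entry point is Character.getName(int).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NamePool {
    // Hypothetical pool: all names concatenated as ASCII bytes, so no
    // per-name String objects are kept alive between calls.
    private static final byte[] POOL =
        "LATIN CAPITAL LETTER ALATIN CAPITAL LETTER B".getBytes(StandardCharsets.US_ASCII);
    // Sorted code points; the name of CODE_POINTS[i] is POOL[OFFSETS[i] .. OFFSETS[i + 1]).
    private static final int[] CODE_POINTS = {0x0041, 0x0042};
    private static final int[] OFFSETS = {0, 22, 44};

    // Returns the name, or null if the code point is not in this tiny table.
    public static String getName(int cp) {
        int i = Arrays.binarySearch(CODE_POINTS, cp);
        if (i < 0) return null;
        // The String is built on demand, trading a little CPU for memory.
        return new String(POOL, OFFSETS[i], OFFSETS[i + 1] - OFFSETS[i],
                          StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(getName(0x41)); // LATIN CAPITAL LETTER A
    }
}
```

Storing offsets as int rather than as 16-bit values keeps the pool from being capped at 64K bytes, at the cost of a larger offset table.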

(3) The UnicodeScript implementation is built on the Unicode 5.2 Scripts.txt. The rest of the Character class, however, still uses the previous Unicode version, pending Yuka's Unicode 5.2 RFE getting
back in.
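As a rough sketch of how script range data could be extracted from the UCD file, assuming the usual Scripts.txt line format ("0041..005A ; Latin # ..." for a range, or a single code point before the semicolon), a parser might look like this. ScriptTxtParser and Entry are hypothetical names, not the actual build tool used for this webrev.

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptTxtParser {
    // One parsed entry: an inclusive code point range and its script name.
    public static final class Entry {
        public final int first, last;
        public final String script;
        Entry(int first, int last, String script) {
            this.first = first; this.last = last; this.script = script;
        }
    }

    // Parses lines like "0041..005A ; Latin # ..." or "00AA ; Latin # ...".
    public static List<Entry> parse(List<String> lines) {
        List<Entry> out = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) continue;              // blank or comment-only line
            String[] fields = data.split(";");
            String range = fields[0].trim();
            String script = fields[1].trim();
            int dots = range.indexOf("..");
            int first = Integer.parseInt(dots >= 0 ? range.substring(0, dots) : range, 16);
            int last = dots >= 0 ? Integer.parseInt(range.substring(dots + 2), 16) : first;
            out.add(new Entry(first, last, script));
        }
        return out;
    }
}
```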

(4) The previous webrev can be found at http://cr.openjdk.java.net/~sherman/scripte

Please help review.

Thanks,
-Sherman

