Re: Unicode script support in Regex and Character class

Ulf Zibis Tue, 11 May 2010 07:13:42 -0700

SOME of my comments below ARE meant for http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev


I marked the others. ;-)


-Ulf


On 11.05.2010 02:05, Xueming Shen wrote:

Ulf,
My apologies for distracting you with that "smaller size alternative". As I said in my previous email,
please only "review" the bits at
http://cr.openjdk.java.net/~sherman/6945564_6948903/webrev

It's fine if you are interested in the stuff I experimented with at
http://cr.openjdk.java.net/~sherman/script/webrev.00
but please keep it separated from the code I'm proposing to putback.

-Sherman


Ulf Zibis wrote:
Some additional thoughts:
*EXPERIMENTAL* - out.writeShort((short)(num & 0xffff)); ---shortform---> out.writeShort((short)num);
- use Arrays.binarySearch() in Character.UnicodeBlock.of().
*EXPERIMENTAL* - "if (notFirst)" could be saved if you would first append the first word to sb outside the while loop.
*EXPERIMENTAL* - StringBuilder sb could be initialized with the maximum name length (=83) to avoid resizing;
*EXPERIMENTAL* - we could reuse the same StringBuilder for multiple invocations of Character.getName(cp)?
*EXPERIMENTAL* -- make CharacterName.get(cp) an instance method and save the CharacterName object as a ThreadLocal from Character.getName(cp).
*EXPERIMENTAL* -- synchronize Character.getName(cp).
*EXPERIMENTAL* - Instead of using StringBuilder we could use ByteBuffer, omit the char[] and build the final String by new String(bb.array(), "ASCII").
*EXPERIMENTAL* -- saves the twice bigger char[] for the pool.
*EXPERIMENTAL* -- I imagine ByteBuffer would perform better than StringBuilder.
- save UnicodeBlocks, BlockStarts and scriptStarts in a file instead of statically in the classfile.
-- e.g. init of scriptStarts is a big waste of byte code (7/11 bytes per short/integer entry).
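
Three of the suggestions in the list above can be sketched in Java as follows. This is only an illustration, not code from either webrev: the class name ReviewSketch, the table contents and the helper names are all assumptions made up for the example. It shows that masking with 0xffff before a cast to short is redundant, how Arrays.binarySearch() can map a code point to the range that contains it, and how appending the first word before the loop removes the need for a "notFirst" flag (with the StringBuilder presized to 83, the maximum name length mentioned above).

```java
import java.util.Arrays;
import java.util.Iterator;

final class ReviewSketch {
    // Hypothetical toy table of sorted range starts; NOT the real JDK block data.
    static final int[] STARTS = {0x0000, 0x0080, 0x0100, 0x0180, 0x0250};
    static final String[] NAMES = {
        "BASIC_LATIN", "LATIN_1_SUPPLEMENT", "LATIN_EXTENDED_A",
        "LATIN_EXTENDED_B", "IPA_EXTENSIONS"
    };

    // The narrowing cast already keeps only the low 16 bits,
    // so "(short)(num & 0xffff)" and "(short)num" give the same value.
    static boolean castsAgree(int num) {
        return (short) (num & 0xffff) == (short) num;
    }

    // Range lookup with binary search: find the last start <= cp.
    // In this toy table the last range is treated as open-ended.
    static String blockOf(int cp) {
        if (cp < 0) {
            throw new IllegalArgumentException("negative code point: " + cp);
        }
        int i = Arrays.binarySearch(STARTS, cp);
        if (i < 0) {
            // Not an exact start: binarySearch returns -(insertionPoint) - 1,
            // and the containing range begins at insertionPoint - 1.
            i = -i - 2;
        }
        return NAMES[i];
    }

    // Join words with single spaces without a "notFirst" flag:
    // the first word is appended before the loop starts.
    static String join(Iterator<String> words) {
        StringBuilder sb = new StringBuilder(83); // presized, avoids resizing
        if (words.hasNext()) {
            sb.append(words.next());
        }
        while (words.hasNext()) {
            sb.append(' ').append(words.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(castsAgree(0x12345));
        System.out.println(blockOf(0x41));
        System.out.println(join(Arrays.asList("LATIN", "SMALL", "LETTER", "A").iterator()));
    }
}
```
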
On 08.05.2010 23:49, Xueming Shen wrote:
Hi,

The API proposals for Unicode script support below have been approved.

6945564: Unicode script support in Character class
6948903: Make Unicode scripts available for use in regular expressions
(2) Testing results suggest there is not much runtime benefit in keeping a huge string data pool + an access hashmap for the getName() implementation. The latest implementation now takes Ulf's suggestion to keep a relatively small byte[] pool and generate the names at runtime.
(There is an "even smaller" implementation, which consumes about 300K of memory at runtime:
http://cr.openjdk.java.net/~sherman/script/webrev.00/
but it has a "scalability" problem that needs to be addressed when the string pool grows beyond 64k, and it
is a little slow.)
I'm investigating that.
For a start, my string pool has a size of only 35243.

-Ulf
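
To make the "small byte[] pool, names generated at runtime" idea discussed above concrete, here is a minimal Java sketch. It is not the code from either webrev: the class NamePoolSketch, its layout (one ASCII byte[] holding every distinct word, plus a char[] of 16-bit start offsets) and its method names are assumptions for illustration only. The 16-bit offsets are one plausible reason a pool could run into trouble beyond 64k; the thread itself does not say where the limit comes from.

```java
import java.nio.charset.StandardCharsets;

final class NamePoolSketch {
    private final byte[] pool;      // all distinct words, ASCII, concatenated
    private final char[] wordStart; // unsigned 16-bit offsets; entry i is the start of word i

    // Hypothetical constructor: builds the pool from a list of distinct words.
    NamePoolSketch(String... words) {
        StringBuilder sb = new StringBuilder();
        wordStart = new char[words.length + 1];
        for (int i = 0; i < words.length; i++) {
            wordStart[i] = (char) sb.length();
            sb.append(words[i]);
        }
        // A char offset cannot address anything past 65535.
        if (sb.length() > Character.MAX_VALUE) {
            throw new IllegalStateException("pool too large for 16-bit offsets");
        }
        wordStart[words.length] = (char) sb.length();
        pool = sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Builds a name at runtime from word indexes, separated by single spaces.
    String name(int... wordIndexes) {
        if (wordIndexes.length == 0) {
            return "";
        }
        int len = wordIndexes.length - 1; // room for the separating spaces
        for (int w : wordIndexes) {
            len += wordStart[w + 1] - wordStart[w];
        }
        byte[] out = new byte[len];
        int pos = 0;
        for (int k = 0; k < wordIndexes.length; k++) {
            if (k > 0) {
                out[pos++] = ' ';
            }
            int w = wordIndexes[k];
            int n = wordStart[w + 1] - wordStart[w];
            System.arraycopy(pool, wordStart[w], out, pos, n);
            pos += n;
        }
        // Only the final String needs chars; the pool itself stays one byte per letter.
        return new String(out, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        NamePoolSketch p = new NamePoolSketch("LATIN", "SMALL", "CAPITAL", "LETTER", "A");
        System.out.println(p.name(0, 1, 3, 4));
    }
}
```

Each name then costs only a few word indexes, at the price of assembling the String on every lookup.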
