Re: Unicode script support in Regex and Character class

Ulf Zibis Mon, 10 May 2010 14:57:49 -0700

Some additional thoughts:

- out.writeShort((short)(num & 0xffff)); can be shortened to out.writeShort((short)num);
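
A quick way to convince oneself of this, as a minimal sketch: the narrowing cast to short already keeps only the low 16 bits, and DataOutputStream.writeShort(int) itself writes only the two low-order bytes. The class and method names below are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class MaskCheck {
    // Writes one value with writeShort and returns the two bytes produced.
    static byte[] write(int v) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeShort(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // True if the masked form, the cast-only form and the plain int form
    // all produce the same two bytes for num.
    static boolean sameOutput(int num) {
        byte[] masked = write((short) (num & 0xffff));
        byte[] castOnly = write((short) num);
        byte[] noCast = write(num);
        return Arrays.equals(masked, castOnly) && Arrays.equals(castOnly, noCast);
    }
}
```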

- use Arrays.binarySearch() in Character.UnicodeBlock.of().
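
Roughly what I have in mind, as a sketch only: the table below is a tiny hand-written excerpt with String names, not the real blockStarts/blocks arrays, and the class name is hypothetical. The last entry is a sentinel marking the end of the excerpt.

```java
import java.util.Arrays;

final class BlockLookup {
    // First code point of each range, sorted ascending (abbreviated, illustrative).
    static final int[] STARTS = {0x0000, 0x0080, 0x0100, 0x0180, 0x0250, 0x02B0};
    // Name for each range; null means "not covered by this excerpt".
    static final String[] NAMES = {
        "BASIC_LATIN", "LATIN_1_SUPPLEMENT", "LATIN_EXTENDED_A",
        "LATIN_EXTENDED_B", "IPA_EXTENSIONS", null
    };

    static String of(int cp) {
        if (cp < 0) {
            throw new IllegalArgumentException("negative code point");
        }
        int i = Arrays.binarySearch(STARTS, cp);
        if (i < 0) {
            // Not an exact range start: binarySearch returned -(insertionPoint) - 1,
            // and the containing range is the one just before the insertion point.
            i = -i - 2;
        }
        return NAMES[i];
    }
}
```

With this table, BlockLookup.of('A') finds the first range and BlockLookup.of(0x0250) hits a range start exactly.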

- "if (notFirst)" could be saved if you first appended the first word to sb outside the while loop.

- StringBuilder sb could be initialized with the maximum name length (=83) to avoid resizing.

- We could reuse the same StringBuilder for multiple invocations of Character.getName(cp)?

-- make CharacterName.get(cp) an instance method and save the CharacterName object as a ThreadLocal from Character.getName(cp).
-- synchronize Character.getName(cp).
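
To illustrate the StringBuilder points together, here is a minimal sketch. It takes the ThreadLocal route rather than synchronizing, and it assumes the words of a name are already available as a String[]; the real code decodes them from the byte[] pool. NameJoiner and join are hypothetical names, and ThreadLocal.withInitial needs a newer Java than the one under discussion.

```java
final class NameJoiner {
    // 83 is the maximum name length mentioned above.
    private static final int MAX_NAME_LENGTH = 83;

    // One reusable builder per thread, so no synchronization is needed.
    private static final ThreadLocal<StringBuilder> BUILDER =
        ThreadLocal.withInitial(() -> new StringBuilder(MAX_NAME_LENGTH));

    static String join(String[] words) {
        if (words.length == 0) {
            return "";
        }
        StringBuilder sb = BUILDER.get();
        sb.setLength(0);          // reset the content, keep the allocated capacity
        sb.append(words[0]);      // first word goes in before the loop
        int i = 1;
        while (i < words.length) {
            // No "notFirst" test needed: every remaining word gets a separator.
            sb.append(' ').append(words[i++]);
        }
        return sb.toString();
    }
}
```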

- Instead of using StringBuilder we could use ByteBuffer, omit the char[] and build the final String by new String(bb.array(), "ASCII").

-- saves the char[] for the pool, which is twice as big.
-- I imagine ByteBuffer would perform better than StringBuilder.
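
A sketch of the ByteBuffer variant, under the assumption that each word is available as ASCII bytes. The accessor on java.nio.ByteBuffer is array(), and the length has to be passed explicitly because the backing array is usually only partly filled. AsciiNameBuilder is a hypothetical name, and I have not measured whether this is actually faster.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class AsciiNameBuilder {
    // Maximum name length mentioned above; a longer name would raise
    // BufferOverflowException in this sketch.
    private static final int MAX_NAME_LENGTH = 83;

    static String build(byte[][] words) {
        ByteBuffer bb = ByteBuffer.allocate(MAX_NAME_LENGTH);
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                bb.put((byte) ' ');
            }
            bb.put(words[i]);
        }
        // array() exposes the backing byte[]; position() is the number of bytes written.
        return new String(bb.array(), 0, bb.position(), StandardCharsets.US_ASCII);
    }
}
```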

- save UnicodeBlocks, BlockStarts and scriptStarts in a file instead of statically in the class file.

-- e.g. init of scriptStarts is a big waste of byte code (7/11 bytes per short/integer entry).
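
For the last point, a minimal sketch of what reading such a table from a data file could look like. A data file costs 2 bytes per short and 4 per int, plus a length prefix in this made-up format. TableIO, the length-prefixed layout and the use of a byte array in place of a real resource file are all assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class TableIO {
    // Build-time side: serialize the table as a length followed by the entries.
    static byte[] write(int[] table) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(table.length);
            for (int v : table) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Runtime side: read the table back, e.g. from a resource stream.
    static int[] read(InputStream in) {
        try (DataInputStream din = new DataInputStream(in)) {
            int[] table = new int[din.readInt()];
            for (int i = 0; i < table.length; i++) {
                table[i] = din.readInt();
            }
            return table;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is some I/O at class initialization in exchange for a smaller class file.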


On 08.05.2010 23:49, Xueming Shen wrote:

Hi,

The API proposals for Unicode script support below have been approved.

6945564: Unicode script support in Character class
6948903: Make Unicode scripts available for use in regular expressions
(2) Testing result suggests there is not too much runtime benefit of keeping a huge string data pool + an access hashmap for getName() implementation. The latest implementation now takes Ulf's suggestion to keep a relatively small byte[] pool and generate the names at runtime. (there is "even smaller" implementation, which consumes about 300K memory at runtime
http://cr.openjdk.java.net/~sherman/script/webrev.00/
but it has a "scalability" problem need to address when string pool grows beyond 64k and it is little slow)


I'm investigating that.
For a start, my string pool has a size of only 35243.

-Ulf
