On 24.04.2010 01:09, Xueming Shen wrote:

I changed the data file "format" a bit, so now the overall uniName.dat is less than 88k (the last version was 122+k), but I can no longer use cpLen as the capacity for the HashMap. I'm now using a hardcoded 20000 for 5.2.

Again, is 88k the compressed or the uncompressed size?

-- Is it faster to first copy the whole data into a byte[] and then use ByteBuffer.getInt() etc., compared with using DataInputStream methods directly? (see the sketch below)
-- You could create one very long String with the whole data and then use substring() for the individual names, which could share the same backing char[].
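As a rough illustration of these two points, here is a minimal sketch. The record layout (int code point, unsigned byte length, ASCII name bytes), the class and method names, and the 20000 initial capacity are assumptions for illustration, not Shen's actual uniName.dat format:

import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical layout: (int codePoint, unsigned byte nameLen, nameLen ASCII bytes),
// repeated until the data is exhausted.
public class NameDataSketch {

    private static final Charset ASCII = Charset.forName("US-ASCII");

    // Variant A: copy the whole data into a byte[] once, then parse it with a ByteBuffer.
    static Map<Integer, String> parseWithByteBuffer(byte[] data) {
        Map<Integer, String> names = new HashMap<Integer, String>(20000);
        ByteBuffer bb = ByteBuffer.wrap(data);
        while (bb.hasRemaining()) {
            int cp = bb.getInt();              // code point
            int len = bb.get() & 0xFF;         // name length
            byte[] name = new byte[len];
            bb.get(name);                      // name bytes
            names.put(cp, new String(name, ASCII));
        }
        return names;
    }

    // Variant B: the same records streamed through DataInputStream.
    static Map<Integer, String> parseWithDataInputStream(DataInputStream in, int recordCount)
            throws IOException {
        Map<Integer, String> names = new HashMap<Integer, String>(20000);
        for (int i = 0; i < recordCount; i++) {
            int cp = in.readInt();
            int len = in.readUnsignedByte();
            byte[] name = new byte[len];
            in.readFully(name);
            names.put(cp, new String(name, ASCII));
        }
        return names;
    }

    // Variant C: decode everything into one long String and hand out substrings.
    // On the JDKs of that era, String.substring() shared the backing char[], so
    // the individual name Strings would not copy the character data.
    static Map<Integer, String> parseWithSharedBackingArray(byte[] data) {
        String all = new String(data, ASCII);  // one big backing char[]; the 5 header
                                               // bytes per record are decoded too but
                                               // never referenced, and ASCII decoding
                                               // keeps char offsets == byte offsets
        Map<Integer, String> names = new HashMap<Integer, String>(20000);
        ByteBuffer bb = ByteBuffer.wrap(data);
        while (bb.hasRemaining()) {
            int cp = bb.getInt();
            int len = bb.get() & 0xFF;
            int start = bb.position();
            names.put(cp, all.substring(start, start + len));  // shares all's char[]
            bb.position(start + len);
        }
        return names;
    }
}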

See attachment.

-- I don't think it's a good idea to hold the whole data in memory, especially as String objects; additionally, the backing char[]s occupy twice the space of a byte[].
-- The big new byte[total] and, later, the huge number of String objects could result in an OOM error on a small VM heap.
-- As a compromise, you could put the cp->nameOff pointers in a separate, uncompressed data file, hold only that in memory (or access it via a DirectByteBuffer), and read the string data from a separate file only on request from Character.getName(int codePoint). As an option, a PreHashMap could cache the individually loaded strings. (see the sketch below)
-- Anyway, having DirectByteBuffer access to the deflated data would be a performance/footprint gain.
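A rough sketch of that compromise, under assumed names and layouts: a small uncompressed index file ("uniNameIndex.dat", fixed 9-byte records of int code point, int offset, unsigned byte length) is kept in memory, while the concatenated name bytes ("uniNameData.dat") are read only on demand; a plain HashMap stands in here for whatever cache (e.g. a PreHashMap) the real code would use:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class LazyCharacterNameSketch {

    private static final Charset ASCII = Charset.forName("US-ASCII");

    private final Map<Integer, long[]> index = new HashMap<Integer, long[]>(20000);
    private final Map<Integer, String> cache = new HashMap<Integer, String>();
    private final RandomAccessFile nameData;

    LazyCharacterNameSketch(String indexFile, String dataFile) throws IOException {
        nameData = new RandomAccessFile(dataFile, "r");
        RandomAccessFile idx = new RandomAccessFile(indexFile, "r");
        try {
            long records = idx.length() / 9;           // 4 + 4 + 1 bytes per record
            for (long i = 0; i < records; i++) {
                int cp  = idx.readInt();
                int off = idx.readInt();
                int len = idx.readUnsignedByte();
                index.put(cp, new long[] { off, len });
            }
        } finally {
            idx.close();
        }
    }

    // What Character.getName(int) could delegate to in this scheme.
    synchronized String getName(int codePoint) throws IOException {
        String name = cache.get(codePoint);
        if (name != null)
            return name;
        long[] entry = index.get(codePoint);
        if (entry == null)
            return null;                               // unassigned code point
        byte[] bytes = new byte[(int) entry[1]];
        nameData.seek(entry[0]);                       // read only the requested name
        nameData.readFully(bytes);
        name = new String(bytes, ASCII);
        cache.put(codePoint, name);                    // cache individually loaded names
        return name;
    }
}

The point of the split is that only roughly 9 bytes per assigned code point stay resident; name strings are materialized lazily, and the cache can be dropped or bounded if footprint matters more than repeat-lookup speed.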

Sorry, I don't think I fully understand your points here.

See above; I'll try the others tomorrow.

-Ulf

Attachment: CharacterName1.java
