Ulf Zibis wrote:
On 24.04.2010 01:09, Xueming Shen wrote:

I changed the data file "format" a bit, so now the overall uniName.dat is less than 88k (the last version was 122+k), but I can no longer use cpLen as the capacity for the hashmap. I'm now using a hardcoded 20000 for 5.2.

Again, is 88k the compressed or the uncompressed size?

Yes, it's the size of the compressed data. Your smart "save one more byte" suggestion will save
400+ bytes, a tiny 0.5%, unfortunately :-)
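
(As a side note on the hardcoded capacity above, here is a minimal sketch of how an initial capacity interacts with HashMap's default load factor, assuming a map keyed by code point; the constant name and the sample entry are made up for illustration, and the real code may size things differently than cpLen did.)

import java.util.HashMap;
import java.util.Map;

public class NameMapSketch {
    // Assumption: roughly this many named code points in Unicode 5.2.
    private static final int EXPECTED_NAMES = 20000;

    static Map<Integer, String> newNameMap() {
        // HashMap rehashes once size > capacity * loadFactor (0.75 by default),
        // so sizing the initial capacity for the expected entry count avoids
        // repeated rehashing while the map is being populated.
        return new HashMap<Integer, String>((int) (EXPECTED_NAMES / 0.75f) + 1);
    }

    public static void main(String[] args) {
        Map<Integer, String> names = newNameMap();
        names.put(0x0041, "LATIN CAPITAL LETTER A");   // sample entry
        System.out.println(names.get(0x0041));
    }
}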


-- Is it faster to first copy the whole data into a byte[] and then use ByteBuffer.getInt etc., versus using DataInputStream methods directly?
The current implementation uses neither ByteBuffer nor DataInputStream, so there is nothing to compare here. Yes, using DataInputStream would definitely make the code look better (no more of those "ugly" shifts), but it would also slow things down a little, since it adds one more layer. But speed
may not really be a concern here.
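
For what it's worth, here is a minimal sketch of the two styles, reading a big-endian int out of a byte[] with manual shifts versus wrapping the bytes in a DataInputStream; the sample data is made up and does not reflect the real uniName.dat layout.

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReadIntSketch {
    // The "ugly shifts" style: big-endian int straight out of the byte[].
    static int readInt(byte[] a, int off) {
        return ((a[off]     & 0xff) << 24) |
               ((a[off + 1] & 0xff) << 16) |
               ((a[off + 2] & 0xff) <<  8) |
                (a[off + 3] & 0xff);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 0x00, 0x01, (byte) 0xe2, 0x40 };   // 123456, big-endian

        System.out.println(readInt(data, 0));                // 123456

        // The DataInputStream style: cleaner, but one extra layer of calls.
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(in.readInt());                    // 123456
    }
}

Both read the same big-endian value; the wrapper just trades a few extra method calls for readability.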

-- You could create a very long String with the whole data and then use substring for the individual strings, which could share the same backing char[].

The disadvantage of using one big buffer String to hold everything and then having the individual names substring from it is that it might simply break the SoftReference logic here. The big char[] will never be gc-ed as long as a single name object (substring-ed from it) is still walking around somewhere in the system.
I don't think the vm/gc is that smart, is it?

But this approach would definitely be faster, given the cost of creating a String from bytes (we put in an optimization for that
earlier, so the operation should already be faster now compared to 6u).
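
For illustration, a minimal sketch of the two caching styles with a made-up byte layout: copying each name into its own String versus taking substrings of one big String held through a SoftReference. substring shares the backing char[] with the source String, which is exactly what would keep the whole buffer reachable for as long as any single name is still referenced.

import java.lang.ref.SoftReference;

public class NameCacheSketch {
    // Hypothetical raw name data; not the real uniName.dat layout.
    static final byte[] RAW =
        "LATIN SMALL LETTER ALATIN SMALL LETTER B".getBytes();

    // Current style: each name is a fresh String copied out of the byte[],
    // so nothing but the caller holds on to the decoded characters.
    static String nameByCopy(int off, int len) {
        return new String(RAW, off, len);
    }

    // Suggested style: one big String, names via substring().
    // Each substring shares the big backing char[], so a single cached
    // name keeps the entire buffer strongly reachable and the
    // SoftReference never gets a chance to free it.
    static final SoftReference<String> BIG =
        new SoftReference<String>(new String(RAW));

    static String nameBySubstring(int off, int len) {
        String big = BIG.get();
        return (big == null) ? null : big.substring(off, off + len);
    }

    public static void main(String[] args) {
        System.out.println(nameByCopy(0, 20));        // LATIN SMALL LETTER A
        System.out.println(nameBySubstring(20, 20));  // LATIN SMALL LETTER B
    }
}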

-Sherman
