Re: Fast String...

Ulf Zibis Wed, 25 Mar 2009 07:12:57 -0700


On 25.03.2009 04:41, Xueming Shen wrote:

Ulf Zibis wrote:
On 25.03.2009 02:13, Xueming Shen wrote:
Reducing size is a good thing. That was my primary goal: to get charsets.jar under 2M, which is doable if we can put the data outside the class files. That is what I have done... the concern is the startup time. One alternative is to use this approach only for those charsets that don't care about startup, such as the IBM charsets and the ones on Solaris :-)
Whether the data is stored in the class file or outside it, you can still eliminate the c2b data (it is generated from b2c). The difference is that the String constants stored in UTF-8 probably take 3 bytes per char, but only 2 bytes in a ".dat" file... about 15%.
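
To illustrate the two points in this paragraph, here is a minimal sketch, not the actual JDK or charsets.jar code. The class and method names (CharsetSizeSketch, deriveC2B, classFileBytes, datFileBytes) are hypothetical. It assumes a singlebyte table for brevity and uses DataOutputStream.writeUTF to measure the modified UTF-8 size, since that is the same encoding class-file string constants use.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class CharsetSizeSketch {
    // Marker for byte values that have no character assigned (an assumption of this sketch).
    static final char UNMAPPABLE = '\uFFFD';

    // Builds the char-to-byte lookup from the byte-to-char table,
    // so only b2c has to be stored; -1 means "no byte for this char".
    static int[] deriveC2B(char[] b2c) {
        int[] c2b = new int[0x10000];
        Arrays.fill(c2b, -1);
        for (int b = 0; b < b2c.length; b++) {
            char c = b2c[b];
            if (c != UNMAPPABLE) {
                c2b[c] = b;
            }
        }
        return c2b;
    }

    // Size of the string body as a class-file constant (modified UTF-8).
    static int classFileBytes(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeUTF(s);
            return bos.size() - 2; // writeUTF prepends a 2-byte length
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Size of the same chars written as raw 16-bit values in a data file.
    static int datFileBytes(String s) {
        return 2 * s.length();
    }

    public static void main(String[] args) {
        char[] b2c = new char[256];
        for (int i = 0; i < 256; i++) {
            b2c[i] = (char) i;          // identity table as a stand-in
        }
        b2c[0xA4] = '\u20AC';           // one slot replaced, as ISO-8859-15 does at 0xA4
        int[] c2b = deriveC2B(b2c);
        System.out.println(c2b['\u20AC']);  // 164
        System.out.println(c2b['\u00A4']);  // -1

        String cjk = "\u4E2D\u6587";    // two chars above U+07FF
        System.out.println(classFileBytes(cjk)); // 6
        System.out.println(datFileBytes(cjk));   // 4
    }
}
```

Chars below U+0080 take only 1 byte in the class-file encoding, so the overall saving depends on the mix of characters in the table.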
Your generated charset classes are 2K on average; my data files are 250 bytes on average (including aliases + historicalName, so you should subtract 50..200 bytes for comparison). See: https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/releases/nio_charset_M4.jar?rev=682&view=log
It's unfair :-) You put me totally in a defensive position :-) Martin can testify that I started to sell this idea 2 years ago: extracting all mapping data into dat files and having only one single base class that loads the dat file and constructs the charset class on the fly :-) So I know how small it can be.

My 15% figure is not for singlebyte; I'm talking about doublebyte.


Ah, ok. This makes it clearer.

I totally agree with you: saving bytes only in singlebyte charsets isn't worth much. But it was a good exercise for me to find out the relevant techniques. You may wonder how I can serve a coder for 256 2-byte chars with a 69-byte data file (e.g. koi8-u.dat), which also includes its numerous names. The trick is that I share map data between charsets if they are similar enough. This is done by my sun.nio.cs.CharsetStream class.
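
The thread does not show how CharsetStream stores shared data, so the following is only a sketch of one way such sharing could work: a charset is stored as a reference to a base table plus a short list of overrides. The class SharedMapSketch, its methods, and the override values are all hypothetical.

```java
public class SharedMapSketch {
    // Builds a full byte-to-char table from a shared base table
    // plus a list of (byte value, char) overrides.
    static char[] derive(char[] base, int[] patchIndex, char[] patchChar) {
        char[] b2c = base.clone();      // the base table itself stays untouched
        for (int i = 0; i < patchIndex.length; i++) {
            b2c[patchIndex[i]] = patchChar[i];
        }
        return b2c;
    }

    // Bytes needed to store the overrides: 1 byte index + 2 bytes char each.
    static int patchBytes(int[] patchIndex) {
        return patchIndex.length * 3;
    }

    // Bytes needed to store a full table of 2-byte chars.
    static int fullTableBytes(char[] table) {
        return table.length * 2;
    }

    public static void main(String[] args) {
        char[] base = new char[256];
        for (int i = 0; i < 256; i++) {
            base[i] = (char) i;         // stand-in for a real base charset table
        }
        // Illustrative overrides only; not taken from any real charset data file.
        int[] patchIndex = {0xA4, 0xA6, 0xA7, 0xAD};
        char[] patchChar = {'\u0454', '\u0456', '\u0457', '\u0491'};

        char[] derived = derive(base, patchIndex, patchChar);
        System.out.println((int) derived[0xA4]);      // 1108 (U+0454)
        System.out.println((int) base[0xA4]);         // 164, base is unchanged
        System.out.println(fullTableBytes(derived));  // 512
        System.out.println(patchBytes(patchIndex));   // 12
    }
}
```

Under these assumptions a derived charset costs a few bytes per differing entry instead of 512 bytes for the full table, which is the order of magnitude needed for a data file well under 100 bytes.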

I would be surprised if there weren't heavy overlap between doublebyte maps, which could be shared. I have designed the CharsetStream class to be extensible for doublebyte requirements. Additionally, I think it should be possible to partly share mapping tables in memory, as the doublebyte b2c maps in general seem to be sliced.

The big problem is the loss in startup time, which seems to me to be caused by the dilly-dallying resource stream.


-Ulf

Let me explain why I don't really care about the singlebyte size.
We have probably 100 singlebyte charsets in our repository. Assume each takes 2k: that's a total of 200k of the 6M+ (in stored mode) size of charsets.jar. Even if you could reduce the size to 0, it's 5% of the total size. Yes, each bit counts, but sometimes you have to balance the advantages and disadvantages, so if we have to trade startup time for a 5% reduction of the total 6M charsets.jar, I would give it a second thought. But it might be a totally different story for doublebyte: if you can cut the 6M in half (that was my goal) with a relatively small startup regression, it might be something worth doing.
Sherman
