On 25.03.2009 04:41, Xueming Shen wrote:
Ulf Zibis wrote:
On 25.03.2009 02:13, Xueming Shen wrote:
Reducing the size is a good thing. That was my primary goal: to get
charsets.jar under 2M, which is doable if we can put the data outside
the class file. That is what I have done... the concern is the
startup time. One alternative is to take this approach only for those
charsets that don't care about startup, such as the IBM charsets and
the ones on Solaris :-)
Comparing data stored in the class file with data stored outside of
it: you can still eliminate the c2b data (generated from b2c) either
way. The difference is that the String constants stored in UTF-8
probably take 3 bytes per char, but 2 bytes in a ".dat" file...
about 15%.
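
To make the two points above concrete, here is a minimal sketch, not code from either implementation. It assumes a b2c table held as a String and uses standard UTF-8 as an approximation of the "modified UTF-8" used for class file constants; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class MappingSizeSketch {

    // Size of the table when every char is written as a raw 2-byte value,
    // as a ".dat" file could do.
    static int rawSize(String b2c) {
        return 2 * b2c.length();
    }

    // Size of the table as a UTF-8 string. Chars in U+0800..U+FFFF (most
    // CJK) take 3 bytes each. Assumes the table has no unpaired surrogates.
    static int utf8Size(String b2c) {
        return b2c.getBytes(StandardCharsets.UTF_8).length;
    }

    // Derive the char-to-byte direction from b2c, so that only b2c has to
    // be stored. The index into b2c is the byte code; -1 means "unmappable".
    static int[] deriveC2B(String b2c) {
        int[] c2b = new int[0x10000];
        Arrays.fill(c2b, -1);
        for (int code = 0; code < b2c.length(); code++) {
            char c = b2c.charAt(code);
            if (c != '\uFFFD') {          // U+FFFD marks an unassigned code
                c2b[c] = code;
            }
        }
        return c2b;
    }

    public static void main(String[] args) {
        String b2c = "\u4e2d\u6587\u5b57";   // three CJK chars as sample data
        System.out.println("utf8 bytes: " + utf8Size(b2c));
        System.out.println("raw bytes:  " + rawSize(b2c));
        System.out.println("code of U+6587: " + deriveC2B(b2c)[0x6587]);
    }
}
```

The per-char saving shown here applies only to chars that need 3 bytes in UTF-8; the overall figure for a whole jar depends on what else is stored in it.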
Your generated charset classes average 2 KB; my data files average
250 bytes (including aliases and historicalName, so you should
subtract 50 to 200 bytes for comparison).
See:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/releases/nio_charset_M4.jar?rev=682&view=log
That's unfair :-) You put me totally on the defensive :-) Martin can
testify that I started to sell this idea of extracting all
mapping data into a dat file, with only one single base class to
load the dat and construct the charset class
on the fly, 2 years ago :-) So I know how small it can be.
My 15% figure is not for singlebyte; I'm talking about doublebyte.
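
The "one single base class plus a dat file" idea can be sketched roughly as follows for the singlebyte case. This is only an illustration under assumptions: the class name TableCharset, the load method, and the data layout (256 big-endian 2-byte chars, U+FFFD for unassigned bytes) are all hypothetical and are not the format used by either implementation discussed here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical: one class serves every singlebyte charset; only the table differs.
class TableCharset extends Charset {
    private final char[] b2c = new char[256];              // byte -> char
    private final byte[] c2b = new byte[0x10000];          // char -> byte, derived
    private final boolean[] mapped = new boolean[0x10000];

    private TableCharset(String name, char[] table) {
        super(name, null);                                 // no aliases in this sketch
        System.arraycopy(table, 0, b2c, 0, 256);
        for (int i = 0; i < 256; i++) {
            char c = b2c[i];
            if (c != '\uFFFD') { c2b[c] = (byte) i; mapped[c] = true; }
        }
    }

    // Assumed data layout: 256 chars, 2 bytes each, big-endian.
    // Byte 0x3F must map to '?', the default encoder replacement.
    static TableCharset load(String name, InputStream dat) {
        try (DataInputStream in = new DataInputStream(dat)) {
            char[] table = new char[256];
            for (int i = 0; i < 256; i++) table[i] = in.readChar();
            return new TableCharset(name, table);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public boolean contains(Charset cs) { return cs == this; }

    @Override public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    char c = b2c[in.get(in.position()) & 0xFF];   // peek
                    if (c == '\uFFFD') return CoderResult.unmappableForLength(1);
                    in.get();                                      // consume
                    out.put(c);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override public CharsetEncoder newEncoder() {
        return new CharsetEncoder(this, 1.0f, 1.0f) {
            @Override protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    char c = in.get(in.position());                // peek
                    if (!mapped[c]) return CoderResult.unmappableForLength(1);
                    in.get();                                      // consume
                    out.put(c2b[c]);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    public static void main(String[] args) {
        // In-memory stand-in for a data file: ASCII identity, upper half
        // mapped to an arbitrary block starting at U+0480 (sample data only).
        byte[] dat = new byte[512];
        for (int i = 0; i < 256; i++) {
            int c = i < 128 ? i : 0x0400 + i;
            dat[2 * i] = (byte) (c >> 8);
            dat[2 * i + 1] = (byte) c;
        }
        Charset cs = load("X-Sketch-Table", new ByteArrayInputStream(dat));
        String s = new String(new byte[] {0x41, (byte) 0x80}, cs);
        System.out.println(s.length() + " chars, round trip ok: "
                + (s.getBytes(cs).length == 2));
    }
}
```

A real implementation would read the data through a resource stream at first use, which is exactly where the startup cost discussed in this thread comes in.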
Ah, ok. This makes it clearer.
I totally agree with you, saving bytes only in singlebyte charsets isn't
worth much. But it was a good exercise for me to find the relevant
techniques.
You may wonder how I can serve a coder for 256 2-byte chars with
a 69-byte data file (e.g. koi8-u.dat), which also includes its numerous
names.
The trick is that I share map data between charsets if they are
similar enough. This is done by my sun.nio.cs.CharsetStream class.
I would be surprised if there weren't heavy overlap between doublebyte
maps that could be shared. I have designed the CharsetStream class to be
extensible for doublebyte requirements. Additionally, I think it should
be possible to partly share mapping tables in memory, as the doublebyte
b2c maps in general seem to be sliced.
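
One way such sharing between similar charsets could work is to store a full table once and, for each similar charset, only the positions where it differs. The sketch below illustrates that general idea; it is not the CharsetStream format, which is not described in this thread, and the names SharedMapSketch, diff and patch are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of sharing a base map between similar charsets.
class SharedMapSketch {

    // Record the positions where 'variant' differs from 'base'. Each entry
    // packs the table index into the high 16 bits and the char into the low
    // 16 bits. Both tables must have the same length.
    static int[] diff(char[] base, char[] variant) {
        int[] tmp = new int[base.length];
        int n = 0;
        for (int i = 0; i < base.length; i++) {
            if (base[i] != variant[i]) {
                tmp[n++] = (i << 16) | variant[i];
            }
        }
        return Arrays.copyOf(tmp, n);
    }

    // Rebuild the variant table from the shared base plus its overrides.
    static char[] patch(char[] base, int[] overrides) {
        char[] table = base.clone();
        for (int o : overrides) {
            table[o >>> 16] = (char) (o & 0xFFFF);
        }
        return table;
    }

    public static void main(String[] args) {
        char[] base = new char[256];
        for (int i = 0; i < 256; i++) base[i] = (char) i;   // sample data only
        char[] variant = base.clone();
        variant[0xA4] = '\u0454';
        variant[0xB4] = '\u0404';

        int[] overrides = diff(base, variant);
        System.out.println("overrides stored: " + overrides.length);
        System.out.println("rebuilt equals variant: "
                + Arrays.equals(patch(base, overrides), variant));
    }
}
```

With this layout a variant costs 4 bytes per differing position instead of a full table, at the price of having to load the base table first.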
The big problem is the loss in startup time, which to me seems to be
caused by the dilly-dallying resource stream.
-Ulf
Let me explain why I don't really care about the singlebyte size.
We have probably 100 singlebyte charsets in our repository; assuming
each takes 2k, that's a total of 200k out of the 6M+ (in stored mode)
size of charsets.jar. Even if you could reduce that size to 0, it's 5% of
the total size. Yes, each bit counts, but sometimes you have to
balance the advantages and disadvantages, so if we have to trade
startup for a 5% reduction of the total 6M charsets.jar, I would
give it a second thought. But it might be a totally different story for
doublebyte: if you can cut the 6M in half (that was my goal)
with a relatively small startup regression, it might be something worth
doing.
Sherman