On 21.05.2009 00:22, Xueming Shen wrote:
Ulf Zibis wrote:
(6) Unload b2cStr from memory after startup:
- outsource b2cStr to additional class file like EUC_TW approach
- set b2cStr = null after startup (remove final modifier)
Benefit[6]: avoid 100 % superfluous memory-footprint
I doubt it saves anything real, since the "class" should still
keep its copy somewhere... and
I will need it for c2b (I'm now "delaying" the c2b init)
I always thought that setting an object reference to null after use
would let it be garbage-collected automatically. Am I wrong?
... but we can do the c2b init from b2c[][] instead of from b2cStr[],
so why keep it?
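To illustrate the point about initializing c2b from b2c[][], here is a minimal sketch. It assumes a flat 64K c2b table and 256-entry b2c rows indexed by the second byte; the class name, the flat layout and the UNMAPPABLE_ENCODING value are assumptions for illustration, not the actual DoubleByte code.

```java
import java.util.Arrays;

// Hypothetical sketch: names and table layout are illustrative only.
final class C2BFromB2C {
    static final char UNMAPPABLE_DECODING = 0xFFFD;
    static final char UNMAPPABLE_ENCODING = 0xFFFD; // assumed marker value

    // b2c[b1] is null for an unused lead byte, else a row indexed by b2.
    static char[] initC2B(char[][] b2c) {
        char[] c2b = new char[0x10000];
        Arrays.fill(c2b, UNMAPPABLE_ENCODING);
        for (int b1 = 0; b1 < b2c.length; b1++) {
            char[] row = b2c[b1];
            if (row == null)
                continue;
            for (int b2 = 0; b2 < row.length; b2++) {
                char c = row[b2];
                if (c != UNMAPPABLE_DECODING)
                    c2b[c] = (char) ((b1 << 8) | b2); // reverse mapping
            }
        }
        return c2b;
    }
}
```

With this shape the string form is only needed until b2c[][] has been filled, so nothing forces it to stay referenced for the delayed c2b init.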
(7) Avoid copying b2cStr to b2c:
(String#charAt() is as fast as char[] access)
Benefit[7]: increase startup performance for decoder
I tried again last night. char[][] is much faster than the String[]
version in both the client
and server VM, so keep it as is. (This is actually why I switched from
String[] to char[][].)
I'm surprised, because I remembered from older benchmarks that
char_array[index] was as fast as str.charAt(index) after
optimization by HotSpot.
I also got those results here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/branches/array_io_string/src/sun/nio/cs/SingleByteFastDecoder.java?rev=&view=markup
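For reference, the two lookup variants being compared look roughly like this. It is only a sketch with a hypothetical 256-entry identity table and made-up names; it shows that both forms return the same character, and says nothing about their relative speed, which has to be measured on the VM in question (for example with a harness such as JMH).

```java
// Hypothetical sketch: table contents and names are illustrative only.
final class B2CLookup {
    // 256-entry single-byte table kept as a String (identity mapping here).
    static final String B2C_STR = buildTable();
    // The char[] copy that item (7) proposes to avoid.
    static final char[] B2C = B2C_STR.toCharArray();

    private static String buildTable() {
        StringBuilder sb = new StringBuilder(256);
        for (int i = 0; i < 256; i++)
            sb.append((char) i);
        return sb.toString();
    }

    // Lookup without the copy: index the String directly.
    static char viaString(byte b) {
        return B2C_STR.charAt(b & 0xff);
    }

    // Lookup through the copied array.
    static char viaArray(byte b) {
        return B2C[b & 0xff];
    }
}
```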
(12) Get rid of sun.io package dependency:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/tags/milestone2/src/sun/io/
Benefit[13]: avoid superfluous disk-footprint
Benefit[14]: save maintenance of sun.io converters
Disadvantage[1]: published under JRL (waiting for launch of
OpenJDK-7 project "charset-enhancement") ;-)
This is not an engineering question. It's about license and policy...
So hopefully we will have the OpenJDK7 project "charset-enhancement" soon.
(17) Decoder#decodeArrayLoop: shortcut for single byte only:
int sr = src.remaining();
int sl = sp + sr;
int dr = dst.remaining();
int dl = dp + dr;
// single byte only loop
int slSB = sp + (sr < dr ? sr : dr);
while (sp < slSB) {
char c = b2cSB[sa[sp] & 0xff];
if (c == UNMAPPABLE_DECODING)
break;
da[dp++] = c;
sp++;
}
Same for Encoder#encodeArrayLoop
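A self-contained version of the shortcut loop in (17) could look like the sketch below. The class, the method signature and the ASCII-only test table are hypothetical; only the loop body follows the snippet above.

```java
import java.util.Arrays;

// Hypothetical sketch of the single-byte shortcut; not the actual decoder.
final class SingleByteShortcut {
    static final char UNMAPPABLE_DECODING = 0xFFFD;

    // Illustrative table: ASCII maps to itself, everything else is unmappable.
    static char[] asciiTable() {
        char[] t = new char[256];
        Arrays.fill(t, UNMAPPABLE_DECODING);
        for (int i = 0; i < 0x80; i++)
            t[i] = (char) i;
        return t;
    }

    // Decodes leading single-byte input; returns the number of bytes consumed.
    // The caller would continue with the normal loop at sp + returned count.
    static int decodeSingleBytes(byte[] sa, int sp, int sr,
                                 char[] da, int dp, int dr, char[] b2cSB) {
        int start = sp;
        int slSB = sp + Math.min(sr, dr); // stop at the shorter of source/destination
        while (sp < slSB) {
            char c = b2cSB[sa[sp] & 0xff]; // mask to get an unsigned index
            if (c == UNMAPPABLE_DECODING)
                break;                     // leave the rest to the normal loop
            da[dp++] = c;
            sp++;
        }
        return sp - start;
    }
}
```

Since one byte yields one char on this path, the number of chars written equals the number of bytes consumed.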
(18) Decoder_EBCDIC: boolean singlebyteState:
if (singlebyteState)
...
(19) Decoder_EBCDIC: decode single byte first:
if (singlebyteState)
c = b2cSB[b1];
if (c == UNMAPPABLE_DECODING) {
...
}
Benefit[20]: should be faster
This is not like dealing with single-byte charsets. For double-byte
charsets,
priority should be given to double-byte codepoints, if possible,
not to single-byte
codepoints.
- I am assuming that single-byte-only input is a common use
case. Am I wrong in the case of EBCDIC?
- This hack doesn't make processing of "normal" mixed input slower after
falling through to the "normal" while(...) loop.
- This hack was copied from the UTF-8 coder, where ASCII-only input is a
common use case.
*** Encoder-Suggestions:
(21) join *.nr to *.c2b files (25->000a becomes 000a->fffd):
Benefit[21]: reduce no. of files
Benefit[22]: simplifies initC2B() (avoids 2 loops)
In theory you can do some magic to "join" .nr into .c2b. The price
might be more complicated
logic that depends on the codepoints. You may end up doing some table
lookup for each codepoint
in b2c when processing.
This "magic" should be done in GenerateDBCS.java, so the price is paid
only once, while building the JDK. But to be honest, it could be done
by hand for those few mapping pairs. See my single-byte IBMxxx mappings
here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/make/tools/CharsetMapping/ext/
... and don't forget, it avoids copying the whole b2c.
And big thanks for all the suggestions.
Thanks for your appreciation. :-)
-Ulf