Re: Codereview request for 6653797: Reimplement JDK charset repository charsets.jar

Xueming Shen Mon, 16 Jul 2012 10:10:19 -0700


On 7/16/2012 9:57 AM, Ulf Zibis wrote:

Hi Sherman,
as I just said for 7183053, I can't look into the details at the moment, as I do not have a suitable environment installed.
All I can see looks reasonable.
Regarding part 4 of bug 6653797, there is still an existing adaptor from my side, if desired.

The sun.io package has been removed. That will be an alternative if we hear any complaints.


Thanks!
-Sherman

Just one comment: I think it should be possible to share the mapping data partly across charsets, so that charsets.jar would shrink even further?
-Ulf


On 16.07.2012 00:12, Xueming Shen wrote:
Hi
This changeset includes the migration of our JIS0201/0208/0212 based single/double-byte charsets to the new mapping based implementation. This is the left-over of the effort we put in JDK7 to re-implement most of our charsets in charsets.jar to (1) have better performance, (2) have a smaller storage footprint and (3) ease the maintenance burden.

http://cr.openjdk.java.net/~sherman/6653797/webrev/

Notes of the implementation:
(1) jis0201/0208/0212 and their variants are now generated from the mapping tables at build time. (See the new .map, *.nr and *.c2b tables.)
(2) EUC_JP/LINUX_OPEN, SJIS, PCK, ISO2022_JP and its variants are now using these new jis0201/0208/0212 charsets.
(3) Those in red (in the webrev) are the old implementation; since no charset uses them anymore, I removed them from the repository.
(4) There are two approaches for PCK and SJIS. PCK.java_v1 and SJIS.java_v1 are the ones that follow the old implementation, which decodes/encodes based on the jis0201/0208 (and the variants) mapping via Ken's algorithm. This is known to be slow and buggy (the algorithm does not take care of illegal sjis code points, see #6653797 and http://cr.openjdk.java.net/~sherman/6653797/Sjis2Jis.java).
So in this changeset, I generated the direct mapping tables for sjis and pck and use the "general" DoubleByte base class for these two charsets. This results in much faster decoding/encoding and correct mapping for all code points. The downside of this approach is that it adds about 50K of uncompressed size to charsets.jar. But given that this change actually removes about 300K from rt.jar, we still get a net saving of about 250K, so I decided to go with this approach for better performance.
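To illustrate the kind of algorithmic conversion item (4) refers to, here is a minimal sketch of the commonly published Shift_JIS to JIS X 0208 byte arithmetic, with explicit range checks for illegal input. It is an assumption-laden illustration, not the code from Sjis2Jis.java or the JDK; the class and method names (SjisToJisSketch, sjisToJis, isLead, isTrail) are hypothetical.

```java
public class SjisToJisSketch {
    // Lead bytes of double-byte Shift_JIS that map into JIS X 0208 rows.
    static boolean isLead(int b) {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF);
    }

    // Legal trail bytes; 0x7F is deliberately excluded.
    static boolean isTrail(int b) {
        return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
    }

    /** Returns the JIS X 0208 code (row byte << 8 | cell byte), or -1 for illegal input. */
    static int sjisToJis(int b1, int b2) {
        if (!isLead(b1) || !isTrail(b2)) {
            return -1; // reject instead of computing a bogus JIS code
        }
        int adjust = b2 < 0x9F ? 1 : 0;            // 1: odd JIS row, 0: even JIS row
        int rowOffset = b1 < 0xA0 ? 0x70 : 0xB0;   // two lead-byte ranges
        int cellOffset = adjust == 1 ? (b2 > 0x7F ? 0x20 : 0x1F) : 0x7E;
        int j1 = ((b1 - rowOffset) << 1) - adjust;
        int j2 = b2 - cellOffset;
        return (j1 << 8) | j2;
    }

    public static void main(String[] args) {
        int[][] samples = { {0x81, 0x40}, {0x82, 0xA0}, {0x88, 0x9F}, {0x82, 0x7F} };
        for (int[] s : samples) {
            int jis = sjisToJis(s[0], s[1]);
            System.out.printf("SJIS %02X%02X -> %s%n", s[0], s[1],
                    jis < 0 ? "illegal" : String.format("JIS %04X", jis));
        }
    }
}
```

Without the range checks, the arithmetic alone would map the illegal trail byte 0x7F to the same cell as 0x80, which is one way a purely algorithmic conversion can accept illegal input. A direct mapping table avoids this because unmapped entries are simply absent.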
It appears to be lots of files (80+) in the webrev, but that number includes the removed old implementation and the tests I put in to guarantee identical de/encoding results from the old and new implementations (those OLD... test cases); the change is actually not that big :-) So please help review. I can then put this multi-year effort to rest.
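As a rough idea of what an old-versus-new equivalence check can look like, the sketch below encodes every non-surrogate BMP character with two charsets and counts the characters whose results differ. It is not one of the OLD... test cases from the webrev; the class and method names (CharsetCompareSketch, encodeOrNull, countEncodingMismatches) are hypothetical, and the charsets in main are stand-ins chosen only because every Java runtime ships them.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetCompareSketch {
    // Encodes one char; returns null if the charset cannot encode it.
    static byte[] encodeOrNull(CharsetEncoder enc, char c) {
        try {
            // encode(CharBuffer) resets the encoder before each operation.
            ByteBuffer bb = enc.encode(CharBuffer.wrap(new char[] { c }));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    static CharsetEncoder strictEncoder(Charset cs) {
        // REPORT makes unmappable characters fail instead of being replaced.
        return cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    /** Counts non-surrogate BMP chars that the two charsets encode differently. */
    static int countEncodingMismatches(Charset a, Charset b) {
        CharsetEncoder ea = strictEncoder(a);
        CharsetEncoder eb = strictEncoder(b);
        int mismatches = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue; // lone surrogates are malformed input
            }
            if (!Arrays.equals(encodeOrNull(ea, (char) c), encodeOrNull(eb, (char) c))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Stand-in pair; a real check would pass the old and new implementations.
        System.out.println(countEncodingMismatches(
                StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII));
    }
}
```

A complete check would also compare decoding of every legal and illegal byte sequence, which this sketch omits.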

-Sherman
