Re: Rewrite of IBM doublebyte charsets

Ulf Zibis Mon, 18 May 2009 17:37:54 -0700

On 14.05.2009 23:38, Xueming Shen wrote:

Ulf,
There are 3 goals of this rewrite:
(1) shrink the storage size of EUC_TW to a reasonable number
(2) move away from hard-coding the mapping data in the source file to a mapping-based approach that generates the tables at build time,
for easier maintenance in the future.
(3) no regression in decoding and encoding performance, decoder startup, or the resulting CoderResult when compared to the existing implementation, with the exception of encoder startup (we need to build it from the b2c).
So far I'm happy to see that all of them have been achieved. I'm not aiming for a perfect one (actually the purpose of
goal (2) is to make future tuning easier).


Yes, the map files are a good starting point for future tuning.
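
Regarding the encoder startup cost in (3), here is a minimal sketch of what "building it from the b2c" could look like. The class name, the table layout (a char[] indexed by the byte code, e.g. (lead << 8) | trail for double-byte) and the use of U+FFFD as the "no mapping" marker are assumptions for illustration only, not the layout used in the actual rewrite.

```java
import java.util.Arrays;

// Hypothetical sketch: derive the encoder table (c2b) from the decoder table (b2c).
final class B2cInversionSketch {

    // Assumed marker for "this byte code has no mapping".
    static final char UNMAPPABLE = '\uFFFD';

    // b2c[code] is the char for a given byte code; returns c2b[char] = code, or -1.
    static int[] buildC2b(char[] b2c) {
        int[] c2b = new int[0x10000];
        Arrays.fill(c2b, -1);
        for (int code = 0; code < b2c.length; code++) {
            char c = b2c[code];
            // Skip holes; on duplicates keep the first (lowest) byte code.
            if (c != UNMAPPABLE && c2b[c] == -1) {
                c2b[c] = code;
            }
        }
        return c2b;
    }

    public static void main(String[] args) {
        char[] b2c = { 'A', UNMAPPABLE, 'B' };
        int[] c2b = buildC2b(b2c);
        System.out.println("A -> " + c2b['A'] + ", B -> " + c2b['B'] + ", C -> " + c2b['C']);
    }
}
```

Only the b2c data needs to be stored this way; the price is the one-time inversion when the first encoder is created.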

I would not try to argue which cr is more appropriate, unmappable or malformed. It's hard to draw the line: some codepages/charsets leave some code points for future use, private use, or user-defined characters. You can't make the decision simply by looking at the mapping table; you need to have the standard on your desk and check segment by segment. In fact, personally I don't think it really makes much sense to distinguish the two. So
I would like to follow the existing behavior, if possible.

Mostly I agree with you, and I guess most users don't care about this difference, so they wouldn't run into compatibility problems if they only check CoderResult#isError(). But I think users who are interested in this difference should get the most accurate results, regardless of whether former implementations were faulty.
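
To illustrate the two kinds of callers, here is a small sketch using only the public java.nio.charset API. It uses UTF-8 and US-ASCII rather than the IBM double-byte charsets under discussion, simply because their behavior for these inputs is unambiguous; the class and helper names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how callers can look at a CoderResult.
final class CoderResultDemo {

    // What a caller who cares about the distinction would check.
    static String classify(CoderResult cr) {
        if (cr.isMalformed())  return "malformed";
        if (cr.isUnmappable()) return "unmappable";
        if (cr.isUnderflow())  return "underflow";
        return "overflow";
    }

    // Decode bytes as UTF-8 and return the raw CoderResult.
    static CoderResult decodeUtf8(byte[] input) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        return dec.decode(ByteBuffer.wrap(input), CharBuffer.allocate(16), true);
    }

    // Encode chars as US-ASCII and return the raw CoderResult.
    static CoderResult encodeAscii(String input) {
        CharsetEncoder enc = StandardCharsets.US_ASCII.newEncoder();
        return enc.encode(CharBuffer.wrap(input), ByteBuffer.allocate(16), true);
    }

    public static void main(String[] args) {
        CoderResult bad = decodeUtf8(new byte[] { (byte) 0xFF }); // 0xFF never occurs in UTF-8
        CoderResult gap = encodeAscii("\u00e9");                  // no ASCII byte for e-acute

        // Callers that only check isError() see no difference between the two.
        System.out.println(bad.isError() + " " + gap.isError());
        // Callers that care get the more specific answer.
        System.out.println(classify(bad) + " " + classify(gap));
    }
}
```

A caller of the first kind stays compatible whichever result a charset reports; only a caller of the second kind would notice if an implementation changed from one to the other.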



I hope you are inspired by my suggestions from yesterday ;-)

-Ulf
