On 14.05.2009 23:38, Xueming Shen wrote:
Ulf,
There are 3 goals for this rewrite:
(1) shrink the storage size of EUC_TW to a reasonable number
(2) move away from hard-coding the mapping data in the source file to an
approach that builds it from mapping files at build time,
for easier maintenance in the future.
(3) no regression in decoding or encoding performance, decoder startup,
or the resulting CoderResult when compared
to the existing implementation, with the exception of encoder startup
(we need to build it from the b2c).
So far I'm happy to see that all of them are achieved. I'm not aiming
for a perfect one (actually the purpose of
goal (2) is to make future tuning easier).
Yes, the map files are a good starting point for future tuning.
I would not try to argue which CoderResult is more appropriate,
unmappable or malformed. It's hard to draw the line: some
codepages/charsets leave some code points for future use, private use,
or user-defined characters. You can't make
the decision simply by looking at the mapping table; you need to
have the standard on your desk and check
segment by segment. In fact, personally I don't think it really
makes much sense to distinguish the two. So
I would like to follow the existing behavior, if possible.
I mostly agree with you, and I guess most users don't care about this
difference, so they wouldn't run into compatibility problems if they
only check CoderResult#isError(). But I think users who are
interested in this difference should get the most accurate results,
regardless of whether former implementations were faulty.
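To illustrate what such a caller sees, here is a minimal sketch. It is
not taken from the EUC_TW rewrite: it uses US-ASCII and ISO-8859-1 only
because they are always available, and the class name CoderResultDemo
and the helpers classify, decodeAscii and encodeLatin1 are made up for
this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class CoderResultDemo {

    // Hypothetical helper: turn a CoderResult into a short label.
    // A caller that only checks isError() cannot tell the two error
    // kinds apart; isMalformed()/isUnmappable() can.
    public static String classify(CoderResult cr) {
        if (!cr.isError()) {
            return "ok";
        }
        if (cr.isMalformed()) {
            return "malformed[" + cr.length() + "]";
        }
        return "unmappable[" + cr.length() + "]";
    }

    // Decode with a fresh decoder; new decoders report errors by default
    // instead of replacing or ignoring the offending input.
    public static CoderResult decodeAscii(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.US_ASCII.newDecoder();
        return dec.decode(ByteBuffer.wrap(bytes), CharBuffer.allocate(16), true);
    }

    // Encode with a fresh encoder; same default error reporting.
    public static CoderResult encodeLatin1(String s) {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        return enc.encode(CharBuffer.wrap(s), ByteBuffer.allocate(16), true);
    }

    public static void main(String[] args) {
        // 0x80 is not a legal US-ASCII byte: reported as malformed input.
        System.out.println(classify(decodeAscii(new byte[] {'A', (byte) 0x80})));
        // U+20AC is a valid char with no ISO-8859-1 mapping: unmappable.
        System.out.println(classify(encodeLatin1("A\u20AC")));
    }
}
```

Both results return true from isError(), so code that only checks
isError() behaves the same whichever kind a charset reports; only code
that branches on isMalformed() or isUnmappable() is affected.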
I hope you are inspired by my suggestions from yesterday ;-)
-Ulf