On Thu, 22 Dec 2011 15:33:35 +0100, L. David Baron <dba...@dbaron.org>
wrote:
> This seems like one of those areas where it may be substantially
> easier to figure out what implementations do by looking at their
> code than by reverse-engineering, at least for the implementations
> whose code is available publicly.
>
> Gecko's code lives in
> http://mxr.mozilla.org/mozilla-central/source/intl/uconv/ . There
> are others who know it substantially better, but I or others could
> probably answer questions you have about how it works and how to
> understand it.
>
> I'm not the right person for pointers to other implementations,
> though.
Thanks. I'm using a combination of code inspection, reverse engineering
(especially for edge cases), and lessons we have already learned (e.g.
non-greedy error handling).
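To illustrate what non-greedy error handling means for a two-byte
decoder, here is a minimal Python sketch. It is not the algorithm in the
draft. TOY_TABLE is a hypothetical two-entry stand-in for the real JIS X
0208 index, the lead-byte ranges follow Shift_JIS, and single-byte
halfwidth katakana are not handled. The interesting case is a lead byte
followed by a byte that does not form a valid pair. A greedy decoder
consumes both bytes. A non-greedy decoder emits U+FFFD for the lead byte
only and, if the second byte is ASCII, decodes that byte on its own.

```python
REPLACEMENT = "\uFFFD"

# Hypothetical toy table; a real decoder would use the full JIS X 0208 index.
TOY_TABLE = {
    (0x82, 0x9F): "\u3041",  # HIRAGANA LETTER SMALL A
    (0x82, 0xA0): "\u3042",  # HIRAGANA LETTER A
}


def is_lead(b: int) -> bool:
    # Shift_JIS lead-byte ranges (halfwidth katakana are ignored in this sketch).
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def decode(data: bytes, non_greedy: bool = True) -> str:
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # ASCII decodes to itself.
            out.append(chr(b))
            i += 1
        elif not is_lead(b):
            out.append(REPLACEMENT)
            i += 1
        elif i + 1 >= n:
            # Lead byte truncated at end of input.
            out.append(REPLACEMENT)
            i += 1
        else:
            trail = data[i + 1]
            ch = TOY_TABLE.get((b, trail))
            if ch is not None:
                out.append(ch)
                i += 2
            else:
                out.append(REPLACEMENT)
                if non_greedy and trail < 0x80:
                    # Do not consume the ASCII byte; decode it on the next pass.
                    i += 1
                else:
                    # Greedy: swallow the second byte along with the lead byte.
                    i += 2
    return "".join(out)


print(decode(b"\x82\xa0\x82<b>"))                    # non-greedy
print(decode(b"\x82\xa0\x82<b>", non_greedy=False))  # greedy
```

With the non-greedy behavior, the "<" after the broken lead byte
survives, so the output is "あ\uFFFD<b>". The greedy variant produces
"あ\uFFFDb>", silently dropping a byte that could matter if the decoded
text is later parsed as markup.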
So far I have defined the to-Unicode (decoding) algorithms for
hz-gb-2312, euc-jp, iso-2022-jp, and shift_jis:
http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html
Feedback welcome!
--
Anne van Kesteren
http://annevankesteren.nl/