UTF-8 support (fwd)

Dirk-Willem van Gulik 9 Nov 2001 18:13:27 -0000

We've got some of this in the iconv/apr_xlate code. But it is far
from complete.


I've got some old code floating around (google for C3 API for a rough
idea) which does

->      utf 6|7|8 <-> unicode <-> specific_charset(language)

based on approximation code tables from the unicode standard. I.e. latin-1
'\xff' -> latin-3 'y' | latin-3 'ij' (depending on language); '&' <-> 'et';
'\xdc' <-> 'u'/'eu'; '\xc6' <-> 'AE'. I.e. you can go from any charset
or from unicode to any other charset - and if chars are not available we
approximate them (occasionally based on language).
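
To make the idea concrete, here is a rough sketch in Python (not the code
I mentioned, just an illustration). It goes through unicode as the pivot and
falls back to an approximation table when the target charset lacks a
character. The function name convert() and both tables are made up for
this example, and the table entries are illustrative rather than taken from
the unicode standard.

```python
# Hypothetical approximation tables; entries are illustrative only.
GENERIC = {
    "\u00ff": "y",    # y with diaeresis
    "\u00c6": "AE",   # AE ligature
    "\u00dc": "U",    # U with diaeresis
}
BY_LANGUAGE = {
    "nl": {"\u00ff": "ij"},   # assumed Dutch preference
    "de": {"\u00dc": "Ue"},   # assumed German preference
}


def convert(data, src, dst, language=None):
    """Convert bytes from charset src to charset dst via unicode.

    Characters missing from dst are replaced by a language-specific
    approximation if one is listed, else a generic one, else '?'.
    """
    text = data.decode(src)  # source charset -> unicode
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(dst)  # unicode -> target charset
        except UnicodeEncodeError:
            approx = BY_LANGUAGE.get(language, {}).get(ch)
            if approx is None:
                approx = GENERIC.get(ch, "?")
            out += approx.encode(dst, errors="replace")
    return bytes(out)
```

E.g. convert(b"\xff", "latin-1", "iso8859_3", "nl") picks the Dutch entry,
while the same call without a language falls back to the generic one. Even
this toy version shows where the footprint comes from: the tables have to
live somewhere.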

I'd be quite happy to donate it - and work it in.

However my feeling is that if we want to offer more than we do today it
*will* require the unicode tables to be linked in or shipped.

I.e. add between half a megabyte and 2 megabytes to the footprint
(depending on charset tables) for a version which covers about the same
range of charsets as mac/windows do.

I am not convinced that that is good.

Dw
