Re: Transliteration in wcrtomb()

Markus Kuhn Wed, 01 Nov 2000 01:24:46 -0800
Marcin 'Qrczak' Kowalczyk wrote on 2000-10-31 22:15 UTC:
> Mon, 30 Oct 2000 21:45:20 +0000, Markus Kuhn <[EMAIL PROTECTED]> wrote:
> 
> > In my eyes, "ü" -> "ue" is just as much a valid and useful multibyte
> > encoding as UTF-8.
> 
> It is possible to apply transliteration when iconv did not do it,
> but it's impossible to undo transliteration made by iconv. So please
> don't force users of iconv to have transliteration.

You have probably lost the context of what I was talking about. I was
never talking about iconv() or any other function not defined in ISO C99.
I was only talking about locale-dependent multibyte encoding as done by
wcrtomb() and all the other functions that are built on top of it
(printf("%ls"), wprintf(), etc.). Whether these functions should do
transliteration or not should in my opinion be the user's choice, made by
selecting a locale that does or does not transliterate, as desired.
iconv() is *NOT* locale-dependent and its semantics are not defined in
relation to wcrtomb() in any way, and therefore it is completely
irrelevant here.
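
To make the locale-dependent path concrete, here is a minimal sketch of
encoding one wide character with wcrtomb(). The helper names
adopt_user_locale() and encode_in_user_locale() are made up for this
illustration. Note that whether a locale transliterates inside wcrtomb()
is the proposal under discussion, not something ISO C99 promises; the
sketch only shows where such behaviour would surface.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: adopt whatever locale the user selected through
 * the environment (LANG, LC_ALL, ...). Returns 1 on success, 0 if the
 * requested locale is not available. */
int adopt_user_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: encode one wide character in the multibyte
 * encoding of the current LC_CTYPE locale. 'out' must have room for at
 * least MB_LEN_MAX bytes. Returns the number of bytes written, or
 * (size_t)-1 if the locale cannot represent 'wc'. */
size_t encode_in_user_locale(wchar_t wc, char *out)
{
    mbstate_t state;

    assert(out != NULL);
    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    return wcrtomb(out, wc, &state);
}
```

An application would call adopt_user_locale() once at startup and then
pass a buffer of MB_LEN_MAX bytes to encode_in_user_locale(). The
application names no character set anywhere; the result depends only on
the locale the user picked.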

> When I want transliteration, I can easily do it myself (I've done it
> in Haskell); but when I need to know whether text can be unambiguously
> converted, I want to be able to get an error in other cases.

Yes, of course, iconv() will do all this and more for you and I never
ever said that this was a bad idea.

> > My proposal gives the programmer far more control and at the same
> > time far less special code that has to be added to applications.
> 
> How to specify transliterated and untransliterated conversions?

The user (not programmer!) does this by picking the right locale:

Use pl_PL.UTF-8           if you never ever want to see any transliteration.
Use pl_PL.ISO-8859-1      if you want to have non-Latin-1 characters
                          transliterated in a way appropriate for Polish readers.
Use pl_PL.UTF-8@romanized if you want to have non-Latin characters
                          transliterated in a way appropriate for Polish readers.

The choice is that of the user, NOT that of the application programmer.
If you want to be sure in the application to get a specific conversion
(e.g. for parsing a MIME text body), then use iconv(). If you want to be
sure that you get the external representation that the user wants
(whatever that is), then use wcrtomb(). I think the external
representation selectable by the user should include transliteration, and
it should be done in wcrtomb() if the user wants it.
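
For the other half of that distinction, here is a minimal sketch of a
strict iconv() conversion in which the application, not the locale, names
both character sets. The helper name convert_strict() is made up for this
illustration, and treating any non-reversible conversion as an error is a
choice of this sketch rather than something iconv() requires.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string 'in' from
 * charset 'from' to charset 'to', independent of the current locale.
 * Returns 0 on success and -1 on any failure, including input that
 * cannot be represented exactly in the target charset. */
int convert_strict(const char *from, const char *to,
                   const char *in, char *out, size_t outsz)
{
    iconv_t cd;
    char *src, *dst;
    size_t srcleft, dstleft, rc;

    assert(from != NULL && to != NULL && in != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    cd = iconv_open(to, from);          /* note the order: to, then from */
    if (cd == (iconv_t)-1)
        return -1;

    src = (char *)in;                   /* iconv() takes char **, input is not modified */
    srcleft = strlen(in);
    dst = out;
    dstleft = outsz - 1;                /* keep one byte for the terminator */

    rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    if (rc == 0)
        rc = iconv(cd, NULL, NULL, &dst, &dstleft);   /* flush any shift state */
    iconv_close(cd);

    if (rc != 0)                        /* (size_t)-1 or a count of lossy conversions */
        return -1;
    *dst = '\0';
    return 0;
}
```

A call such as convert_strict("UTF-8", "ISO-8859-1", body, buf, sizeof buf)
either yields an exact conversion or reports failure, which is what a MIME
parser needs. On some systems iconv lives in a separate library and the
program has to be linked with -liconv.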

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/