Hello Ralph.

Ralph Corderoy wrote in <[email protected]>:
|> It is really terrible that all this is a black box; but how to do it
|> right?  In the end, with Unicode aka UTF-8 locales, we come to a point
|> where it would really be doable, since now nl_langinfo(CODESET), or
|> whatever input character set you pass to iconv(3), could actually be
|> warped to the entire set of character classification functions, like
|> [w]isspace() etc.  But so you have black box iconv here, and LC
|> derived classification possibilities there.  You are condemned to live
|> in the user's locale.
|
|Why not iconv(3) the input from the user's locale, the MIME part's
|charset, etc., to UTF-8, work internally, and then iconv() again on the
|way back out?  I feel you're telling me above, but I don't quite get
|your point.
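
To make sure we mean the same thing, the round trip you describe could
be sketched roughly as below.  This is only an illustration under some
assumptions: recode() is a made-up helper name, the charset names are
whatever the local iconv(3) accepts, and the cast of the input pointer
assumes the POSIX prototype where inbuf is a plain "char **".

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert the NUL-terminated string `in` from charset `from` to charset
 * `to`.  Returns a malloc()ed, NUL-terminated result, or NULL on failure
 * (unknown charset, invalid or unconvertible input, out of memory).
 * recode() is a hypothetical helper, not part of any library.
 */
char *
recode(char const *to, char const *from, char const *in)
{
	iconv_t cd;
	size_t inleft, outsize, outleft;
	char *inp, *out, *outp;
	int flushed = 0;

	assert(in != NULL);

	if ((cd = iconv_open(to, from)) == (iconv_t)-1)
		return NULL;

	/* iconv(3) wants char**, but it does not modify the input bytes */
	inp = (char *)in;
	inleft = strlen(in);

	/* Initial guess only; the buffer grows below whenever E2BIG is seen */
	outsize = inleft * 4 + 16;
	if ((out = malloc(outsize + 1)) == NULL) {
		iconv_close(cd);
		return NULL;
	}
	outp = out;
	outleft = outsize;

	while (!flushed) {
		size_t r;

		if (inleft > 0)
			r = iconv(cd, &inp, &inleft, &outp, &outleft);
		else {
			/* Input consumed: a NULL inbuf flushes any shift state */
			r = iconv(cd, NULL, NULL, &outp, &outleft);
			flushed = (r != (size_t)-1);
		}

		if (r == (size_t)-1) {
			size_t used;
			char *n;

			if (errno != E2BIG) {
				/* EILSEQ or EINVAL: give up */
				free(out);
				out = NULL;
				break;
			}
			/* Output buffer too small: double it and continue */
			used = (size_t)(outp - out);
			if ((n = realloc(out, outsize * 2 + 1)) == NULL) {
				free(out);
				out = NULL;
				break;
			}
			out = n;
			outsize *= 2;
			outp = out + used;
			outleft = outsize - used;
		}
	}

	if (out != NULL)
		*outp = '\0';
	iconv_close(cd);
	return out;
}
```

A caller would do something like recode("UTF-8", cs, input) on the way
in and recode(cs, "UTF-8", result) on the way out, where cs is the MIME
part's charset, or nl_langinfo(CODESET) after setlocale(LC_ALL, "") for
the user's locale.  Note that all the real work still has to happen on
the UTF-8 bytes in between.
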

Sure, convert to Unicode, work in Unicode, convert back: that is the
way to go.  It is still hard to do with POSIX, let alone ISO C.  You
need a UTF-8 locale that you can actively select, the POSIX/ISO
functions do not support graphemes, and __STDC_ISO_10646__ is optional,
so you cannot simply code some tables of your own to fill the gaps:
looking at the wchar_t values may not give you Unicode codepoints
(though maybe all implementations do it like that, so in practice you
could make this a precondition).  I had to look, but I think the
"WCHAR_T" iconv(3) target is absolutely non-portable, too.  So portable
code has to scrap the portable ISO/POSIX functions and roll its own
stuff, nay? :)

And finally, not all character sets truly support round-tripping
from/to Unicode, but in reality this should not hurt either.

Really, the older I get the more I think that UTF-16 is not the worst
decision regarding Unicode.  Surrogate pairs have to be handled, but
with UTF-8 you always have to live with multibyte sequences anyway.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
