Zack Weinberg wrote:
> The //IGNORE and //TRANSLIT features are glibc / GNU libiconv
> specific, but I would have thought that they were available in recent
> Gentoo (they've been around since 2001 give or take).

I thought they would be present on *most* BSD and Linux systems available
today... Uh. I know nothing about Gentoo; I would have expected it to be
in Portage, but this doesn't seem to be it at all:
http://gentoo-portage.com/dev-libs/libiconv

> The real problem, though, is that an awful lot of non-GNUish systems
> have iconv implementations that are useless. I mean _useless_. They
> implement hardly any conversions at all. We have to have the "(list
> of names for ASCII) <-> UTF8" shortcut for _correctness_, not just for
> speed; real live systems don't support conversion between their own
> locale's name for ASCII and UTF-8.

*headdesk*

Well, an iconv that doesn't even know how to convert *to* UTF8 is no
good for us: we simply can't use it. An iconv that doesn't know about
//IGNORE//TRANSLIT, OTOH, is good enough for the strict sanity
conversion, but not for the "best effort" print-to-the-terminal that I
wired into "mtn log" (and other places would need that, too).

I guess the "solution" could be to add an autoconf test for support of
//IGNORE//TRANSLIT and, when it is not available, we can easily write a
"quick&dirty" lossy conversion from UTF8 to either Latin1 or ASCII
(with u being the already-decoded code point):

  #define UTF8_to_Latin1(u) (((u) >= 256) ? '?' : (char)(u))
  #define UTF8_to_ASCII(u)  (((u) >= 128) ? '?' : (char)(u))

Or maybe we could get the "transliteration table" right out of iconv...

> It might be possible to bundle GNU libiconv, but I hesitate to
> recommend that because I recall its being another Haible/Drepper build
> system monstrosity like intl.

IMHO we already bundle too much =)

> Many systems have an iconv(1) command line utility that may be helpful
> here.

Uh, right, but writing a "known good UTF-8 string" escaped on the
command line seems a bit trickier to me... no, not really.

% echo "\xC2\xB7" | iconv -f UTF-8 -t CP1252//IGNORE//TRANSLIT
·
(that is, the correctly converted U+00B7 MIDDLE DOT)
% echo "\xC2\xB7" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
.
% echo "\xC3\x80" | iconv -f UTF-8 -t CP1252//IGNORE//TRANSLIT
À
(that is, the correct U+00C0 LATIN CAPITAL LETTER A WITH GRAVE)
% echo "\xC3\x80" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
`A

Derek (or anyone else with Gentoo), what do you get with these?

Lapo

_______________________________________________
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel
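
For reference, the "quick&dirty" lossy fallback discussed in the message could be sketched roughly as follows. This is only an illustration, not monotone code: the names decode_utf8 and utf8_to_lossy are hypothetical, and the sketch assumes a NUL-terminated input string and a caller-supplied output buffer. Unlike //TRANSLIT it does no transliteration at all, so U+00C0 becomes '?' in ASCII rather than `A.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence at s (n bytes left).
   Stores the code point in *cp and returns the bytes consumed.
   Malformed or overlong input yields U+FFFD and consumes one byte. */
static size_t decode_utf8(const unsigned char *s, size_t n, unsigned long *cp)
{
    static const unsigned long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    unsigned long c;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }  /* stray continuation or bad lead byte */

    if (len > n) { *cp = 0xFFFD; return 1; }  /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len]) { *cp = 0xFFFD; return 1; }  /* overlong form */
    *cp = c;
    return len;
}

/* Hypothetical fallback: convert UTF-8 text to a single-byte charset,
   replacing every code point >= limit with '?'.
   Use limit = 128 for ASCII and limit = 256 for Latin-1.
   Output is truncated to fit outsz and always NUL-terminated
   (when outsz > 0). Returns the number of bytes written, excluding NUL. */
static size_t utf8_to_lossy(const char *in, char *out, size_t outsz,
                            unsigned long limit)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = strlen(in);
    size_t o = 0;

    if (outsz == 0)
        return 0;
    while (n > 0 && o + 1 < outsz) {
        unsigned long cp;
        size_t used = decode_utf8(s, n, &cp);
        out[o++] = (cp < limit) ? (char)(unsigned char)cp : '?';
        s += used;
        n -= used;
    }
    out[o] = '\0';
    return o;
}
```

Calling utf8_to_lossy("\xC2\xB7", buf, sizeof buf, 256) should leave the single Latin-1 byte 0xB7 in buf, while the same call with limit 128 leaves "?". An autoconf test could select this path only when iconv_open() rejects the //IGNORE//TRANSLIT target.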