Hi Bruno,

thanks for your answer. After thinking about the ambiguous output of locale_charset(), this might be an explanation:
Libidn2 (and the tests) use libunistring installed from Homebrew, while my direct call to locale_charset() comes from gnulib. So my build correctly says UTF-8, but the Homebrew libunistring has been built on some unknown (OS X?) system with its own version of locale_charset() returning ASCII.

I said I get ? for characters > 255, but I didn't verify that. Maybe it is characters > 127.

The bad thing is that I only see this on a Travis CI build, so I can't use gdb for single-stepping. But one option is to build libunistring from source in the CI and link/test against that.

Regards, Tim

On Thursday, 08.02.2018, at 18:05 +0100, Bruno Haible wrote:
> Hi Tim,
>
> > locale_charset() returns with "UTF-8".
>
> That is as it should be on Mac OS X.
>
> > u8_strconv_to_locale() and u8_strconv_from_locale() seem not to work as
> > expected:
> >
> > One problem seems to be that u8_strconv_to_locale() outputs decomposed
> > characters, e.g. u8_strconv_to_locale(bücher.de) returns b"ucher.de.
> >
> > Hex/u32:
> >
> > Result:   U+0062 U+0022 U+0075 U+0063 U+0068 U+0065 U+0072 U+002e U+0064
> >           U+0065
> >
> > Expected: U+0062 U+00fc U+0063 U+0068 U+0065 U+0072 U+002e U+0064 U+0065
>
> This would indicate that locale_charset() returns "ASCII".
> What happens then is this: u8_strconv_to_locale invokes
> u8_strconv_to_encoding, which invokes mem_iconveha with transliterate=true,
> which appends '//TRANSLIT' when invoking iconv_open. As a result you get
> the transliteration, e.g. from 'ü' to '"u'.
>
> > The second problem is that characters beyond 255 are translated into ?
> > (U+003f).
>
> This would indicate that locale_charset() returns "ISO-8859-1". The
> question marks then come from the transliteration, again.
>
> > Do you have any hints how to fix these problems ?
>
> I would compile without -O and with -ggdb, then single-step through the
> code, paying particular attention to the value of locale_charset() and to
> the arguments of iconv_open().
>
> > I would expect u8_strconv_to_locale() to work in a defined manner on
> > UTF-8 locales
>
> That's certainly how it is intended to be.
>
> Bruno
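
Since gdb is not available on the CI, the transliteration effect described above could be checked from a small test program whose output ends up in the build log. The following is only a rough sketch under assumptions: it calls plain POSIX iconv directly rather than going through libunistring's own code path, and the helpers convert_from_utf8 and dump_hex are hypothetical names made up for this illustration. What a given iconv implementation produces for 'ü' with an "ASCII//TRANSLIT" target differs between systems, so no expected output is claimed for that case.

```c
#include <assert.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libunistring or gnulib): convert the
   NUL-terminated UTF-8 string IN to TO_CHARSET.  If TRANSLIT is nonzero,
   "//TRANSLIT" is appended to the target name, similar in spirit to what
   is described above for transliterate=true.
   Returns the number of bytes written to OUT (excluding the terminating
   NUL), or -1 on error.  */
static long
convert_from_utf8 (const char *to_charset, int translit,
                   const char *in, char *out, size_t outsize)
{
  char target[64];
  assert (out != NULL && outsize > 0);

  int n = snprintf (target, sizeof target, "%s%s", to_charset,
                    translit ? "//TRANSLIT" : "");
  if (n < 0 || (size_t) n >= sizeof target)
    return -1;

  iconv_t cd = iconv_open (target, "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;                  /* unknown or unsupported charset */

  char *inp = (char *) in;      /* iconv does not modify the input bytes */
  size_t inleft = strlen (in);
  char *outp = out;
  size_t outleft = outsize - 1; /* keep room for the terminating NUL */

  size_t r = iconv (cd, &inp, &inleft, &outp, &outleft);
  if (r != (size_t) -1)
    r = iconv (cd, NULL, NULL, &outp, &outleft); /* flush shift state */
  iconv_close (cd);
  if (r == (size_t) -1)
    return -1;

  *outp = '\0';
  return (long) (outp - out);
}

/* Hypothetical helper: print LEN bytes of S in hex, so that the result
   can be compared byte by byte in a CI log.  */
static void
dump_hex (const char *label, const char *s, long len)
{
  printf ("%s:", label);
  for (long i = 0; i < len; i++)
    printf (" %02x", (unsigned char) s[i]);
  putchar ('\n');
}
```

Calling convert_from_utf8 with the targets "ASCII", "ISO-8859-1" and "UTF-8" on the bytes of bücher.de and dumping each result with dump_hex should show which target charset matches the output seen from u8_strconv_to_locale() on the CI machine.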