On Sun, Aug 16, 2015 at 05:43:50PM +0300, Eli Zaretskii wrote:

(i)

>> #if defined(WINDOWS) || defined(MSDOS) || defined(__CYGWIN__)
>> /* insert some test for Windows */
>> #else
>> ... code that uses getenv to test LC_ALL, LC_CTYPE, LANG ...
>> #endif

> I'm not sure this is the right way to fix this. First, relying on
> UTF-8 locale to be announced in the environment is less portable than
> it could be: it's better to call 'setlocale' with the 2nd argument
> NULL to glean the same information. Then the ugly #ifdef above could
> be dropped, and at least Cygwin will not be excluded from this
> feature.

I left the wget behaviour for MSDOS / Windows / Cygwin unchanged
because I do not know anything about these platforms.
It is quite possible that the #ifdef is unneeded.

Are you saying that it in fact is needed when getenv() is used,
but unneeded when setlocale() is used? And then what about LANG?

(ii)

> Moreover, even if the locale is not UTF-8, wget should attempt to
> convert the file names to the current locale using iconv (which I
> believe was what Tim suggested). This will DTRT in much more cases
> than the above UTF-8 centric approach, IMO.

Hmm. My own point of view is almost the opposite.
In my life I have spent countless hours trying to repair
the damage done by software that helpfully modified my data.
I prefer my data as-is, unless I explicitly ask for conversion.

I think Tim suggested something else (namely, just checking
whether the filename was valid UTF-8), but never mind.

The patch enlarges the number of cases where the original data
is preserved. Yes, I am all in favour of enlarging that number of
cases even further. This is only a first step.

But in my eyes applying iconv would be a step back.
It can be really tricky to decode the mojibake obtained by
converting A to C, while the original really was in B.
How do you guess the original character set?
What should happen when iconv() returns EILSEQ?

Andries
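
P.S. To make (i) concrete, the two ways of detecting a UTF-8 locale
discussed there might be sketched roughly as below. This is not the
patch and not wget code: the function names are made up for
illustration, and matching on a ".UTF-8" / ".utf8" suffix of the locale
name is an assumption about how such a test could be written.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: ASCII case-insensitive comparison of the first
   LEN bytes of S against the literal LIT.  */
static int
ascii_ieq (const char *s, size_t len, const char *lit)
{
  size_t i;

  if (strlen (lit) != len)
    return 0;
  for (i = 0; i < len; i++)
    if (tolower ((unsigned char) s[i]) != tolower ((unsigned char) lit[i]))
      return 0;
  return 1;
}

/* Hypothetical helper: does a locale name such as "en_US.UTF-8" or
   "nl_NL.utf8@euro" carry a UTF-8 codeset part?  */
int
locale_name_is_utf8 (const char *name)
{
  const char *codeset, *end;
  size_t len;

  if (name == NULL || (codeset = strchr (name, '.')) == NULL)
    return 0;
  codeset++;                            /* skip the '.' */
  end = strchr (codeset, '@');          /* optional "@modifier" */
  len = end ? (size_t) (end - codeset) : strlen (codeset);

  return ascii_ieq (codeset, len, "utf-8") || ascii_ieq (codeset, len, "utf8");
}

/* Variant (a): look at the environment.  The first of LC_ALL,
   LC_CTYPE, LANG that is set and non-empty decides.  */
int
env_announces_utf8 (void)
{
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  size_t i;

  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *v = getenv (vars[i]);
      if (v != NULL && *v != '\0')
        return locale_name_is_utf8 (v);
    }
  return 0;
}

/* Variant (b): ask the C library for the current LC_CTYPE locale name.
   A NULL second argument only queries; it changes nothing.  */
int
current_locale_is_utf8 (void)
{
  return locale_name_is_utf8 (setlocale (LC_CTYPE, NULL));
}
```

Note that variant (b) reports the default "C" locale unless the program
has earlier called setlocale (LC_ALL, "") or setlocale (LC_CTYPE, "").
On a POSIX system that earlier call is what consults LC_ALL, LC_CTYPE
and LANG, so LANG would be covered indirectly. The format of the string
that setlocale returns is not standardized, so whether the suffix test
works on MSDOS / Windows / Cygwin is something I cannot judge.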
