On Mon, Aug 17, 2015 at 06:27:05PM +0300, Eli Zaretskii wrote:

>> (ii) [about possibly using iconv]
>>
>>>> How do you guess the original character set?
>
> The answer is call "nl_langinfo (CODESET)".

I think we are not communicating.

wget fetches a file from a remote machine.
We know the filename (as a sequence of bytes).
As far as I can see, there is no information on what character set
(if any) that sequence of bytes might be in.

In order to call iconv, I need a from-charset and a to-charset.
I think your answer tells me how to find a reasonable to-charset.
But the problem is how to find a from-charset.

[Even when from-charset and to-charset are known there is a can of worms
involved in conversion. But without from-charset one cannot even start
thinking about conversion.]

> > Unix filenames are not necessarily in any particular character set.
> > They are sequences of bytes different from NUL and '/'.
> > A different sequence of bytes is a different filename.
>
> As long as you treat them as UTF-8 encoded strings, ...

I don't understand how one can treat sequences of bytes that are not
valid UTF-8 as UTF-8 encoded strings.

If all the world is UTF-8 then fine.
But the remote machine is an unknown system.
We just have a byte sequence, that is all.

Andries
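
[A small illustration of the from-charset problem, as a sketch only. The
filename bytes, the candidate charsets, and the helper try_decode are all
made up for the example; Python's bytes.decode stands in for iconv, and
"utf-8" stands in for whatever nl_langinfo (CODESET) would report locally.]

```python
# Hypothetical remote filename, known only as bytes.
# Nothing in the bytes says which charset produced them.
raw = b"caf\xe9.txt"


def try_decode(data: bytes, charset: str):
    """Return the decoded text, or None if data is not valid in charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# Three guesses at the from-charset (all assumptions, none is "the" answer).
as_utf8 = try_decode(raw, "utf-8")          # not valid UTF-8 at all
as_latin1 = try_decode(raw, "iso-8859-1")   # decodes, 0xE9 is e-acute
as_koi8r = try_decode(raw, "koi8-r")        # also decodes, to a Cyrillic letter

# Assumed to-charset; locally this part is easy to find out.
to_charset = "utf-8"

# The converted name depends entirely on the guessed from-charset.
from_latin1 = as_latin1.encode(to_charset)
from_koi8r = as_koi8r.encode(to_charset)

print(as_utf8, as_latin1, as_koi8r)
print(from_latin1, from_koi8r)
```

[Both non-UTF-8 guesses succeed and give different results, so a
successful conversion says nothing about whether the guess was right.]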
