On Tue, Aug 18, 2015 at 05:45:13PM +0300, Eli Zaretskii wrote:

> > All this is about the local situation. One cannot know "the character set"
> > of a filename because that concept does not exist in Unix.
>
> Of course, it exists. The _filesystem_ doesn't know it, but users do.
Usually, yes.

> > About the remote situation even less is known.
>
> Assuming UTF-8 will go a long way towards resolving this. When this
> is not so, we have the --remote-encoding switch.

This is wget. The user is recursively downloading a file hierarchy.
Only after downloading does it become clear what one has got.

I download a collection of East Asian texts on some topic. Upon
examination, part is in SJIS, part in Big5, part in EUC-JP, part in
UTF-8. Since the downloaded material does not have a uniform character
set, and the server is certainly not going to specify character sets,
any invocation of iconv will corrupt my data. Once I have the
unmodified data, I check with a browser, an editor, or xterm+luit
which character-set setting gives me readable text.

> > It would be terrible if wget decided to use obscure heuristics to
> > invent a remote character set and then invoke iconv.
>
> But what you suggest instead -- create a file name whose bytes are an
> exact copy of the remote -- is just another heuristic.

No. An exact copy allows me to decide what I have. Conversion leads to
data loss.

Andries
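
P.S. A minimal sketch of the point, in Python rather than anything from
wget itself (the sample string and the list of candidate charsets are
only an illustration): as long as the exact bytes are kept, one can
still try charsets afterwards and keep whichever gives readable text,
while an eager conversion under a wrong guess throws information away
for good.

    # Bytes as they might arrive from a server; here EUC-JP, but the
    # downloader has no reliable way to know that.
    raw = "漢字".encode("euc-jp")

    # With the exact bytes preserved, the user can try candidate
    # charsets later and keep the one that produces readable text.
    for charset in ("utf-8", "shift_jis", "big5", "euc-jp"):
        try:
            print(charset, "->", raw.decode(charset))
        except UnicodeDecodeError:
            print(charset, "-> not decodable under this charset")

    # Converting eagerly under a wrong guess (here: forcing a UTF-8
    # decode with replacement) turns the undecodable bytes into U+FFFD.
    # The original EUC-JP byte sequence can no longer be recovered.
    mangled = raw.decode("utf-8", errors="replace")
    print(repr(mangled))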
