On 17 Oct 2017 at 19:50, Guenter Milde wrote:

TODO: find out which encoding is used for the arguments by CMake
(maybe we need the locale encoding) and eventually adapt the argument
parsing:

       arg = arg.decode('UTF-8') # support non-ASCII characters in arguments

Is that sort of thing necessary?
Arguments are often file/pathnames, right? Anyway, anything that *is* a pathname should not be 'decoded' or otherwise altered. A filename is 'a string of bytes', and that should work. It *will* work with the underlying filesystem - which may have folders named in any weird encoding that does not match LANG or whatever.
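
To make this concrete, here is a minimal Python 3 sketch (the function names and the sample filename are made up for illustration; this is not the LyX script). It assumes a pathname argument that arrives as bytes in ISO 8859-1. A strict UTF-8 decode raises an error, while the standard "surrogateescape" error handler carries the undecodable bytes through a str and back unchanged:

```python
def decode_strict(raw: bytes) -> str:
    """Strict UTF-8 decode, in the spirit of the quoted TODO line."""
    return raw.decode("utf-8")


def to_text_lossless(raw: bytes) -> str:
    """Decode for internal handling; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes_lossless(text: str) -> bytes:
    """Reverse of to_text_lossless(); restores the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical filename as it might be stored by a user with an ISO 8859-1 locale.
latin1_name = "blåbær.lyx".encode("iso-8859-1")

try:
    decode_strict(latin1_name)
    strict_ok = True
except UnicodeDecodeError:
    # The bytes are not valid UTF-8, so the strict decode gives up.
    strict_ok = False

# The lossless round trip hands back exactly the bytes the filesystem uses.
round_trip = to_bytes_lossless(to_text_lossless(latin1_name))
print(strict_ok, round_trip == latin1_name)
```

The simplest option is of course to keep the argument as bytes all the way; the round trip above is only for code that insists on a str in between.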

Realistic example:
Linux-based file server. Some users use Windows (and whatever encoding they have). Some use Linux with UTF-8, and some use Linux with some ISO 8859-x encoding. All these people can see their own files named correctly, precisely because the server doesn't care about filename encoding. They may see some garbage in other people's filenames - but don't care. Each works mostly with their own files. A garbled name only has a few wrong characters, because A-Z covers 90% even for those who need more than ASCII.

Still, you will sometimes have to work with LyX in a shared folder created & named by one of the others. The language is not English, so yes - the folder name contains non-ASCII characters, possibly in an encoding you don't use.

Failing to display pathnames correctly under such circumstances is OK. (The company should standardize on some encoding, if they want correct display in all cases!)

Failing to run scripts, or failing to find files because a 'decoding' mangled the pathname, is NOT OK.


While it may be necessary to encode/decode the contents of files, this should not be done with pathnames. A pathname is the real name used by the underlying filesystem. Any change, and it won't match reality anymore. It can be changed for display purposes, but note that conversion may very well be impossible. At worst, you have folderone/foldertwo/filename where "folderone" is in utf-8 and "foldertwo" is in iso8859 - both using non-ascii characters. There is no need to break because of that.
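
A small sketch of that split between "real name" and "display name", again in Python 3 with hypothetical names and a made-up path: the raw bytes stay untouched for filesystem calls, and a separate, lossy string is produced only for showing to the user. Here the first folder is in UTF-8 and the second in ISO 8859-1.

```python
def display_path(raw: bytes) -> str:
    """Best-effort text for display only; never pass the result to the filesystem."""
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8
    # instead of raising, so a badly encoded folder name cannot stop us.
    return raw.decode("utf-8", errors="replace")


# Hypothetical path with two folders in different encodings.
folder_one = "mappe_æ".encode("utf-8")
folder_two = "mappe_ø".encode("iso-8859-1")
mixed = folder_one + b"/" + folder_two + b"/fil.lyx"

shown = display_path(mixed)  # ugly but harmless
print(shown)

# The real name is still the original bytes; use `mixed` itself for
# filesystem calls such as open() or os.stat(), never `shown`.
```

The second folder shows up with a replacement character, which is exactly the "ugly display" we can live with.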

Conversion errors had better not stop us. Ugly display is something we can live with; software stopping with an error is not. And of course we cannot prevent people from using a perfectly normal folder name in whatever language they use.
