On 17 Oct. 2017 19:50, Guenter Milde wrote:
> TODO: find out which encoding is used for the arguments by CMake
> (maybe we need the locale encoding) and eventually adapt the argument
> parsing:
> arg = arg.decode('UTF-8') # support non-ASCII characters in arguments
Is that sort of thing necessary?
Arguments are often file/pathnames, right? Anyway, anything that *is* a
pathname should not be 'decoded' or otherwise altered. A filename is 'a
string of bytes', and that should work. It *will* work with the
underlying filesystem - which may have folders named in any weird
encoding not matching LANG or whatever.
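
To illustrate the 'string of bytes' point, here is a minimal sketch,
assuming Python 3 on a Linux filesystem that accepts arbitrary byte
names. The file name is made up for the example; nothing here is taken
from the actual LyX scripts.

```python
import os
import tempfile

# Hypothetical file name in ISO 8859-1; these bytes are not valid UTF-8.
name = b"bl\xe5b\xe6r.lyx"

with tempfile.TemporaryDirectory() as tmp:
    workdir = os.fsencode(tmp)          # bytes path of the scratch directory
    path = os.path.join(workdir, name)  # bytes + bytes, no decoding anywhere

    with open(path, "wb") as f:         # open() accepts bytes paths
        f.write(b"content")

    listed = os.listdir(workdir)        # bytes in, bytes out
    with open(path, "rb") as f:
        data = f.read()
```

The name never passes through a codec, so it cannot be mangled.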
Realistic example:
Linux-based file server. Some users use Windows (and whatever encoding
they have). Some use Linux with UTF-8, and some use Linux with some
iso8859-x encoding. All these people can see their own files named
correctly, precisely because the server doesn't care about filename
encoding. They may see some garbage in other people's filenames - but
don't care. Each works mostly with their own. A garbled name only has a
few wrong characters, because A-Z covers 90% even for those who need
more than ASCII.
Still, you will sometimes have to work with LyX in a shared folder
created & named by one of the others. The language is not English, so
yes - the folder name contains non-ASCII characters, possibly in an
encoding you don't use.
Failing to display pathnames correctly under such circumstances is OK.
(The company should standardize on some encoding, if they want correct
display in all cases!)
Failing to run scripts, or failing to find files because a 'decoding'
mangled the pathname, is NOT OK.
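
If a script really must hold the argument as text, one option (a sketch
only, assuming Python 3; the path is invented) is the "surrogateescape"
error handler, which keeps undecodable bytes recoverable instead of
raising or replacing them:

```python
# Hypothetical argument: an ISO 8859-1 folder name, invalid as UTF-8.
raw = b"/srv/share/bl\xe5b\xe6r/file.lyx"

# A strict decode, as in the TODO, fails on such a path.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate ...
text = raw.decode("utf-8", errors="surrogateescape")
# ... and encoding with the same handler gives the original bytes back.
restored = text.encode("utf-8", errors="surrogateescape")
```

The text form is only safe for passing around; printing it may still
fail, so display needs separate handling.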
While it may be necessary to encode/decode the contents of files, this
should not be done with pathnames. A pathname is the real name used by
the underlying filesystem. Any change, and it won't match reality
anymore. It can be changed for display purposes, but note that
conversion may very well be impossible. At worst, you have
folderone/foldertwo/filename where "folderone" is in utf-8 and
"foldertwo" is in iso8859 - both using non-ascii characters. There is no
need to break because of that.
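
For the display side, something like the following would do. It is a
rough sketch, assuming Python 3; display_name and the ISO 8859-1
fallback are my own guesses for illustration, not existing LyX code. It
decodes each path component separately and never raises:

```python
def display_name(path_bytes, fallback="iso8859-1"):
    """Return a printable version of a bytes path. For display ONLY."""
    parts = []
    for part in path_bytes.split(b"/"):
        try:
            parts.append(part.decode("utf-8"))
        except UnicodeDecodeError:
            # Guess a fallback encoding; a wrong guess gives ugly
            # characters, but never an error.
            parts.append(part.decode(fallback, errors="replace"))
    return "/".join(parts)

# "folderone" in UTF-8, "foldertwo" in ISO 8859-1, as described above.
mixed = "mappé".encode("utf-8") + b"/" + "blåbær".encode("iso8859-1") + b"/file.lyx"
shown = display_name(mixed)
```

The original bytes in mixed stay untouched and are what gets handed to
the filesystem; shown is only for the user's eyes.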
Conversion errors had better not stop us. Ugly display is something we
can live with; software stopping with an error is not. And of course we
cannot prevent people from using a perfectly normal folder name in
whatever language they use.