Kornel Benko wrote:

> Setting 'wrong' lang environment causes lyx to use different encoding for
> filenames.
> 
> Setting
> export LANG="en_IE@euro"
> 
> Now, reading the file "Testoübernahme.lyx" which needs conversion leads to
> this log snippet:
> 
> support/TempFile.cpp (35): Temporary file in
> /home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatXXXXXX.lyx
> support/TempFile.cpp (38): Temporary file
> `/home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatAS8040.lyx'
> created.
> Buffer.cpp (1297): Running 'python -tt
> "/usr/local/share/lyx2.3/lyx2lyx/lyx2lyx" -t 509 -o
> "/home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatAS8040.lyx"
> "/usr2/kornel/lyx/privat/Briefe-Edgar/Testoübernahme.lyx"'
> usage: lyx2lyx [options] [file]
> lyx2lyx: error: argument input: invalid cmd_arg value:
> '/usr2/kornel/lyx/privat/Briefe-Edgar/Testo\xc3\xbcbernahme.lyx'
> 
> Everything is OK, if using e.g. LANG="en_IE.utf8".
> 
> From my POV, encoding of file-names should not depend on locales.

TL;DR: The current behaviour is probably correct, or QFile::encodeName() has 
a bug.

Unfortunately this is complicated, but I'll try to explain. First let's 
look at how file names are stored in the file system. This depends of course 
on the file system type. Both NTFS on Windows and HFS+ on OS X store file 
names encoded in utf-16 (see https://en.wikipedia.org/wiki/NTFS and 
https://en.wikipedia.org/wiki/HFS_Plus). This is simple and reliable: any 
program or operating system that deals with the file system directly (e.g. 
when mounting it on a different machine) knows how to interpret file names 
and can present them to the user in the correct way.

For other file systems such as FAT or the typical Linux file systems (e.g. 
ext3) the situation is a mess. ext3 and relatives do not specify in which 
encoding a file name is stored. They only know bytes (see e.g. 
http://unix.stackexchange.com/questions/39175/understanding-unix-file-name-encoding).
The interpretation of the bytes is left to user space, and this is where 
the locale comes into play: If the locale is set to en_IE@euro, and you 
create a file, the encoding of the file name will be iso_8859-15. If you do 
the same while the locale is set to en_IE.utf8, the encoding of the file 
name will be utf8. This used to cause big trouble in the transition period 
from fixed width 8bit locales to utf8, when people had file names with 
non-ascii letters, and used the old hard disk on a machine with a newer 
Linux, and suddenly all file names looked broken. Therefore utilities like 
convmv were invented, and when mounting FAT file systems on Linux the 
codepage= and iocharset= options can be used.
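
To make the byte-level difference concrete, here is a small Python sketch 
(purely illustrative, it is not code from LyX or lyx2lyx). It encodes the 
file name from the report in both encodings and shows what a utf8 name looks 
like when its bytes are interpreted as iso_8859-15:

```python
# Illustration only: the same file name as it would be stored on disk
# under a utf8 locale and under an iso_8859-15 locale.
name = "Testoübernahme.lyx"

as_utf8 = name.encode("utf-8")          # ü becomes the two bytes C3 BC
as_latin9 = name.encode("iso8859-15")   # ü becomes the single byte FC

print(as_utf8)
print(as_latin9)

# A program running under an iso_8859-15 locale would interpret the utf8
# bytes like this, which is the "broken file name" effect described above.
misread = as_utf8.decode("iso8859-15")
print(ascii(misread))
```

The file system stores either byte sequence without complaint; only the 
interpretation differs.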

What happens in your case is the following: LyX does _not_ use the 
iso_8859-15 encoding when calling lyx2lyx. This can be seen from the error 
message: if it used iso_8859-15, then the ü would not be encoded as two 
bytes. Here we might have a bug in QFile::encodeName(), which is used 
internally, but I rather suspect that you still have some LC_* variables set 
to use a utf8 encoding. Unfortunately the Qt documentation is rather 
unspecific about how exactly the "local 8-bit encoding determined by the 
user's locale" (which is used by QFile::encodeName()) is determined; one 
would have to read the sources.
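
The reasoning about the error message can be checked mechanically. The 
following sketch (the helper name is made up for this mail) tests whether a 
byte string is valid utf8. The bytes \xc3\xbc from the log pass, while the 
single byte \xfc that iso_8859-15 would produce for ü does not:

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as utf8."""
    try:
        raw.decode("utf-8")   # strict decoding raises on invalid sequences
        return True
    except UnicodeDecodeError:
        return False

# Name as it appears in the lyx2lyx error message.
from_log = b"Testo\xc3\xbcbernahme.lyx"
# Name as it would look if iso_8859-15 had been used.
if_latin9 = b"Testo\xfcbernahme.lyx"

print(is_valid_utf8(from_log), is_valid_utf8(if_latin9))
```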

Assuming that LyX really did pass the file name encoded in iso_8859-15 to 
lyx2lyx, the commandline argument decoding in lyx2lyx would work (I did 
spend some evenings to understand how this works and to implement the 
current parsing interface in lyx2lyx). However, when lyx2lyx then tried to 
read the input file, it would fail. The reason for this is that your 
original file was created with an active utf8 locale, but the current locale 
tells lyx2lyx to use iso_8859-15 for encoding the file name when opening 
it, so the bytes do not match the name stored in the file system. It would 
work if you called convmv to convert the file name in the file system to 
iso_8859-15 before starting LyX.
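
This mismatch can be simulated without changing the locale by working with 
byte paths directly. The sketch below is an assumption-laden illustration 
(temporary directory, dummy file content, made-up helper name) and not what 
lyx2lyx does internally: it creates a file whose name is stored as utf8 
bytes and then tries to open it under both encodings.

```python
import os
import shutil
import tempfile

name = "Testoübernahme.lyx"

# Create the file with a utf8-encoded name, as a utf8 locale would.
tmpdir = os.fsencode(tempfile.mkdtemp())
with open(os.path.join(tmpdir, name.encode("utf-8")), "wb") as f:
    f.write(b"dummy content\n")

def can_open(dirname, filename, encoding):
    """Try to open filename after encoding it the way a locale would."""
    path = os.path.join(dirname, filename.encode(encoding))
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False

found_utf8 = can_open(tmpdir, name, "utf-8")
found_latin9 = can_open(tmpdir, name, "iso8859-15")
print(found_utf8, found_latin9)

shutil.rmtree(tmpdir)  # clean up the temporary directory
```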

Encoding commandline arguments of programs according to the currently active 
locale is standard among all operating systems (see e.g. 
http://stackoverflow.com/questions/5408730/what-is-the-encoding-of-argv). So 
when the user calls lyx2lyx directly in a terminal, or from a different 
program than LyX, the current lyx2lyx behaviour is correct (I tested that 
using different encodings). If you want to test this as well, you need to 
ensure that every locale-related environment variable that is currently set 
gets changed to the wanted locale. These may be LANG, LANGUAGE and LC_*. 
When using a terminal emulator from X, you also need to change the encoding 
of the terminal emulator, because this determines how the keyboard input 
that is fed to the shell is encoded.
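
To see why changing LANG alone may not be enough, here is a sketch of the 
usual POSIX precedence for the character encoding category: LC_ALL wins over 
LC_CTYPE, which wins over LANG. The function name is made up, and the 
fallback to "C" is an assumption about the typical default; how Qt itself 
picks the encoding is a separate question.

```python
import os

def effective_ctype_setting(env):
    """Return (variable, value) that decides LC_CTYPE under POSIX rules."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:                 # unset or empty variables are skipped
            return var, value
    return None, "C"              # assumed default when nothing is set

# Example: LANG was changed, but an old LC_CTYPE is still exported.
example_env = {"LANG": "en_IE@euro", "LC_CTYPE": "en_IE.utf8"}
print(effective_ctype_setting(example_env))

# Inspect the variables of the current process.
for key in sorted(os.environ):
    if key in ("LANG", "LANGUAGE") or key.startswith("LC_"):
        print(key, "=", os.environ[key])
```

In the example environment the utf8 setting from LC_CTYPE is still the one 
in effect, even though LANG says en_IE@euro.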

If called from LyX we could simply decide to use utf8 for lyx2lyx 
commandline arguments. Of course this would have to be specified by a 
special commandline parameter, so that non-LyX usage of lyx2lyx does not 
break. I do not see any real advantage in doing this. We would not need 
the ugly FileName::toSafeFilesystemEncoding() on Windows, and we would be 
able to encode every file for the lyx2lyx commandline, but on Linux, if the 
file name is not encodable by the current locale, lyx2lyx would fail when 
trying to open the file.
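
For illustration only, the decoding side of such a switch could look roughly 
like the sketch below. Nothing like this exists in lyx2lyx; the function and 
its force_utf8 parameter are hypothetical, and the sketch assumes Python 3, 
where os.fsencode() recovers the raw bytes of an argument.

```python
import os

def decode_cmdline_arg(arg, force_utf8=False):
    """Hypothetical helper: reinterpret an argument as utf8 on request."""
    if not force_utf8:
        return arg                  # keep the locale-based decoding
    raw = os.fsencode(arg)          # back to the bytes the caller passed
    return raw.decode("utf-8")      # interpret them as utf8 instead

# Simulate an argument that arrived as utf8 bytes from the calling program.
incoming = os.fsdecode(b"Testo\xc3\xbcbernahme.lyx")
decoded = decode_cmdline_arg(incoming, force_utf8=True)
print(ascii(decoded))
```

This only solves the argument decoding. Opening the file afterwards still 
goes through the locale's encoding, which is the problem described above.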


Georg
