> From: Gavin Smith <[email protected]>
> Date: Sun, 20 Feb 2022 14:44:13 +0000
> Cc: [email protected], [email protected], [email protected]
>
> On Sun, Feb 20, 2022 at 03:06:57PM +0200, Eli Zaretskii wrote:
> > > This means that any non-ASCII characters in a filename in a Texinfo source
> > > file are sought in the filesystem as the corresponding UTF-8 sequences.
> >
> > This will not work on Windows.
>
> I can see that there could be an issue if files are copied onto a
> Windows filesystem, or extracted from a tar file.  This
> would lead the byte sequences representing the filenames to change.

It can cause problems on any system.  Although most people on Posix
hosts use UTF-8, some still don't.

> Do you know if TeX distributions for Windows do any handling of filename
> encodings?

I have never used a non-ASCII variant of TeX (XeTeX, I guess?) on
Windows, so I don't know, sorry.

> A file could be in UTF-8 but need to refer to file names that
> are in UTF-16 on the filesystem.  Would this work?

Not unless we change the encoding.

> > > A more thorough fix would obey @documentencoding and convert back to the
> > > original encoding, to retrieve the bytes that were present in the source
> > > file in case the file was not in UTF-8.  I think it would be the most
> > > correct to always use the exact bytes that were in the source file as the
> > > name of the file (I assume that is what TeX would do).
> >
> > This assumes that the file name is encoded the same as the Texinfo
> > source.  But that assumption is only true on the system where the
> > Texinfo file was written, and even there it could be false.
>
> I would only favour having special handling for Windows.  On GNU/Linux we
> should assume that the byte sequence in the Texinfo file matches the
> filename exactly.  This is the easiest behaviour to understand and what
> TeX would do.

But that won't work on systems whose locale's codeset is not UTF-8.

> > The only thorough solution, IMO, is to assume the file names are
> > encoded in the filesystem as specified by the locale's codeset.  That,
> > too, can be false, but at least in the absolute majority of use cases
> > it will be true.  The only better solution is to let the user specify
> > the file-name encoding.
>
> The locale codeset could very easily be incorrect.  Suppose somebody sets
> a Latin-1 locale, should they then be unable to build Texinfo manuals
> with non-ASCII UTF-8 filenames?

They will see garbled file names.  You can try this yourself.
E.g., Emacs lets you control the file-name encoding, so you could
create a file with a Latin-1 encoded name on a system whose locale's
codeset is UTF-8.  Then ask 'ls' or some GUI file manager to display
the file's name.
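
For those who would rather not touch the filesystem, the same
mismatch can be sketched in memory.  This is only an illustration,
not what texi2any or 'ls' actually does: the file name and the
helper as_seen() are hypothetical, and the replacement-character
behaviour shown is Python's, which other tools need not share.

```python
# Hypothetical file name ("café.txt"), chosen only for illustration.
name = "caf\u00e9.txt"


def as_seen(name: str, stored_as: str, read_as: str) -> str:
    """Encode NAME with one codeset, then decode the bytes with another.

    This mimics a file name written under one encoding and displayed
    by a program that assumes a different one.
    """
    return name.encode(stored_as).decode(read_as, errors="replace")


# Latin-1 bytes on disk, displayed under a UTF-8 locale:
# the lone 0xE9 byte is not a valid UTF-8 sequence.
latin1_in_utf8 = as_seen(name, "latin-1", "utf-8")

# UTF-8 bytes on disk, displayed under a Latin-1 locale:
# each of the two bytes of the e-acute becomes its own character.
utf8_in_latin1 = as_seen(name, "utf-8", "latin-1")

print(repr(latin1_in_utf8))
print(repr(utf8_in_latin1))
```

In both directions the displayed name differs from the one the
author typed, which is the garbling I mean.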
