> From: Gavin Smith <[email protected]>
> Date: Wed, 23 Feb 2022 19:31:52 +0000
>
> Whatever we do, it should be concordant with TeX's filename handling.
> I imagine that TeX (except possibly on MS-Windows) would just use the
> bytes, so so should we.

AFAIK, TeX uses bytes everywhere.

> In any case the cases we are dealing with a very rare here, but I just
> don't see that the situation is very common where somebody works in
> a non-UTF-8 locale, has all their filenames in this encoding, and
> recodes any files they download from the Internet or extracted from a tar
> file into that encoding.

If the file names are non-ASCII, the _only_ reasonable way of
downloading them is to recode their names.  Otherwise, you will get
garbled names at best, and at worst (on MS-Windows) can have the file
names rejected by the OS, i.e. you will be unable to unpack the
downloaded archive or to save the fetched file locally.

> It seems much more likely to me that somebody would be using a
> non-UTF-8 locale for whatever reason, and would download Texinfo
> files with UTF-8 names without recoding the names, and still
> expect to be able to build them.

This might simply fail on MS-Windows, if the UTF-8 byte sequences
include bytes that don't exist in the locale's encoding (a.k.a. "ANSI
codepage").  It will definitely produce garbled file names, and might
also break makeinfo.

> E.g. - UTF-8 Texinfo file, processed under KOI-8 locale on Windows,
> accessing filenames named with UTF-16 filenames on Windows filesystem.
> Then the UTF-8 filenames would be encoded to KOI-8, and then some file
> access layer would convert the KOI-8 to UTF-16 and find the filenames.
> Is that how it works or am I way off?

Are you describing what we will do in makeinfo, or are you describing
how the current makeinfo, which doesn't re-encode file names, works?
If the latter, then Windows file-related APIs will assume that the
file names we pass to them (taken from the Texinfo source's @include
or @image directives) are KOI-8 encoded, and will attempt to convert
the UTF-8 byte sequences to UTF-16 as if they were KOI-8 encoded.  The
results will never be pretty, and if some byte doesn't exist in the
KOI-8 encoding, the conversion will yield a question mark '?' or a
space character; in the former case, the API call will likely fail
because '?' is not allowed in Windows file names.
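
To make the misinterpretation described above concrete, here is a
rough Python sketch.  It is only an illustration under assumptions:
the file name "Имя.texi" and the helper misread_utf8_as are made up,
and Python's codec tables stand in for whatever conversion the Windows
APIs actually perform, which may differ in detail.  Python's koi8_r
table happens to define every byte, so cp1251 (where byte 0x98 is
undefined) is used as a stand-in for the "byte doesn't exist" case;
Python marks such bytes with U+FFFD rather than '?' or a space.

```python
def misread_utf8_as(name: str, codepage: str) -> str:
    """Decode the UTF-8 bytes of `name` as if they were in `codepage`.

    Hypothetical helper: it mimics a file API that assumes the bytes it
    receives are in the locale's codepage, when they are really UTF-8.
    """
    raw = name.encode("utf-8")
    # errors="replace" substitutes U+FFFD for bytes the codepage does
    # not define, instead of raising UnicodeDecodeError.
    return raw.decode(codepage, errors="replace")


# Hypothetical file name, as it might appear in an @include directive.
name = "Имя.texi"

# UTF-8 bytes read as KOI8-R: every byte maps to some character, so the
# result is a "valid" but garbled name.
as_koi8 = misread_utf8_as(name, "koi8_r")

# UTF-8 bytes read as cp1251: the second byte of "И" (0x98) has no
# mapping there, so a replacement character shows up.
as_cp1251 = misread_utf8_as(name, "cp1251")

print(repr(as_koi8))
print(repr(as_cp1251))
```

In both cases the ASCII suffix ".texi" survives and only the non-ASCII
part is damaged, so such breakage is easy to miss until a file with a
non-ASCII name is actually used.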
