> Date: Tue, 25 Oct 2022 21:18:35 +0200
> From: [email protected]
> Cc: [email protected], [email protected]
>
> > It should work if the document is indeed in the expected encoding.
> > But if the file is actually encoded in something other, especially if
> > the encoding is multibyte (like UTF-8), it will not work.
>
> Indeed, it is not reliable, but what would be the best default?  It
> seems to me that Windows adds additional possibilities for anything to
> fail.  However, on the issue of using the codepage to encode file names
> in texi2any versus using the input file encoding, it does not seem to
> me that Windows is special.  If we use the input file encoding on other
> platforms, assuming that the use case is converting manuals from
> archives where all the files are similarly encoded, consistently with
> manuals, it seems to me that Windows is not very special.  It will
> fail in some cases on Windows, but using the user codepage will decrease
> even more the possibility that the result is correct (files with encoded
> characters in their names are found).  Are you still sure that using
> the user current codepage is the best in this situation?

For the encoding of the document, @documentencoding should work on
Windows as it does elsewhere.  So I'm not sure why we use a different
default.  Is that only for the case where there's no @documentencoding
in the Texinfo source?  If not, when will this default be used?

The only part that I think is different on Windows is the encoding of
file names, because Windows doesn't treat file names as opaque byte
streams.  But anything that comes from a Texinfo source, even the name
of an included file, should be interpreted according to
@documentencoding.  When accessing included files on Windows, we
should re-encode the file names to the locale's encoding, because
nothing else will work reliably.  Is that what we do?

> I can't imagine a situation where the file name would end up being
> encoded in UTF-8 in Windows, even with a codepage in UTF-8

Windows doesn't yet generally allow users to set up the system to use
UTF-8 as the default system codepage.  This is only available on the
latest Windows versions, and only if the user turns on a special "for
developers" feature.  Even then it is not yet 100% reliable.  So
bottom line, UTF-8 cannot yet be used on Windows as the locale's
codeset.

> Even though we do not need to skip the test on Windows, I think that
> it is better to skip it, as in case of multibyte codepage, the file
> created, supposed to be encoded in Latin1 will not be as expected,
> and the test will not succeed, but not for the expected reasons and
> does not test what it is supposed to test.

I agree.
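
To make the file-name re-encoding described above concrete, here is a
minimal sketch.  It is in Python rather than the Perl that texi2any is
written in, and it is not taken from the texi2any sources: the function
name reencode_file_name and its parameters are hypothetical, chosen
only for illustration.  It assumes the file name arrives as raw bytes
read from the Texinfo source.

```python
import locale
from typing import Optional


def reencode_file_name(raw_name: bytes,
                       document_encoding: str,
                       target_encoding: Optional[str] = None) -> bytes:
    """Re-encode a file name taken from a Texinfo source.

    raw_name is interpreted according to document_encoding (the value
    of @documentencoding) and then encoded in target_encoding.  When
    target_encoding is None, the locale's preferred encoding is used,
    which on Windows is normally the user's current codepage.

    Hypothetical helper for illustration only.
    """
    if target_encoding is None:
        # False: query the current setting without changing the locale.
        target_encoding = locale.getpreferredencoding(False)
    # Step 1: bytes from the document -> characters.
    text = raw_name.decode(document_encoding)
    # Step 2: characters -> bytes the file system APIs will understand.
    # Raises UnicodeEncodeError if the codepage cannot represent a
    # character, which is the unavoidable failure case.
    return text.encode(target_encoding)
```

For example, a name written as "é.texi" in a UTF-8 document would be
passed to the file system as the single-byte form when the codepage is
cp1252, while a name containing characters outside the codepage raises
an error instead of silently naming the wrong file.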
