On Thu, Feb 24, 2022 at 02:33:11PM +0100, Patrice Dumas wrote:
> It fixes the NonXS parser (I modified where it is done, such as to do
> it before locate_include_file but kept your code), but not for the XS
> parser.  In the XS parser, the @include file name is converted to utf-8
> upon reading.  If the file name is encoded in another encoding on the
> filesystem it won't be found (I tested, it is indeed the case).
>
> To do something similar to the NonXS parser, one would need, maybe
> in Texinfo/XS/parsetexi/end_line.c in end_line_misc_line around line
> 1428, instead of fullpath = locate_include_file (text); text should be
> converted to the @documentencoding unless it is utf-8 or ascii.
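
To make the suggestion above concrete, here is a minimal sketch of that
kind of conversion, assuming POSIX iconv is available.  The helper names
(same_name, encode_file_name) are hypothetical and this is not the code
in parsetexi; it only illustrates converting a UTF-8 name to the document
encoding before looking the file up.

```c
#include <ctype.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: case-insensitive comparison of encoding names. */
static int
same_name (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
  return *a == *b;
}

/* Hypothetical helper: convert a UTF-8 file name to DOC_ENCODING.
   Returns a malloc'd string, or NULL if the conversion fails.
   The name is copied unchanged for utf-8 and us-ascii documents. */
static char *
encode_file_name (const char *utf8_name, const char *doc_encoding)
{
  size_t in_left = strlen (utf8_name);
  size_t out_size = 4 * in_left + 16;   /* generous bound for a sketch */
  size_t out_left = out_size - 1;
  char *result = malloc (out_size);
  char *in = (char *) utf8_name;        /* iconv takes a non-const pointer */
  char *out = result;
  iconv_t cd;

  if (!result)
    return NULL;

  if (!doc_encoding
      || same_name (doc_encoding, "utf-8")
      || same_name (doc_encoding, "us-ascii"))
    {
      memcpy (result, utf8_name, in_left + 1);
      return result;
    }

  cd = iconv_open (doc_encoding, "UTF-8");
  if (cd == (iconv_t) -1)
    {
      free (result);
      return NULL;
    }

  /* Convert, then flush any shift state for stateful encodings. */
  if (iconv (cd, &in, &in_left, &out, &out_left) == (size_t) -1
      || iconv (cd, NULL, NULL, &out, &out_left) == (size_t) -1)
    {
      iconv_close (cd);
      free (result);
      return NULL;
    }

  *out = '\0';
  iconv_close (cd);
  return result;
}
```

The caller would pass the converted name to the include file lookup
instead of the UTF-8 text, and fall back to the original name if NULL
is returned.
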
Done in 46732a3290.  I haven't tested this code very much, only by
running the test suite.

> > In any case the cases we are dealing with are very rare here, but I
> > just don't see that the situation is very common where somebody
> > works in a non-UTF-8 locale, has all their filenames in this
> > encoding, and recodes any files they download from the Internet or
> > extracted from a tar file into that encoding.  I've no insight into
> > what use case we would be supporting by using the locale encoding
> > to interpret any filenames.
>
> It could also be the reverse, somebody works in a UTF-8 locale
> with a manual in an 8-bit locale and recodes the file names to
> utf-8.

Good point.

> > It seems much more likely to me that somebody would be using a
> > non-UTF-8 locale for whatever reason, and would download Texinfo
> > files with UTF-8 names without recoding the names, and still
> > expect to be able to build them.  (Even if they can't type the
> > names in, it may get built with Makefile rules.)
>
> To me both are possible.  Speaking for GNU/Linux, some years ago when
> there were still 8-bit locales, it would have been reasonable to
> assume that people would process differently encoded manuals and recode
> file names without changing the manual itself (either 8-bit encoded
> manuals in utf8 locale or utf8 manual in 8-bit locale).  Today this is
> less likely to happen while your scenario is more likely to happen as
> all the manuals should be converted to utf-8, all the locales should be
> utf8 and more file names should be in utf8, even on 8-bit locales.
>
> > Some filtering with a customization variable may be necessary for
> > unusual operating systems and/or filesystems.
>
> Yes, I'll add that after if you don't.  I think that it will need to be
> obeyed by the XS parser too, in the same way as the @include file names
> should be converted to the documentencoding from utf-8.

The customization variable could be the name of an encoding to convert
filenames to, or it could be an on/off variable to use the encoding from
the locale.  I guess that the latter would be sufficient.  I'm happy if
you implement this, although I doubt it is urgent.  It should be off by
default on all systems except MS-Windows.

I think it would be fairly simple to implement in the XS parser, if it
is done in the Perl code - it would just need to get the name of the
filename encoding.
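
As a rough sketch of the on/off variant, the choice of filename encoding
could look something like the following.  The function name and the
use_locale_encoding flag are hypothetical, the fallback to the document
encoding when the flag is off is my assumption, and nl_langinfo is
POSIX-only, so a real MS-Windows implementation would need a different
way to query the locale encoding.

```c
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical: return the name of the encoding to use for file names.
   USE_LOCALE_ENCODING models the proposed on/off customization variable.
   The program is assumed to have called setlocale (LC_ALL, "") earlier;
   otherwise nl_langinfo reports the codeset of the "C" locale. */
static const char *
file_name_encoding (int use_locale_encoding, const char *doc_encoding)
{
  if (use_locale_encoding)
    {
      const char *codeset = nl_langinfo (CODESET);
      if (codeset && *codeset)
        return codeset;
    }

  /* Assumption: with the variable off, use the document encoding,
     defaulting to UTF-8 when none was declared. */
  return doc_encoding ? doc_encoding : "UTF-8";
}
```

The parser would then only need this one name, whichever side (Perl or
XS) computes it, and could convert @include file names to it.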
