> Date: Sun, 20 Feb 2022 14:28:23 +0100
> From: Patrice Dumas <[email protected]>
>
> On Sun, Feb 20, 2022 at 01:09:06PM +0000, Gavin Smith wrote:
> >
> > My thought was that the argument to -I could have been any sequence
> > of bytes, not necessarily correct UTF-8. It would be wrong then to
> > attempt any encoding or decoding to a string formed from such an
> > argument.
>
> Indeed, that must be what is happening here. I think that it is not
> necessarily wrong to do decoding. Actually, if the locale is not
> consistent with the encoding expected for file names, it would be even
> better to first decode command line arguments to the perl internal
> unicode encoding, then encode to the encoding that should work for
> operations using filenames.
>
> That is the solution that I would favor.
If you want the Texinfo sources to be in UTF-8 internally, it might be impossible not to decode the command-line arguments into UTF-8. Only if the command-line argument is used solely to access file names, and doesn't seep into the rest of the output, can you use the original byte sequence. And even then it might be problematic: e.g., what if the argument of -I is in some non-UTF-8 encoding, and the source uses @include with a non-ASCII file name encoded according to @documentencoding, which is UTF-8? You would need to construct a complete file name from two parts that are encoded differently.
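The decode-then-encode approach could look roughly like the sketch below. This is not texi2any's actual code, just an illustration of the idea using Perl's Encode module; the variable names, the choice of I18N::Langinfo to guess the locale encoding, and the assumption that file names use the locale encoding are all mine:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);
use I18N::Langinfo qw(langinfo CODESET);

# Assumption: command-line arguments are byte strings in the locale's
# encoding.  Decode them to Perl's internal character strings first.
my $locale_encoding = langinfo(CODESET);
my @decoded_args = map { decode($locale_encoding, $_) } @ARGV;

# Once everything is a character string, joining a -I directory (from
# the command line) with an @include file name (decoded from the
# source per @documentencoding) is well defined, even though the two
# came from differently encoded byte sequences.
my $include_dir   = $decoded_args[0] // '.';
my $included_file = "caf\x{e9}.texi";   # e.g. decoded from a UTF-8 source
my $full_name     = "$include_dir/$included_file";

# Finally, encode the complete name into the encoding the file system
# expects before passing it to open(), stat(), etc.
my $fs_encoding     = $locale_encoding;  # assumption, may differ in practice
my $full_name_bytes = encode($fs_encoding, $full_name);
```

The key point is that the concatenation happens in the internal character representation, so no byte string in one encoding is ever spliced into a byte string in another.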
