On Sun, Feb 20, 2022 at 03:35:57PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 20 Feb 2022 14:28:23 +0100
> > From: Patrice Dumas <[email protected]>
> > 
> > On Sun, Feb 20, 2022 at 01:09:06PM +0000, Gavin Smith wrote:
> > > 
> > > My thought was that the argument to -I could have been any sequence
> > > of bytes, not necessarily correct UTF-8. It would be wrong then to
> > > attempt any encoding or decoding to a string formed from such an
> > > argument.
> > 
> > Indeed, that must be what is happening here. I think that it is not
> > necessarily wrong to do decoding. Actually, if the locale is not
> > consistent with the encoding expected for file names, it would be even
> > better to first decode command-line arguments to the Perl internal
> > Unicode encoding, then encode to the encoding that should work for
> > operations using file names.
> > 
> > That is the solution that I would favor.
> 
> If you want the Texinfo sources to be in UTF-8 internally, it might be
> impossible not to decode the command-line arguments into UTF-8. Only
> if the command-line argument is used to access file names, and doesn't
> seep into the rest of the output, can you use the original byte
> sequence. And even then it might be problematic: e.g., what if the
> argument of -I is in some non-UTF-8 encoding, and the source uses
> @include with a non-ASCII file name encoded according to
> @documentencoding, which is UTF-8? You need to construct a complete
> file name from two parts that are encoded differently.
I agree with you, and that is actually what I was proposing to do: decode
the command-line arguments into Perl's internal Unicode encoding, and
encode file names to the file-system encoding as late as possible.

-- 
Pat
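For illustration, a minimal sketch of this decode-early, encode-late
scheme in Perl. The names here are hypothetical and option parsing is
omitted; treating the locale codeset as the file-system encoding is also
an assumption, since a real implementation might want a separate setting
when the locale is not consistent with the encoding of file names:

    #!/usr/bin/perl
    # Minimal sketch of "decode early, encode late".  Hypothetical
    # names: this is not what texi2any actually does.
    use strict;
    use warnings;
    use Encode qw(decode encode);
    use I18N::Langinfo qw(langinfo CODESET);
    use File::Spec;

    # Assumption: the locale codeset serves as both the command-line
    # and the file-system encoding.
    my $fs_encoding = langinfo(CODESET);

    # Decode command-line arguments (pretend they are all -I
    # directories) into Perl's internal Unicode representation as
    # early as possible.
    my @include_dirs = map { decode($fs_encoding, $_) } @ARGV;

    # Locate a file named by @include: $name_bytes comes from the
    # source, $doc_encoding from @documentencoding (e.g. 'utf-8').
    sub locate_include {
        my ($name_bytes, $doc_encoding) = @_;
        my $name = decode($doc_encoding, $name_bytes);
        for my $dir (@include_dirs) {
            # Both parts are character strings at this point, so
            # combining them is safe even if they arrived in
            # different encodings.
            my $char_path = File::Spec->catfile($dir, $name);
            # Encode to the file-system encoding only at the point
            # of actually touching the file system.
            my $byte_path = encode($fs_encoding, $char_path);
            return $byte_path if -e $byte_path;
        }
        return;
    }

This also covers the mixed-encoding case raised above: the -I directory
and the @include file name only meet after both have been decoded to
character strings.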
