On Sun, Feb 20, 2022 at 03:35:57PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 20 Feb 2022 14:28:23 +0100
> > From: Patrice Dumas <[email protected]>
> > 
> > On Sun, Feb 20, 2022 at 01:09:06PM +0000, Gavin Smith wrote:
> > > 
> > > My thought was that the argument to -I could have been any sequence
> > > of bytes, not necessarily correct UTF-8. It would be wrong then to
> > > attempt any encoding or decoding to a string formed from such an
> > > argument.
> > 
> > Indeed, that must be what is happening here. I think that it is not
> > necessarily wrong to do decoding. Actually, if the locale is not
> > consistent with the encoding expected for file names, it would be even
> > better to first decode command-line arguments to the Perl internal
> > Unicode encoding, then encode to the encoding that should work for
> > operations using file names.
> > 
> > That is the solution that I would favor.
> 
> If you want the Texinfo sources to be in UTF-8 internally, it might be
> impossible not to decode the command-line arguments into UTF-8. Only
> if the command-line argument is used to access file names, and doesn't
> seep into the rest of the output, can you use the original byte
> sequence. And even then it might be problematic: e.g., what if the
> argument of -I is in some non-UTF-8 encoding, and the source uses
> @include with a non-ASCII file name encoded according to
> @documentencoding, which is UTF-8? You need to construct a complete
> file name from two parts that are encoded differently.
I agree with you, and that is actually what I was proposing to do: decode
the command-line arguments into Perl's internal Unicode encoding, and
encode file names to the file-system encoding as late as possible.

-- 
Pat
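For illustration, a minimal sketch of this decode-early, encode-late
scheme in Perl. The names here are hypothetical and option parsing is
omitted; treating the locale codeset as the file-system encoding is also
an assumption, since a real implementation might want a separate setting
when the locale is not consistent with the encoding of file names:

    #!/usr/bin/perl
    # Minimal sketch of "decode early, encode late".  Hypothetical
    # names: this is not what texi2any actually does.
    use strict;
    use warnings;
    use Encode qw(decode encode);
    use I18N::Langinfo qw(langinfo CODESET);
    use File::Spec;

    # Assumption: the locale codeset serves as both the command-line
    # and the file-system encoding.
    my $fs_encoding = langinfo(CODESET);

    # Decode command-line arguments (pretend they are all -I
    # directories) into Perl's internal Unicode representation as
    # early as possible.
    my @include_dirs = map { decode($fs_encoding, $_) } @ARGV;

    # Locate a file named by @include: $name_bytes comes from the
    # source, $doc_encoding from @documentencoding (e.g. 'utf-8').
    sub locate_include {
        my ($name_bytes, $doc_encoding) = @_;
        my $name = decode($doc_encoding, $name_bytes);
        for my $dir (@include_dirs) {
            # Both parts are character strings at this point, so
            # combining them is safe even if they arrived in
            # different encodings.
            my $char_path = File::Spec->catfile($dir, $name);
            # Encode to the file-system encoding only at the point
            # of actually touching the file system.
            my $byte_path = encode($fs_encoding, $char_path);
            return $byte_path if -e $byte_path;
        }
        return;
    }

This also covers the mixed-encoding case raised above: the -I directory
and the @include file name only meet after both have been decoded to
character strings.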
