On Sat, Feb 26, 2022 at 09:29:15PM +0000, Gavin Smith wrote:
> On Sat, Feb 26, 2022 at 9:11 PM Patrice Dumas <[email protected]> wrote:
> > The whole output file is encoded, the problem is that you encoded
> > $image_file, it should not be, it is assumed to be decoded from the
> > document. image_path could be encoded, but then the encoding should be
> > passed such that it can be re-decoded, for error messages, for instance.
>
> It would probably be easier to do it the way you said and decode all
> the file names and encode them just before use. It's too confusing
> otherwise, even if doing it that way would give a little more
> flexibility for non-UTF-8 input files and locales (assuming we
> actually did it properly, and didn't ever break it by mistake).
>
> I looked at HTML.pm and found it hard to understand where variables or
> functions had the word "filename" in them, what exactly this referred
> to, if it was supposed to be the encoded or unencoded filename
> (encoded for creating and finding files, unencoded for linking to
> them). I imagine this would be confusing on an ongoing basis if it
> meant both in different places.

To me the easiest way to avoid confusion is to have everything as
character strings and only encode when a file name is needed for an
operation on the file system (stat, file tests such as -e, open,
readdir...). That way it is ok in any case.

As a side note, normally, a marker of working on file paths is the use
of File::Spec. But I am not sure if it is really done systematically
and correctly.

> I expect non-ASCII, non-UTF-8 filenames would be fairly rare, but if
> there is some use case where they don't work as intended in whatever
> we implement, there could be customization variables to control
> encoding and decoding of filenames to support these cases.

This should be easily done by using Texinfo::Common::encode_file_name
systematically when encoding strings as file names, and using
customization variables in this function.

-- 
Pat
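
The convention discussed in this message (keep every file name as a character string and encode only at the point of a file system operation) could be sketched roughly as follows. This is an illustration in Python, not Texinfo's Perl code: `FILE_NAME_ENCODING`, `encode_file_name`, `file_exists` and `write_file` are hypothetical names chosen for the example, and the helper is only a stand-in for the idea behind Texinfo::Common::encode_file_name, whose actual signature is not shown in this thread.

```python
import os

# Hypothetical customization setting controlling how file names are encoded.
FILE_NAME_ENCODING = "utf-8"


def encode_file_name(name: str, encoding: str = FILE_NAME_ENCODING) -> bytes:
    """Encode a character-string file name to bytes for a file system call."""
    return name.encode(encoding)


def file_exists(directory: str, name: str) -> bool:
    # Path manipulation is done on character strings...
    path = os.path.join(directory, name)
    # ...and the encoding happens only at the file system boundary.
    return os.path.exists(encode_file_name(path))


def write_file(directory: str, name: str, text: str) -> str:
    path = os.path.join(directory, name)
    with open(encode_file_name(path), "w", encoding="utf-8") as handle:
        handle.write(text)
    # The returned path is still a character string, so it can be used
    # unchanged in links and error messages.
    return path
```

Because the encoded bytes never leave the helper functions, a name such as "filename" always refers to the decoded string in the rest of the code, and a different file name encoding only requires changing the single setting.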
