On Sat, Feb 26, 2022 at 9:11 PM Patrice Dumas <[email protected]> wrote:
> The whole output file is encoded, the problem is that you encoded
> $image_file, it should not be, it is assumed to be decoded from the
> document. image_path could be encoded, but then the encoding should be
> passed such that it can be re-decoded, for error messages, for instance.

It would probably be easier to do it the way you said: decode all the file names and encode them just before use. Anything else is too confusing, even if keeping some names encoded would give a little more flexibility for non-UTF-8 input files and locales (assuming we actually did it properly and never broke it by mistake).

I looked at HTML.pm and found it hard to tell, wherever a variable or function had the word "filename" in its name, what exactly it referred to: the encoded file name (used for creating and finding files) or the unencoded one (used for linking to them). I imagine this would be an ongoing source of confusion if the word meant both in different places.

I expect non-ASCII, non-UTF-8 file names to be fairly rare, but if some use case does not work as intended in whatever we implement, there could be customization variables to control the encoding and decoding of file names to support it.
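
To make the convention concrete, here is a rough sketch of "decoded everywhere, encoded only at the point of use". It is written in Python purely for illustration and is not Texinfo's Perl code; the names FILE_NAME_ENCODING, decode_file_name, encode_file_name and write_output are all hypothetical, and FILE_NAME_ENCODING only stands in for the kind of customization variable mentioned above.

```python
import os

# Hypothetical customization value (an assumption, not an actual Texinfo variable).
FILE_NAME_ENCODING = "utf-8"


def decode_file_name(raw: bytes, encoding: str = FILE_NAME_ENCODING) -> str:
    """Decode a file name once, at the point where it enters the program."""
    return raw.decode(encoding)


def encode_file_name(name: str, encoding: str = FILE_NAME_ENCODING) -> bytes:
    """Encode a decoded file name immediately before a filesystem call."""
    return name.encode(encoding)


def write_output(directory: bytes, name: str, text: str) -> str:
    """Create a file from a decoded name.

    The encoded form exists only inside this function. The decoded name is
    returned so that links and error messages never see encoded bytes.
    """
    path = os.path.join(directory, encode_file_name(name))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return name
```

With this split, a name like "filename" always means the decoded string, and the only place where bytes appear is the call that touches the filesystem.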
