On Sat, Feb 26, 2022 at 08:06:52PM +0000, Gavin Smith wrote:
>
> This actually seems impossible to completely fix with the current approach:
> since error messages are character strings (a recent change, but required for
> correct interpolation of non-filename strings), if there is some file
> operation error with a non-UTF-8 filename, it will be impossible to
> interpolate that filename into the error message.  (This is possibly not a
> major issue as this error is not output when other fixes are made - see
> later in this email.)
>
> I "fixed" this by calling utf8::decode on the interpolated filename;
> of course, this will be wrong if the filename is not in UTF-8, but there
> is no alternative.

This seems wrong. I think that to do this correctly, you need to keep
track of the encoding that was used to encode the file name.

> Debugging code:
>
> $image_file came from Texinfo::HTML::html_image_file_location_name.
>
> Debugging code in HTML.pm:
>
> diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
> index 374b41c4d8..4ce595f61c 100644
> --- a/tp/Texinfo/Convert/HTML.pm
> +++ b/tp/Texinfo/Convert/HTML.pm
> @@ -282,6 +282,9 @@ sub html_image_file_location_name($$$$)
>            # will be moved by the caller anyway.
>            # If the file path found was to be used it should be decoded to perl
>            # codepoints too.
> +          warn "IMAGE ".
> +              utf8::is_utf8($image_basefile)
> +              .":".utf8::is_utf8($extension)."\n";
>            $image_file = $image_basefile.$extension;
>            $image_extension = $extension;
>            last;
>
> output:
>
> IMAGE 1:
>
>
> $image_basefile has the UTF-8 flag on (and $extension doesn't).  However,
> encoded_file_name was already called, so the output from it could be used
> instead:

I do not understand why $extension does not have the UTF-8 flag on. I
guess it is because extensions are generally ASCII strings. But I also
guess that this may not always hold, for instance if the extension is
set on the command line and decoded.

> diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
> index 374b41c4d8..2ef6df8c54 100644
> --- a/tp/Texinfo/Convert/HTML.pm
> +++ b/tp/Texinfo/Convert/HTML.pm
> @@ -282,7 +282,7 @@ sub html_image_file_location_name($$$$)
>            # will be moved by the caller anyway.
>            # If the file path found was to be used it should be decoded to perl
>            # codepoints too.
> -          $image_file = $image_basefile.$extension;
> +          $image_file = $file_name;
>            $image_extension = $extension;
>            last;
>          }
>
> This eliminates the error message about 'could not copy':
>
> $ cat formatting/out_parser/non_ascii_test_epub/osé.2
> osé.texi:15: warning: undefined flag: vùr
> osé.texi:23: @include: could not find not_existïng.téxi
> osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found,
> using `dîrectory/imàge.êxt'
> texi2any: @image file `dîrectory/imàge' can not be copied
> osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi
>
>
> Does that fix the issue with this test?

Yes, but this is most probably wrong, as $image_file often ends up in
output files. Unless I am missing something, this should be the case in
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/osé.xhtml,
where I would expect the image file name not to be correctly encoded.

> Here are the rest of the files in the output directory:
>
> $ find formatting/out_parser/non_ascii_test_epub/
> formatting/out_parser/non_ascii_test_epub/
> formatting/out_parser/non_ascii_test_epub/osé.1
> formatting/out_parser/non_ascii_test_epub/osé.2
> formatting/out_parser/non_ascii_test_epub/osé_epub_package
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/mimetype
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/osé.opf
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/nav_toc.xhtml
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/osé.xhtml
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_ïmage.png
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/META-INF
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/META-INF/container.xml
>
> Is this correct?

Yes, it is correct.

-- 
Pat
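
On the point about keeping the encoding together with the file name, here
is a minimal sketch of what I have in mind. It is only an illustration:
the two helper functions and the hash layout are hypothetical and are not
existing Texinfo code. Only Encode's encode/decode and utf8::decode are
real Perl functions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not Texinfo code): encode a character-string file
# name to bytes and remember which encoding was used.
sub encode_file_name_with_encoding {
    my ($name, $encoding) = @_;
    return { bytes => encode($encoding, $name), encoding => $encoding };
}

# Hypothetical helper: get back a character string that can be
# interpolated into a message, using the recorded encoding instead of
# assuming UTF-8.
sub file_name_for_message {
    my ($file) = @_;
    my $bytes = $file->{bytes};    # work on a copy of the bytes
    return decode($file->{encoding}, $bytes);
}

# "dîrectory/imàge.êxt" written with escapes, so that the example does not
# depend on the encoding of this message.
my $name = "d\x{EE}rectory/im\x{E0}ge.\x{EA}xt";

my $latin1 = encode_file_name_with_encoding($name, 'ISO-8859-1');
my $utf8   = encode_file_name_with_encoding($name, 'UTF-8');

# The byte strings differ, but each one decodes back to the same
# character string because the encoding is known.
my $msg1 = file_name_for_message($latin1);
my $msg2 = file_name_for_message($utf8);

# Guessing UTF-8 for the Latin-1 bytes does not work: utf8::decode
# returns false and leaves the string as bytes.
my $guess = $latin1->{bytes};
my $ok    = utf8::decode($guess);
```

With the encoding recorded next to the bytes, the bytes can be used for
file operations and the decoded string for messages and output files,
without having to guess.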
