Re: Non-ASCII characters in @include search path

Patrice Dumas Sat, 26 Feb 2022 13:02:06 -0800

On Sat, Feb 26, 2022 at 08:06:52PM +0000, Gavin Smith wrote:
> 
> This actually seems impossible to completely fix with the current approach:
> since error messages are character strings (a recent change, but required for
> correct interpolation of non-filename strings), if there is some file operation
> error with a non-UTF-8 filename, it will be impossible to interpolate that
> filename into the error message.  (This is possibly not a major issue as this
> error is not output when other fixes are made - see later in this email.)
> 
> I "fixed" this by calling utf8::decode on the interpolated filename;
> of course, this will be wrong if the filename is not in UTF-8, but there
> is no alternative.


This seems wrong.  I think that to do this correctly, you need to keep
track of the encoding that was used to encode the file name.
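
To illustrate what I mean, here is a minimal sketch, not code from
Texinfo.  The helper names make_file_name_record and
file_name_for_message are hypothetical; the only assumption is that the
encoding name is known at the point where the file name bytes are
obtained.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Texinfo): keep the raw bytes of a
# file name together with the name of the encoding they are in.
sub make_file_name_record {
  my ($bytes, $encoding) = @_;
  return { 'bytes' => $bytes, 'encoding' => $encoding };
}

# Hypothetical helper: decode the bytes to perl characters using the
# recorded encoding, so the name can be interpolated in a message.
sub file_name_for_message {
  my ($record) = @_;
  return decode($record->{'encoding'}, $record->{'bytes'});
}

# The same name, "os\x{e9}.texi", encoded in two different ways.
my $latin1_name = make_file_name_record("os\xe9.texi", 'ISO-8859-1');
my $utf8_name = make_file_name_record("os\xc3\xa9.texi", 'UTF-8');

my $message_latin1 = "could not find ".file_name_for_message($latin1_name);
my $message_utf8 = "could not find ".file_name_for_message($utf8_name);

# Messages are character strings, encoded once on output.
binmode(STDOUT, ':encoding(UTF-8)');
print "$message_latin1\n$message_utf8\n";
```

Both messages end up identical, which a blind utf8::decode cannot
achieve for the ISO-8859-1 name.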

> Debugging code:
> 
> $image_file came from Texinfo::HTML::html_image_file_location_name.
> 
> Debugging code in HTML.pm:
> 
> diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
> index 374b41c4d8..4ce595f61c 100644
> --- a/tp/Texinfo/Convert/HTML.pm
> +++ b/tp/Texinfo/Convert/HTML.pm
> @@ -282,6 +282,9 @@ sub html_image_file_location_name($$$$)
>          # will be moved by the caller anyway.
>          # If the file path found was to be used it should be decoded to perl
>          # codepoints too.
> +        warn "IMAGE ".
> +        utf8::is_utf8($image_basefile)
> +        .":".utf8::is_utf8($extension)."\n";
>          $image_file = $image_basefile.$extension;
>          $image_extension = $extension;
>          last;
> 
> output:
> 
> IMAGE 1:
> 
> 
> $image_basefile has the UTF-8 flag on (and $extension doesn't). However,
> encoded_file_name was already called, so the output from it could be used
> instead:

I do not understand why $extension does not have the UTF-8 flag on.  I
guess it is because extensions are ASCII strings in general.  But I
guess that this could also be wrong if, for instance, the extension is
set on the command line and decoded.
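
A small standalone sketch of the flag behaviour, with made-up values
and not taken from HTML.pm: a plain ASCII literal has the flag off, a
string decoded from non-ASCII UTF-8 bytes has it on, and so does the
concatenation of the two.  So the flag alone does not tell whether a
string was decoded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Plain ASCII literal: the UTF-8 flag is off.
my $extension = '.png';

# Decoded from UTF-8 bytes with a non-ASCII character: the flag is on.
my $image_basefile = decode('UTF-8', "im\xc3\xa0ge");

my $flag_extension = utf8::is_utf8($extension) ? 1 : 0;
my $flag_basefile = utf8::is_utf8($image_basefile) ? 1 : 0;

# Concatenating with a flagged string gives a flagged string.
my $image_file = $image_basefile.$extension;
my $flag_joined = utf8::is_utf8($image_file) ? 1 : 0;

print "extension:$flag_extension basefile:$flag_basefile "
      ."joined:$flag_joined\n";
```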

> diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
> index 374b41c4d8..2ef6df8c54 100644
> --- a/tp/Texinfo/Convert/HTML.pm
> +++ b/tp/Texinfo/Convert/HTML.pm
> @@ -282,7 +282,7 @@ sub html_image_file_location_name($$$$)
>          # will be moved by the caller anyway.
>          # If the file path found was to be used it should be decoded to perl
>          # codepoints too.
> -        $image_file = $image_basefile.$extension;
> +        $image_file = $file_name;
>          $image_extension = $extension;
>          last;
>        }
> 
> This eliminates the error message about 'could not copy':
> 
> $ cat formatting/out_parser/non_ascii_test_epub/osé.2
> osé.texi:15: warning: undefined flag: vùr
> osé.texi:23: @include: could not find not_existïng.téxi
> osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using `dîrectory/imàge.êxt'
> texi2any: @image file `dîrectory/imàge' can not be copied
> osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi
> 
> 
> Does that fix the issue with this test?

Yes, but this is most probably wrong, as $image_file often ends up in
output files.  Unless I am missing something, that is the case in
formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/osé.xhtml,
where the image file name would then not be correctly encoded.
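
Here is a sketch of the problem I expect, with made-up values and not
the actual converter code.  It assumes the output text is a character
string that is encoded to UTF-8 once when written.  If an already
encoded file name is put in that text, it gets encoded a second time.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoded bytes of the name, as used for file system operations.
my $encoded_name = "im\xc3\xa0ge.png";
# The same name decoded to perl characters.
my $decoded_name = decode('UTF-8', $encoded_name);

# Encoded name in the output text: each byte is treated as a
# character and encoded again.
my $wrong = encode('UTF-8', '<img src="'.$encoded_name.'">');

# Decoded name in the output text: encoded once, as intended.
my $right = encode('UTF-8', '<img src="'.$decoded_name.'">');

printf "wrong: %d bytes, right: %d bytes\n",
       length($wrong), length($right);
```

The encoded name is what the file system operations need, while the
decoded one is what should go in the output text.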

> Here are the rest of the files in the output directory:
> 
> $ find formatting/out_parser/non_ascii_test_epub/
> formatting/out_parser/non_ascii_test_epub/
> formatting/out_parser/non_ascii_test_epub/osé.1
> formatting/out_parser/non_ascii_test_epub/osé.2
> formatting/out_parser/non_ascii_test_epub/osé_epub_package
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/mimetype
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/osé.opf
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/nav_toc.xhtml
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/xhtml/osé.xhtml
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/EPUB/images/an_ïmage.png
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/META-INF
> formatting/out_parser/non_ascii_test_epub/osé_epub_package/META-INF/container.xml
> 
> Is this correct?

Yes, it is correct.

-- 
Pat
