On Mon, Feb 21, 2022 at 08:46:56PM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 10:32:00PM +0100, Patrice Dumas wrote:
> > On Sun, Feb 20, 2022 at 05:27:51PM +0000, Gavin Smith wrote:
> > > If the error message became something like
> > >
> > > "nœud « �sseul� » non référencé"
> > >
> > > then encoding this to UTF-8 would break the parts which already were in
> > > UTF-8.
> >
> > I just commited input decoding (command line, environment, translated
> > messages) and output messages encoding.  I left file names as is, but
> > prepared a customization variable for them.
> >
> > Now the error message is:
> >
> > testé.texi:8: warning: nœud « ésseulé » non référencé
>
> One way of fixing this would be to store the filename separately along with
> the rest of the error message, and prepend the filename when it is output.
> I can try to implement this.
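To make sure we are talking about the same thing, here is a rough sketch of
the approach quoted above. It is written in Python rather than Perl only to
keep it short, and it is not what texi2any does today. The names Diagnostic
and format_diagnostic are hypothetical. The assumption is that the file name
is kept as the bytes obtained from the file system, while the message text is
a decoded character string that gets encoded only at output time.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    """Hypothetical record keeping the file name apart from the message text."""
    file_name: bytes  # raw bytes as used with the file system, never decoded
    line: int
    text: str         # decoded (character) message text


def format_diagnostic(diag: Diagnostic, output_encoding: str) -> bytes:
    """Encode only the character parts; pass the file name bytes through as is."""
    location = f":{diag.line}: warning: ".encode(output_encoding)
    message = diag.text.encode(output_encoding, errors="replace")
    return diag.file_name + location + message + b"\n"


# Assumed scenario: file name stored in Latin-1, messages output as UTF-8.
diag = Diagnostic("testé.texi".encode("latin-1"), 8,
                  "nœud « ésseulé » non référencé")
line = format_diagnostic(diag, "utf-8")
```

With this layout the file name bytes are never encoded a second time, which
is the point of the proposal. The cost is that every place that builds a
message has to keep the two parts separate.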
This does not seem to be easy, but it is probably doable.  It removes the
need to encode before using the file-related functions that perl wants bytes
for, but it requires finding all the occurrences in the code where there
could be some concatenation with strings coming from other command line
data, from customization files and variables, or from the Texinfo document.

There are also probably other file name parts that would need to be encoded
as bytes, or it should be made sure that they are already bytes, for
example @image related file names.

I think that your commit e11835b62d8f3d43c608013d21683c72e9a54cc3
"@include file name encoding" would still need to be modified in order to
encode the file name to a specific encoding and not simply use utf8::encode,
as the file names encoding may not be utf8.  Using the locale encoding as
the default seems better to me, with a possibility to modify the value on
the command line, and FILE_NAMES_ENCODING_NAME could be used for that.

To be checked, but it seems to me that in the XS parser this information
should also be used where the include file name string (and maybe other
file names) should be converted to that encoding from utf-8, if that
encoding is different from utf-8.

Also, we need to do something specific in case the encoding used for file
names bytes is not the same as MESSAGE_OUTPUT_ENCODING_NAME: either convert
with Encode::from_to or maybe just warn.

-- 
Pat
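P.S. A minimal sketch of the file name encoding logic described above, again
in Python only for brevity and not as the actual implementation. All function
names are hypothetical. The `configured` argument stands for the value of
FILE_NAMES_ENCODING_NAME when it is set, and the re-encoding step plays the
role that Encode::from_to would have on the Perl side.

```python
import codecs
import locale
from typing import Optional


def file_name_encoding(configured: Optional[str] = None) -> str:
    """Use the configured encoding if set, else fall back to the locale's."""
    return configured or locale.getpreferredencoding(False)


def encode_file_name(name: str, encoding: str) -> bytes:
    """Turn a decoded file name into the bytes passed to file functions."""
    return name.encode(encoding)


def file_name_for_message(raw: bytes, name_encoding: str,
                          output_encoding: str) -> bytes:
    """Re-encode file name bytes for display if the two encodings differ."""
    same = (codecs.lookup(name_encoding).name
            == codecs.lookup(output_encoding).name)
    if same:
        return raw
    # Bytes that are invalid in name_encoding are replaced rather than fatal.
    text = raw.decode(name_encoding, errors="replace")
    return text.encode(output_encoding, errors="replace")


raw = encode_file_name("testé.texi", "latin-1")
shown = file_name_for_message(raw, "latin-1", "utf-8")
```

The alternative mentioned above, warning instead of converting, would replace
the last two lines of file_name_for_message with a warning and return the
bytes unchanged.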
