On Fri, Jul 22, 2022 at 03:07:59PM +0100, Gavin Smith wrote:
> I think it's better if variables don't have to be set in combination.
> I feel that we could design a better interface.  Here's my attempt...
>
> Options to allow:
> * Use document encoding
> * Use locale encoding
> * Specify encoding explicitly
...
> I'm going to make a start by stripping out the LOCALE_ prefix and then
> have a look to see if something else is needed to give these variables
> priority (from the user's perspective).
>

One problem with the current implementation, as I see it, is that
DOC_ENCODING_FOR_INPUT_FILE_NAME controls the effect of
INPUT_FILE_NAME_ENCODING (formerly LOCALE_INPUT_FILE_NAME_ENCODING).
If DOC_ENCODING_FOR_INPUT_FILE_NAME is set to 1 then
INPUT_FILE_NAME_ENCODING has no effect, even if it has been set
explicitly by the user on the command line.  This is also confusing in
texi2any.pl, where (LOCALE_)INPUT_FILE_NAME_ENCODING is defined but is
ineffectual in the default case.

As I understand it, the configuration is "finished" by the time the
parser starts, so it is not possible to set INPUT_FILE_NAME_ENCODING or
other config variables from the document.  Whatever the configuration
is, it needs to be in place before the document starts being parsed.

My current idea is to save the locale encoding, perhaps in a hidden or
undocumented customization variable.  In
Texinfo::Convert::Converter::encoded_input_file_name and similar
functions, the value of INPUT_FILE_NAME_ENCODING should always take
priority over both the locale encoding and the document encoding.  If
INPUT_FILE_NAME_ENCODING is not given, then either the locale or the
document encoding should be used, according to the value of
DOC_ENCODING_FOR_INPUT_FILE_NAME.  I think this would be quite clear.

I had thought about automatically setting
DOC_ENCODING_FOR_INPUT_FILE_NAME if INPUT_FILE_NAME_ENCODING was given,
but this would be unlike anything else in the program's configuration
and would lead to inconsistencies depending on where the configuration
came from (command line, defaults, init files...), so it is probably
best avoided.  The same goes for altering the priority of
DOC_ENCODING_FOR_INPUT_FILE_NAME and INPUT_FILE_NAME_ENCODING depending
on how or where they were set.
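
To make the proposed priority order concrete, here is a rough sketch.
It is written in Python rather than the Perl of texi2any, purely for
illustration, and the function and parameter names are made up: they
are not part of the code.  It also assumes the saved locale encoding
and the document encoding are simply passed in, and it leaves open what
should happen when the selected encoding is not known.

```python
from typing import Optional


def resolve_input_file_name_encoding(
    explicit_encoding: Optional[str],
    use_document_encoding: bool,
    document_encoding: Optional[str],
    locale_encoding: Optional[str],
) -> Optional[str]:
    """Pick the encoding for input file names (hypothetical helper).

    explicit_encoding     stands for INPUT_FILE_NAME_ENCODING
    use_document_encoding stands for DOC_ENCODING_FOR_INPUT_FILE_NAME
    locale_encoding       stands for the saved locale encoding
    """
    # An explicitly given encoding always wins, whatever the flag says.
    if explicit_encoding is not None:
        return explicit_encoding
    # Otherwise the flag chooses between document and locale encoding.
    if use_document_encoding:
        return document_encoding
    return locale_encoding
```

With this ordering, an INPUT_FILE_NAME_ENCODING given on the command
line takes effect whether or not DOC_ENCODING_FOR_INPUT_FILE_NAME is
set, and neither variable has to be changed as a side effect of the
other.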
