On Mon, Apr 13, 2026 at 07:21:50AM +0100, Gavin Smith wrote:
> On Sun, Apr 12, 2026 at 11:19:44PM +0200, Patrice Dumas wrote:
> > Maybe, but we cannot say that about all the languages in the IANA
> > registry, and we should rely on the best information available, which
> > is this information.
> > For Occitan, which I know a bit about, the IANA is quite right, with the
> > main variants as would be expected today (lengadoc in the place I live).
> > There are writing systems and lexicons for computing-related
> > vocabulary.  The number of people speaking Occitan in general is
> > decreasing; there are probably not that many people interested in
> > Occitan for computer-related documentation, and even fewer able to
> > translate manuals and strings.  Still, deciding that Occitan variants
> > cannot be used as documentlanguage while we have the information to do
> > so seems wrong to me.
> 
> Could we make @documentlanguage take the argument LL[_CC][@VARIANT] where
> VARIANT is the script, and add another command to specify language variant?
> 
> For example:
> 
> @documentlanguage oc
> @documentlanguagevariant lengadoc
> 
> or
> 
> @documentlanguage sr@latin
> @documentlanguagevariant ekavsk
> 
> That way the translations using .po files can be used as usual and the
> "variant" can be used to mark the language in output formats where this
> is supported.

This seems to me to be much less logical than the other way round.
Indeed, @documentlanguagevariant really belongs to the @documentlanguage
class of information, while the script information is more or less
unrelated, except that both are used together to find the right strings
to translate.

More generally, I think that we should consider that there are three
different language representations related to Texinfo.

3) The third one is about how we get strings.
   - in texi2any, this is from gettext, therefore we need to use the
     XPG locales notation because it is what gettext understands currently.
   - for TeX we can do whatever we want

2) The second one is about how we represent the languages internally.
I recently made a commit that changes texi2any to:
 - store all the components of the locale language separately (main
    language, region, and in the future, script and variants)
 - use the BCP 47 language tags internally as hash keys or similar
I made this choice because the BCP 47 language tags are uniquely and
exhaustively specified by the BCP 47 specification + the IANA
language-subtag-registry, in contrast with XPG locales.  And secondarily
because BCP 47 is used in all the output formats.
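
To illustrate the idea (this is not the texi2any code, only a minimal
Python sketch), the components could be stored separately and joined
into a BCP 47 tag that serves as the key.  The class name DocLanguage
and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DocLanguage:
    """Hypothetical container keeping each language component separate."""
    language: str
    region: Optional[str] = None
    script: Optional[str] = None
    variants: Tuple[str, ...] = ()

    def bcp47(self) -> str:
        # BCP 47 subtag order: language, script, region, variants.
        parts = [self.language.lower()]
        if self.script:
            parts.append(self.script.title())   # e.g. "latn" -> "Latn"
        if self.region:
            parts.append(self.region.upper())   # e.g. "rs" -> "RS"
        parts.extend(v.lower() for v in self.variants)
        return "-".join(parts)


# Use the BCP 47 tag as a dictionary (hash) key.
strings_by_language = {}
occitan = DocLanguage("oc", variants=("lengadoc",))
strings_by_language[occitan.bcp47()] = {}
```

The recommended casing (title case for the script, upper case for the
region) is applied here only so that equal tags always give equal keys.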

1) The first representation is the representation in the Texinfo language.
To me this is the one that really matters, because this is the one we
want to get right and avoid having to change in the future, since we
strongly dislike changes to the Texinfo language.  For this
representation, my preference would be to follow the conceptual model of
BCP 47 because it is well defined with language, region, script and
language variants in contrast with the XPG locales for which the script
and language variants are there but are not well specified.  Also, BCP
47 + the IANA registry is the best source of existing languages, and
although there are languages that are not relevant, no language is
ignored, something that is much more important to me.  Also, the BCP 47
information can map to XPG locales (for texi2any), while the reverse is
less clear.
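
As a rough sketch of the BCP 47 to XPG direction, in Python: the
function name and the script-to-modifier table below are assumptions
made for illustration, covering only two scripts, and are not taken from
texi2any or gettext.  Variants are dropped because XPG locales have no
well-specified place for them, which is also why the reverse mapping is
lossy.

```python
# Illustrative table only: XPG modifiers are not formally specified,
# these two follow the naming seen in locales such as sr@latin.
SCRIPT_TO_XPG_MODIFIER = {"Latn": "latin", "Cyrl": "cyrillic"}


def xpg_locale(language, region=None, script=None):
    """Build an XPG-style locale name LL[_CC][@MODIFIER]."""
    locale = language.lower()
    if region:
        locale += "_" + region.upper()
    modifier = SCRIPT_TO_XPG_MODIFIER.get(script) if script else None
    if modifier:
        locale += "@" + modifier
    return locale


print(xpg_locale("sr", script="Latn"))
print(xpg_locale("oc"))
```

A script missing from the table simply gives a locale name without a
modifier, so the lookup falls back to the plain language.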

Regarding how this information is provided, my preference is still

@documentlanguage lang_REGION_variant1_variant2...
@documentscript script

An empty @documentscript is valid.  For @documentlanguage, a lang needs
to be specified; the rest is optional.

Here script is an ISO 15924 four-letter code, and each variant* is found in
https://www.iana.org/assignments/language-subtag-registry
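
A minimal Python sketch of how such arguments could be split into
components follows.  The function names are hypothetical, and the rule
used to tell a region from a variant (two letters or three digits for a
region, the BCP 47 variant shapes otherwise) is an assumption.  Nothing
is checked against the IANA registry here.

```python
import re


def parse_documentlanguage(arg):
    """Split 'lang_REGION_variant1_variant2...' into its components."""
    parts = arg.strip().split("_")
    lang = parts[0]
    # Only 2 or 3 letter primary language subtags are accepted here.
    if not re.fullmatch(r"[A-Za-z]{2,3}", lang):
        raise ValueError("missing or invalid language: %r" % arg)
    rest = parts[1:]
    region = None
    # Assumed rule: a region is two letters or three digits.
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        region = rest.pop(0).upper()
    for variant in rest:
        # BCP 47 variant shape: 5-8 alphanumerics, or a digit + 3 more.
        if not re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", variant):
            raise ValueError("invalid variant: %r" % variant)
    return lang.lower(), region, tuple(v.lower() for v in rest)


def parse_documentscript(arg):
    """Return the four-letter script code, or None for an empty argument."""
    arg = arg.strip()
    if not arg:
        return None
    if not re.fullmatch(r"[A-Za-z]{4}", arg):
        raise ValueError("invalid script: %r" % arg)
    return arg.title()


print(parse_documentlanguage("oc_lengadoc"))
print(parse_documentscript("latn"))
```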

The presentation could be something else, but I can't imagine being
convinced not to use the BCP 47 model + the IANA registry.

-- 
Pat
