On Mon, Apr 13, 2026 at 09:49:16PM +0100, Gavin Smith wrote:
> BCP 47 and other documentation is vast so it will take me some time to
> get a grip on the situation.  Here are some notes.

One tutorial on BCP 47 is:
https://www.w3.org/International/articles/language-tags/

None of the extlang material is relevant for us, as it refers to
obsolete specifications.

> It's possible that BCP 47 is designed for other uses, such as recording
> the language of entries in a library collection (so "bibliographic use").

I do not know about other usages, but, as I already said, BCP 47 is
used for language specification in many cases: XML, HTML, LaTeX babel,
Org mode and LibreOffice.

> Hence I'd like to consider if there could be a simpler approach.  Using BCP 47
> language tags in the Texinfo language would make the BCP 47 documentation
> part of the Texinfo language by reference.

Not necessarily.  We can use the ideas behind BCP 47 without even
referring to BCP 47.  That is, a main language tag, an optional region
tag, an optional script tag and optional variant tags.

> Maybe if we had a way of providing such "subtag" information,
> we wouldn't need to stick to the exact order of BCP 47.  Script could be
> provided in a separate command.

I agree that it would be better.  I do not like the way the BCP 47
format string is set up.

>  Here's one idea:
> 
> @documentlanguage sr
> @documentlanguagevariant ekavsk
> @documentscript latin

Looks good to me, except that there can be several variants, so maybe:
@documentlanguagevariants lengadoc
@documentlanguagevariants aranes_grclass
@documentlanguagevariants

Having a separate @documentlanguagevariants requires allowing an empty
@documentlanguagevariants to reset to no variants.
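
To make the reset behaviour concrete, here is a minimal Python sketch.
It rests on assumptions that are not settled in this thread: each
@documentlanguagevariants line replaces the current list, several
variants on one line are joined with "_", and an empty command resets
to no variants.  The function name current_variants is hypothetical.

```python
COMMAND = "@documentlanguagevariants"


def current_variants(lines):
    """Return the variants in effect after processing all lines.

    Assumption: each command replaces the previous list; the argument
    is a "_"-separated list; an empty argument means "no variants".
    """
    variants = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0] != COMMAND:
            continue  # not the command we care about
        if len(parts) == 1:
            variants = []  # empty command: reset to no variants
        else:
            variants = [v for v in parts[1].split("_") if v]
    return variants


example = [
    "@documentlanguagevariants lengadoc",
    "@documentlanguagevariants aranes_grclass",
    "@documentlanguagevariants",
]
print(current_variants(example[:1]))
print(current_variants(example[:2]))
print(current_variants(example))
```

If the commands were meant to accumulate instead of replace, only the
else branch would change, and the empty form would still be needed.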
 
> I don't like the "latn" and "cyrl" abbreviations used in BCP 47 for "latin"
> and "cyrillic" (I'm aware these abbreviated names come from incorporation
> of another ISO standard) and think we should just stick to "latin" and
> "cyrillic" as already used in .po files.

Maybe we could accept both the 4-letter codes of ISO 15924 and aliases,
with the constraint that aliases should be at least 5 letters long,
allowing the ones used in po files for the most common cases (as seen
in bcp47.c):
  { "latin",      "Latn" },
  { "cyrillic",   "Cyrl" },
  { "hebrew",     "Hebr" },
  { "arabic",     "Arab" },
  { "devanagari", "Deva" },
  { "gurmukhi",   "Guru" },
  { "mongolian",  "Mong" }
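
The lookup rule could be sketched as follows, in Python rather than C
for brevity.  This is not the code in bcp47.c; the name
normalize_script is hypothetical, and the sketch assumes that any
4-letter ASCII name is passed through as a code without checking it
against the ISO 15924 list.

```python
import re

# Aliases taken from the table above; all are at least 5 letters long,
# so they can never collide with a 4-letter ISO 15924 code.
SCRIPT_ALIASES = {
    "latin": "Latn",
    "cyrillic": "Cyrl",
    "hebrew": "Hebr",
    "arabic": "Arab",
    "devanagari": "Deva",
    "gurmukhi": "Guru",
    "mongolian": "Mong",
}


def normalize_script(name):
    """Map a script name to a 4-letter code, or None if unknown."""
    key = name.strip().lower()
    if re.fullmatch(r"[a-z]{4}", key):
        # Assumed to be an ISO 15924 code; use the usual title case.
        return key.capitalize()
    if len(key) >= 5:
        return SCRIPT_ALIASES.get(key)
    return None


for name in ("latin", "cyrl", "Deva", "klingon"):
    print(name, "->", normalize_script(name))
```

The length constraint is what keeps the two namespaces apart: the
length of the name alone tells whether it is a code or an alias.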


Looking at the list of ISO 15924 codes and names, many names are quite
long and some need disambiguation:
https://en.wikipedia.org/wiki/ISO_15924

The 4-letter codes are a practical way to have short but readable and
unique names, therefore I think that we should accept the 4-letter
codes in any case.

-- 
Pat
