On Mon, Apr 13, 2026 at 09:49:16PM +0100, Gavin Smith wrote:
> BCP 47 and other documentation is vast so it will take me some time to
> get a grip on the situation.  Here are some notes.

A tutorial of sorts on BCP 47 is:
https://www.w3.org/International/articles/language-tags/

Everything about extlang is not relevant for us, as it refers to
obsolete specifications.

> It's possible that BCP 47 is designed for other uses, such as recording
> the language of entries in a library collection (so "bibliographic use").

I do not know about other uses, but, as I already said, BCP 47 is widely
used for language specification: in XML, HTML, LaTeX babel, Org mode and
LibreOffice.

> Hence I'd like to consider if there could be a simpler approach.  Using BCP 47
> language tags in the Texinfo language would make the BCP 47 documentation
> part of the Texinfo language by reference.

Not necessarily.  We can use the ideas behind BCP 47 without even
referring to BCP 47.  That is, a main language tag, an optional region
tag, an optional script tag and optional variant tags.

> Maybe if we had a way of providing such "subtag" information,
> we wouldn't need to stick to the exact order of BCP 47.  Script could be
> provided in a separate command.

I agree that it would be better.  I do not like the way the BCP 47
format string is set up.

> Here's one idea:
>
> @documentlanguage sr
> @documentlanguagevariant ekavsk
> @documentscript latin

Looks good to me, except that there can be several variants, so maybe

@documentlanguagevariants lengadoc
@documentlanguagevariants aranes_grclass
@documentlanguagevariants

Having a separate @documentlanguagevariants requires allowing an empty
@documentlanguagevariants to reset to no variants.

> I don't like the "latn" and "cyrl" abbreviations used in BCP 47 for "latin"
> and "cyrillic" (I'm aware these abbreviated names come from incorporation
> of another ISO standard) and think we should just stick to "latin" and
> "cyrillic" as already used in .po files.

Maybe we could accept both the 4-letter codes of ISO 15924 and aliases,
with the constraint that aliases should be at least 5 letters long,
allowing the ones used in po files for the most common cases, as seen in
bcp47.c:

  { "latin", "Latn" },
  { "cyrillic", "Cyrl" },
  { "hebrew", "Hebr" },
  { "arabic", "Arab" },
  { "devanagari", "Deva" },
  { "gurmukhi", "Guru" },
  { "mongolian", "Mong" }

Looking at the list of ISO 15924 codes and names, many names are quite
long and some would need disambiguation:
https://en.wikipedia.org/wiki/ISO_15924

The 4-letter codes are a practical way to have short but readable and
unique names, so I think that we should accept the 4-letter codes in
any case.

-- 
Pat
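
For illustration only, here is a minimal Python sketch of the lookup rule
suggested above: a 4-letter name is taken as an ISO 15924 code, and a name
of 5 letters or more is looked up as an alias.  The function name
resolve_script and the dictionary layout are hypothetical; only the seven
alias pairs come from the bcp47.c excerpt, and the sketch does not check
4-letter codes against the actual ISO 15924 list.

```python
# Hypothetical sketch of the proposed script-name rule; not existing
# Texinfo code.  The alias pairs are the ones quoted from bcp47.c.
SCRIPT_ALIASES = {
    "latin": "Latn",
    "cyrillic": "Cyrl",
    "hebrew": "Hebr",
    "arabic": "Arab",
    "devanagari": "Deva",
    "gurmukhi": "Guru",
    "mongolian": "Mong",
}


def resolve_script(name):
    """Return a 4-letter script code for name, or None if not recognised."""
    key = name.strip()
    if len(key) == 4 and key.isascii() and key.isalpha():
        # Assumption: any 4-letter alphabetic name is treated as an
        # ISO 15924 code and normalised to title case, without checking
        # it against the registry.
        return key.capitalize()
    if len(key) >= 5:
        # Aliases are at least 5 letters long, so they can never be
        # confused with a 4-letter code.
        return SCRIPT_ALIASES.get(key.lower())
    return None
```

With this rule, "latin", "Latn" and "latn" would all resolve to the same
script, while a name shorter than 4 letters or an unknown alias is
rejected.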
