Dear Gavin et al.,

I don't think that a solution that would be fit only for French, German
and Swedish is a good idea.

Coming back to the patch I am proposing: you argue that it is not good
because it relies on the installed locales. I think this is not a
problem of the patch itself but of the awk-based texindex, which is
limited to the installed locales. I expect that anyone who wants to
compile a document in their own language has the corresponding locale
installed. So the point of this patch, as far as Emacs or any other
software build is concerned, is rather that a missing locale should not
break the build through the documentation compilation. This is
something that the autogen configuration should check: look at the
installed locales and restrict the documentation languages to those.
Or maybe the configuration step could by default restrict to manuals
not needing any locale, and have an option to allow other manuals.
Alternatively, the configuration script could look at the texindex
version and, if it is a version known to depend on installed locales,
restrict doc compilation to manuals not needing installed locales.
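
To make the kind of check I have in mind concrete, here is a minimal
sketch. It is written in Python only for illustration (a real check
would live in the configure/autogen machinery), and the function names
and the manual-to-locale mapping are hypothetical, not taken from the
Emacs build.

```python
import locale

def is_locale_available(name):
    """Return True if setlocale() accepts `name` for collation here."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always restore whatever was active before the probe.
        locale.setlocale(locale.LC_COLLATE, saved)

def manuals_to_build(manuals, available=is_locale_available):
    """Keep manuals that need no locale or whose locale is installed.

    `manuals` maps a manual name to the locale it needs, or to None.
    """
    return [name for name, loc in manuals.items()
            if loc is None or available(loc)]
```

With a mapping such as {"ses": None, "ses-fr": "fr_FR.UTF-8"}, the
French manual would simply be skipped on a machine without that locale,
instead of breaking the build.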

So, in a nutshell, the root cause is that texindex depends on the
installed locales for sorting, and I would like to react to your
(Gavin's) comment about rewriting texindex. Basically, I disagree that
it would be overly complex (I am not volunteering to go into this,
however :-D ), because if texindex were rewritten, one would not write
it from scratch but rather build on an existing indexing program, so
the amount of code to produce would not be that large or complex. Biber
would be a good candidate because it is a Perl program and texi2any is
also Perl. I don't know how biber is coded, but I suspect it has
something like a processing module and something like a CLI module, and
one could just import the processing module into a program like
texi2any, or into a new texindex Perl top-level script, to do the job.
Using biber would also have the great advantage that both texi2any and
texindex would use the same indexing engine under the hood. Or maybe
the biber CLI is general enough that the new texindex could be built on
top of it (or texi2dvi could call biber directly with suitable
parameters).

With biber, texinfo would just rely on the default sorting rules made
for each language, and texinfo maintainers would not need to develop
anything every time support for a new language is added, just inherit
from the work done for LaTeX BibLaTeX/biber.

Furthermore, biber doc (biber v2.21, §1 change 1.9) says:

      Biber no longer checks the environment for locales to use for
      sorting.

which basically solves the current issue: biber has its own set of
language-specific default sorting rules, with no dependence on installed
locales.

So a texinfo/texindex built on top of Biber would

- adapt the texinfo.tex index output format to the biber input
  format. That is to say, instead of writing this line to ses-fr.cp

  @entry{ses-read-symbol}{4}{@code {ses-read-symbol}}

  it would output something in a .bib-like format as follows:

  @entry{key-18,
    key={ses-read-symbol},
    page={4},
    value={@code {ses-read-symbol}}
  }

  texinfo.tex would also need to output some .bcf control file to list
  all the datasources, encodings and languages, and to state that all
  .bib entries are cited, etc.

- adapt biber processing modules in order to
  - support the texinfo accent macro expansion. Basically this amounts
    to telling biber that the catcode 0 (escape) character is not \ but
    @, as the macros are basically the same in LaTeX and Texinfo; for
    instance é is {\'e} in LaTeX and @'{e} in Texinfo. OK, there are
    quite a few differences, but those could be handled in the .bib
    production process; for instance Texinfo ç is @,{c}, and texinfo.tex
    could translate it to LaTeX with catcode 64 = 0, i.e. {@c c}.

  - support the output format that texinfo uses as entry for indexes,
    i.e. instead of the \bibitem \newblock stuff of .bbl files, use
    some @initial{...} @entry{...} output.

- write a main texindex Perl top-level script that would
  - use the biber processing module,
  - pass a configuration suitable for Texinfo to the biber processing
    module,
  - parse CLI arguments to get the file name stem from texi2dvi.
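
To illustrate the first step of the list above (index entry to
.bib-like record), here is a minimal sketch in Python. It is only an
assumption of how the conversion could look: the helper names are
hypothetical, the real work would be done by texinfo.tex or Perl code,
and escaped braces (@{ and @}) are not handled.

```python
def split_brace_groups(text):
    """Split '{a}{b {c}}' into ['a', 'b {c}'], honouring nested braces."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i + 1  # first character inside a top-level group
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
    return groups

def entry_to_bib(line, number):
    """Turn one '@entry{key}{page}{value}' line into a .bib-like record."""
    prefix = "@entry"
    if not line.startswith(prefix):
        raise ValueError("not an @entry line: %r" % line)
    key, page, value = split_brace_groups(line[len(prefix):])
    return (
        "@entry{key-%d,\n"
        "  key={%s},\n"
        "  page={%s},\n"
        "  value={%s}\n"
        "}" % (number, key, page, value)
    )
```

Calling entry_to_bib on the ses-read-symbol line quoted above, with
number 18, gives a record with the same four fields as the example.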

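The accent translation mentioned in the list above could be as small as
a text substitution during .bib production. Here is a hedged sketch in
Python covering only the cedilla case given as an example; the function
name is hypothetical, and other Texinfo/LaTeX differences would each
need their own rule.

```python
import re

# Texinfo writes a cedilla as @,{c}; with @ as the escape character the
# LaTeX-style form is {@c c}.
_CEDILLA = re.compile(r"@,\{([A-Za-z])\}")

def texinfo_to_bib_accents(text):
    """Rewrite Texinfo cedilla accents, leaving everything else as is."""
    return _CEDILLA.sub(r"{@c \1}", text)
```

Accents whose syntax already matches, such as @'{e}, pass through
unchanged.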

    Vincent.
________________________________
From: Gavin Smith <[email protected]>
Sent: Tuesday, 21 April 2026 08:31
To: Eli Zaretskii <[email protected]>
Cc: [email protected] <[email protected]>; [email protected]
<[email protected]>; [email protected] <[email protected]>; [email protected]
<[email protected]>
Subject: Re: texi2dvi not passing locale to texindex

> > What would be much harder would be to make a letter sort as its
> > own independent letter between A and Z, with its own heading in the
> > index: for example, Ñ between N and O.  We could make Ñ sort between
> > N and O by outputting its sort string as NZZZ, but texindex would take
> > an entry with a sort key beginning with NZZZ as part of the "N"
> > section.  (I'm not sure what languages this would be an issue for.)
> > Also multi-level collation (as in the Unicode Collation Algorithm)
> > is right out.
>
> Also this part.  Don't people expect the non-ASCII Latin characters to
> sort in the order of their Unicode codepoints, which would put them
> _after_ all the ASCII characters?

It depends on the language AFAIK.  Often accented characters are sorted
as variants of the unaccented character.  They are treated like upper-case
letters are.  So first the strings are compared ignoring accents and case,
and it's only if they are then equal that accents and case are significant.
This is achieved in a multi-level sort by constructing sort keys that are
composed of several parts, the first of which represents the characters
with accents and case distinctions removed.
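
As a rough illustration of such a multi-part key, here is a minimal
Python sketch. It is a simplification under stated assumptions, not the
Unicode Collation Algorithm and not what texindex does: the function
name is hypothetical, and the secondary and tertiary levels simply fall
back on code point order.

```python
import unicodedata

def multilevel_sort_key(s):
    """Build a (primary, secondary, tertiary) key for string `s`."""
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", s)
    # Primary level: accents and case distinctions removed.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    primary = base.casefold()
    # Secondary level: accents kept, case still ignored.
    secondary = decomposed.casefold()
    # Tertiary level: case finally breaks remaining ties.
    tertiary = decomposed
    return (primary, secondary, tertiary)

words = ["zebra", "\u00e9clair", "eclair", "Eclair"]
by_key = sorted(words, key=multilevel_sort_key)
by_codepoint = sorted(words)
```

With this key, the accented word sorts next to its unaccented variants,
whereas plain code point order puts it after "zebra".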

As far as I've been able to gather, this is the approach used in French
and German.  In Swedish, ö is its own completely independent letter put at the
end of the alphabet.

>  Or are you trying to mimic what the
> French locale mandates as the collation order of the letters?

Yes, exactly - "mimic" is a good word as it won't be done perfectly.  It
wouldn't be correct if two index entries differed only by accents over
letters.
