Re: CI: 3 FAIL, 1 ERROR

Gavin Smith Mon, 25 May 2026 13:11:50 -0700

On Mon, May 25, 2026 at 09:56:35PM +0200, Patrice Dumas wrote:
> On Mon, May 25, 2026 at 07:27:09PM +0100, Gavin Smith wrote:
> > I still don't know what is supposed to make two strings sort in a
> > predictable order if they differ only by case.  I checked that the sort
> > keys are identical in that case.  Patrice, do you remember anything?
> 
> The sorting does not only use the sort key, but also the number of the
> index entry in index and the index names sort order (needed for merged
> indices), as seen in Perl in Indices.pm _sort_index_entries.  Therefore,
> the order should be predictable even if the upper and lower case letters
> have the same sort key.


I understand now why the index entries are output in a predictable order.
However, this means that changing the order of the @cindex commands in the
source changes the order in the index, if entries differ only by letter
case.  For example:

$ cat test.texi ; TEXINFO_OUTPUT_FORMAT=plaintext texi2any test.texi
\input texinfo

@node Top
@top

@cindex aa
@cindex zz
@cindex @L{}
@cindex @l{}

@printindex cp

@bye

* Menu:

* aa:                                    Top.                   (line 0)
* Ł:                                     Top.                   (line 0)
* ł:                                     Top.                   (line 0)
* zz:                                    Top.                   (line 0)

$ cat test.texi ; TEXINFO_OUTPUT_FORMAT=plaintext texi2any test.texi
\input texinfo

@node Top
@top

@cindex aa
@cindex zz
@cindex @l{}
@cindex @L{}

@printindex cp

@bye

* Menu:

* aa:                                    Top.                   (line 0)
* ł:                                     Top.                   (line 0)
* Ł:                                     Top.                   (line 0)
* zz:                                    Top.                   (line 0)
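
The two runs above are consistent with a sort that falls back on the entry
number when the sort keys are equal.  A minimal Python sketch of that
behaviour follows.  It is not the code in Indices.pm: sort_entries is a
hypothetical helper, the key "L" is only a stand-in for whatever identical
collation key the two cases of the letter really get, and the index name
order used for merged indices is left out.

```python
def sort_entries(entries):
    """Sort (text, sort_key) pairs, breaking ties by position in the source.

    Hypothetical helper for illustration only; the comparison on index
    names (needed for merged indices) is omitted.
    """
    # Number the entries in source order, starting at 1.
    numbered = [(key, number, text)
                for number, (text, key) in enumerate(entries, 1)]
    # Tuples compare by sort key first, then by entry number.
    return [text for key, number, text in sorted(numbered)]


# Assumed keys: both cases of the letter share the stand-in key "L".
first = [("aa", "AA"), ("zz", "ZZ"), ("Ł", "L"), ("ł", "L")]
second = [("aa", "AA"), ("zz", "ZZ"), ("ł", "L"), ("Ł", "L")]

print(sort_entries(first))   # ['aa', 'Ł', 'ł', 'zz']
print(sort_entries(second))  # ['aa', 'ł', 'Ł', 'zz']
```

Because the entry number is the only thing that separates the two entries,
swapping them in the source swaps them in the output.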



There is also

> 
> That being said, I do not know exactly why the strings are upper-cased
> before being sorted.  Maybe this is relevant if there is no
> Unicode::Collate sorting (presumably, the lowercase/uppercase sorting is
> done well with Unicode::Collate), as it allows the upper-case and lower
> case letter to be nearby in sort in that case.

Yes, exactly, although it wouldn't make upper case and lower case variants
sort in a consistent order.  There may be ways to make that happen using
strcmp comparison: something like:

sort key = uppercase(index entry) . '\x01' . index entry

- i.e., concatenate the uppercased index entry with the original index
entry, with a low valued byte in between.  But it is not that important.
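
As a rough illustration of that idea, here is a Python sketch.  The name
composite_key is made up for the example, and comparing UTF-8 encoded bytes
is only an approximation of a strcmp comparison without Unicode::Collate.

```python
def composite_key(entry):
    """Build uppercase(entry) + separator + entry, as bytes.

    Hypothetical helper: encoding to UTF-8 makes the comparison
    byte-wise, which approximates what strcmp would do.
    """
    # The separator is \x01 rather than \x00, since strcmp stops at a
    # NUL byte.
    return (entry.upper() + "\x01" + entry).encode("utf-8")


run1 = ["aa", "zz", "Ł", "ł"]
run2 = ["aa", "zz", "ł", "Ł"]

print(sorted(run1, key=composite_key))  # ['aa', 'zz', 'Ł', 'ł']
print(sorted(run2, key=composite_key))  # ['aa', 'zz', 'Ł', 'ł']
```

Both source orders now give the same result, because the original text
after the separator decides between entries whose uppercased forms are
equal.  The low-valued separator keeps a shorter entry ahead of a longer
one that starts with it.  Note that plain byte order puts the Ł entries
after zz, unlike the collated output shown earlier.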
