On Mon, May 25, 2026 at 06:07:14PM +0100, Gavin Smith wrote:
> 'to_upper_or_lower_multibyte' calls u8_toupper on the index entry text, which
> presumably converts it to upper case.
>
> Likewise, the Perl code in Texinfo/Indices.pm uses 'uc' to uppercase the
> string before getting the sort key:
>
>     my $sort_key = $collator->getSortKey(uc($sort_string));
>
> That would explain why Ł and ł were sometimes in the wrong order, as the
> entries would be indistinguishable if uppercased.  However, why this problem
> is triggered now, and only with Polish ł, is still a mystery.  The 'uc'
> has been there in the Perl code for at least two years (I didn't trace
> back the git history any more).
>
> Uppercasing the string before getting the sort key should NOT be necessary
> for most of the sorting options.  (It might be useful when not getting
> a sort key at all, when USE_UNICODE_COLLATION is 0.)

I found a bug with the returned length of the sort key, which is very
likely responsible.  The length could be too large, causing uninitialised
bytes after the terminating null to be included in the sort key.  This
would explain why the test failure was hard to reproduce, as it depended
on the contents of uninitialised memory.

I fixed this in texi2any in commit 54592bff0a (today).

I still don't know what is supposed to make two strings sort in a
predictable order if they differ only by case.  I checked that the sort
keys are identical in that case.

Patrice, do you remember anything?
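
To illustrate the length bug described above, here is a minimal C sketch.
It is not the texi2any code: 'struct sort_key', 'make_key' and
'compare_keys' are hypothetical names, and the 'fill' argument only
simulates whatever happened to be in memory beforehand.  It shows how an
over-long length can make two identical keys compare as different.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sort key record: the bytes plus the length used when
   comparing.  Not the structure used in texi2any. */
struct sort_key {
    unsigned char bytes[16];
    size_t len;
};

/* Compare two keys bytewise over their stated lengths. */
static int compare_keys(const struct sort_key *a, const struct sort_key *b)
{
    size_t n = a->len < b->len ? a->len : b->len;
    int r = memcmp(a->bytes, b->bytes, n);
    if (r != 0)
        return r;
    return (a->len > b->len) - (a->len < b->len);
}

/* Build a key from a null-terminated string.  'fill' stands in for the
   previous contents of the buffer, and 'extra' is the number of bytes
   wrongly counted in the length beyond the key itself. */
static struct sort_key make_key(const char *s, unsigned char fill,
                                size_t extra)
{
    struct sort_key k;
    assert(strlen(s) + 1 <= sizeof k.bytes);
    assert(strlen(s) + extra <= sizeof k.bytes);
    memset(k.bytes, fill, sizeof k.bytes);
    strcpy((char *)k.bytes, s);
    k.len = strlen(s) + extra;
    return k;
}
```

With 'extra' set to 0, two keys built from the same string compare equal
whatever the fill byte is.  With 'extra' set to 3, the bytes after the
terminating null take part in the comparison, so the result depends on
the fill byte, which would match a failure that is hard to reproduce.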
