There was a recent commit related to index sorting, so this is not too
surprising.

This refers to the automatic build at 
https://github.com/gnu-texinfo/ci-check/actions.

With a GitHub login, I was able to download the log files.

I see it fails on the "build-tarball" step.  I presume that the tarball is
built first, and is then extracted and built on the different systems that
the package is tested on.  The log file "16_build-tarball.txt" shows that
the tarball is built on the following system:

2026-05-25T11:58:25.4657644Z Image: ubuntu-22.04
2026-05-25T11:58:25.4658405Z Version: 20260518.146.1
2026-05-25T11:58:25.4659632Z Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20260518.146/images/ubuntu/Ubuntu2204-Readme.md
2026-05-25T11:58:25.4661244Z Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20260518.146

I'm not familiar with all the C code around the bit I changed related to index
sorting so it's hard for me to know where to look to see where the problem
is.

I tried building on an Ubuntu 24.04 system, as that was the most similar system
I had easy access to (on cfarm).

I could get very intermittent failures in encoding_index_latin1_enable_encoding
or encoding_index_ascii.

$ diff t/results/indices/encoding_index_latin1_enable_encoding.pl* -c
*** t/results/indices/encoding_index_latin1_enable_encoding.pl  2026-05-06 11:11:40.000000000 -0500
--- t/results/indices/encoding_index_latin1_enable_encoding.pl.new      2026-05-25 11:23:38.362723275 -0500
***************
*** 706,713 ****
   j
   k
   l
-  ł
   Ł
   m
   n
   o
--- 706,713 ----
   j
   k
   l
   Ł
+  ł
   m
   n
   o

(In fact, I remember seeing the same failure on my own system in such a test,
but had never managed to reproduce it.  After running "perl -w t/09indices.t"
about 10 times, I finally got it to happen again for
encoding_index_ascii_enable_encoding, with the same misplacement of "ł".
I have Perl 5.38.2 installed, which is the same version as was used for my
tests on Ubuntu 24.04.)

When looking at the code around the changed code in get_sort_key,
I noticed something:

                  uc_sort_string = to_upper_or_lower_multibyte
                                         (subenty_sort_string->sort_string, 1);
                  sortable_subentry->sort_key = get_sort_key (collator,
                                                            uc_sort_string);

'to_upper_or_lower_multibyte' calls u8_toupper on the index entry text, which
presumably converts it to upper case.

Likewise, the Perl code in Texinfo/Indices.pm uses 'uc' to uppercase the
string before getting the sort key:

        my $sort_key = $collator->getSortKey(uc($sort_string));

That would explain why Ł and ł were sometimes in the wrong order, as the
two entries would be indistinguishable once uppercased.  However, why this
problem is triggered now, and only with Polish ł, is still a mystery.  The
'uc' has been there in the Perl code for at least two years (I didn't trace
the git history back any further).

Uppercasing the string before getting the sort key should NOT be necessary
for most of the sorting options.  (It might be useful when not getting
a sort key at all, when USE_UNICODE_COLLATION is 0.)

The only idea I have as to why the string is uppercased is so that the
first letter of the index entry will be uppercase, for sorting into
sections by first letter.

I'm still working on trying to narrow down the error and see exactly
what happens differently when the string is sorted differently.


On Mon, May 25, 2026 at 02:37:22PM +0200, Bruno Haible via Bug reports for the GNU Texinfo documentation system wrote:
> Hi,
> 
> Today's CI failed:
> 
> FAIL: t/09indices.t 1472 - encoding_index_latin1_enable_encoding indices sort
> FAIL: t/09indices.t 1477 - encoding_index_latin1_enable_encoding converted file_plaintext
> FAIL: t/09indices.t 1479 - encoding_index_latin1_enable_encoding converted file_info
> 
> ERROR: t/09indices.t - exited with status 3
> 
> 
> 
> 
