On Sun, Sep 7, 2025 at 2:46 AM Duncan Roe wrote:
> `ls -1 [0-5]*` should produce the same output as `ls -1` but instead:-
[...]
> superscripts ¹, ² & ³ are missing.
>
> My take at an explanation: '₀' - '₉' are Unicode U+2080-9. These display fine.
> '⁰' is U+2070 & '⁹' is U+2079, but '¹' is U+00B9, '²' is U+00B2 & '³' is 
> U+00B3.

This appears to be a bug with the globasciiranges option.

The documentation suggests that enabling this option will disable locale-
aware collation in range expressions:

      globasciiranges
          If set, range expressions used in pattern matching  bracket
          expressions  (see  Pattern  Matching above) behave as if in
          the traditional C locale when performing comparisons.  That
          is, pattern matching does not  take  the  current  locale’s
          collating sequence  into  account,  so  b  will not collate
          between  A  and  B,  and  upper‐case  and  lower‐case ASCII
          characters will collate together.

But the implementing code [1] for multibyte locales does the following:

   385  charcmp_wc (wint_t c1, wint_t c2, int forcecoll)
   ...
   393    if (forcecoll == 0 && glob_asciirange && c1 <= UCHAR_MAX && c2 <= 
UCHAR_MAX)
   394      return ((int)(c1 - c2));
   ...
   399    return (wcscoll (s1, s2));

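So, in fact, locale-aware collation is disabled only if the range start
and end codepoints are both in the range U+0001..U+00FF.  This doesn't
make much sense for codepoints in the range U+0080..U+00FF, which are
not ASCII.  It is also why ¹, ² and ³ (U+00B9, U+00B2, U+00B3) are
treated differently from the other superscripts, which lie above U+00FF.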

We should either:

  * Remove the <= UCHAR_MAX checks (which would make the behavior match
    the documentation)
  * Replace the <= UCHAR_MAX checks with <= 0x7f checks (and update the
    documentation to note that C locale-style comparisons are done only
    if both ends of the range are ASCII characters)
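
To make the difference concrete, here is a minimal sketch of the
fast-path test on line 393 with the limit pulled out as a parameter.
The function name uses_codepoint_compare and the limit parameter are
hypothetical and not part of bash.  It only models which comparisons
bypass wcscoll; it does not reproduce the rest of charcmp_wc.

```c
#include <limits.h>
#include <wchar.h>

/* Hypothetical model of the check on line 393 of smatch.c, assuming
   forcecoll == 0 and glob_asciirange != 0.  Returns 1 when the two
   code points would be compared numerically (C-locale style) and 0
   when the comparison would fall through to wcscoll().
   `limit` stands in for the constant in the check: UCHAR_MAX in the
   current code, 0x7f under the second proposal. */
int
uses_codepoint_compare (wint_t c1, wint_t c2, wint_t limit)
{
  return c1 <= limit && c2 <= limit;
}
```

With limit set to UCHAR_MAX, the pair (U+00B9, U+0030) takes the
numeric path, while (U+2070, U+0030) falls through to wcscoll.  With
limit set to 0x7f, both pairs fall through, and only pairs such as
(U+0033, U+0030), where both sides are ASCII, are compared numerically.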

[1] https://cgit.git.savannah.gnu.org/cgit/bash.git/tree/lib/glob/smatch.c?h=bash-5.3#n385
