On Sun, Sep 7, 2025 at 2:46 AM Duncan Roe wrote:
> `ls -1 [0-5]*` should produce the same output as `ls -1` but instead:-
[...]
> superscripts ¹, ² & ³ are missing.
>
> My take at an explanation: '₀' - '₉' are Unicode U+2080-9. These display fine.
> '⁰' is U+2070 & '⁹' is U+2079, but '¹' is U+00B9, '²' is U+00B2 & '³' is
> U+00B3.

This appears to be a bug with the globasciiranges option. The documentation
suggests that enabling this option will disable locale-aware collation in
range expressions:

    globasciiranges
        If set, range expressions used in pattern matching bracket
        expressions (see Pattern Matching above) behave as if in the
        traditional C locale when performing comparisons. That is,
        pattern matching does not take the current locale’s collating
        sequence into account, so b will not collate between A and B,
        and upper‐case and lower‐case ASCII characters will collate
        together.

But the implementing code [1] for multibyte locales does the following:

    385 charcmp_wc (wint_t c1, wint_t c2, int forcecoll)
    ...
    393   if (forcecoll == 0 && glob_asciirange && c1 <= UCHAR_MAX && c2 <= UCHAR_MAX)
    394     return ((int)(c1 - c2));
    ...
    399   return (wcscoll (s1, s2));

So, in fact, locale-aware collation is disabled only if the range start and
end codepoints are both in the range U+0001..U+00FF. This doesn't make much
sense for codepoints in the range U+0080..U+00FF.

We should either:

* Remove the <= UCHAR_MAX checks (which would make the behavior match the
  documentation)

* Replace the <= UCHAR_MAX checks with <= 0x7f checks (and update the
  documentation to note that C locale-style comparisons are done only if
  both ends of the range are ASCII characters)

[1] https://cgit.git.savannah.gnu.org/cgit/bash.git/tree/lib/glob/smatch.c?h=bash-5.3#n385
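
To make the difference between the current guard and the two options above
concrete, here is a rough standalone sketch. It is not the bash code: the
names guard_kind, uses_codepoint_order and compare_sketch are made up for
illustration, forcecoll and glob_asciirange are assumed to be 0 and 1
respectively, and UCHAR_MAX is assumed to be 255. It only models which pair
of codepoints takes the plain codepoint path and which falls through to
wcscoll().

```c
#include <limits.h>
#include <wchar.h>

/* Hypothetical names for illustration; none of these exist in bash. */
enum guard_kind {
    GUARD_UCHAR_MAX,  /* current check: both codepoints <= UCHAR_MAX */
    GUARD_ASCII,      /* second option: both codepoints <= 0x7f */
    GUARD_NONE        /* first option: no upper bound at all */
};

/* Returns 1 if c1 and c2 would be compared by codepoint value,
   0 if the comparison would fall through to wcscoll(). */
int uses_codepoint_order(wint_t c1, wint_t c2, enum guard_kind g)
{
    switch (g) {
    case GUARD_UCHAR_MAX:
        return c1 <= UCHAR_MAX && c2 <= UCHAR_MAX;
    case GUARD_ASCII:
        return c1 <= 0x7f && c2 <= 0x7f;
    case GUARD_NONE:
        return 1;
    }
    return 0;
}

/* Sketch of the comparison itself. The wcscoll() result depends on the
   current locale, so only the codepoint path is deterministic here. */
int compare_sketch(wint_t c1, wint_t c2, enum guard_kind g)
{
    wchar_t s1[2] = { (wchar_t)c1, L'\0' };
    wchar_t s2[2] = { (wchar_t)c2, L'\0' };

    if (uses_codepoint_order(c1, c2, g))
        return (c1 > c2) - (c1 < c2);
    return wcscoll(s1, s2);
}
```

With GUARD_UCHAR_MAX, a comparison of U+00B9 ('¹') against '5' takes the
codepoint path, while a comparison of U+2070 ('⁰') against '5' goes to
wcscoll(). With GUARD_ASCII both go to wcscoll(), and with GUARD_NONE both
are compared by codepoint value.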