Hi Grisha,

On Mon, Sep 08, 2025 at 02:24:50AM -0400, Grisha Levit wrote:
> On Sun, Sep 7, 2025 at 2:46 AM Duncan Roe wrote:
> > `ls -1 [0-5]*` should produce the same output as `ls -1` but instead:-
> [...]
> > superscripts ¹, ² & ³ are missing.
> >
> > My take at an explanation: '₀' - '₉' are Unicode U+2080-9. These display
> > fine.
> > '⁰' is U+2070 & '⁹' is U+2079, but '¹' is U+00B9, '²' is U+00B2 & '³' is
> > U+00B3.
>
> This appears to be a bug with the globasciiranges option.
>
> The documentation suggests that enabling this option will disable
> locale-aware collation in range expressions:
>
>     globasciiranges
>         If set, range expressions used in pattern matching bracket
>         expressions (see Pattern Matching above) behave as if in
>         the traditional C locale when performing comparisons.  That
>         is, pattern matching does not take the current locale’s
>         collating sequence into account, so b will not collate
>         between A and B, and upper‐case and lower‐case ASCII
>         characters will collate together.
>
> But the implementing code [1] for multibyte locales does the following:
>
>     385 charcmp_wc (wint_t c1, wint_t c2, int forcecoll)
>     ...
>     393   if (forcecoll == 0 && glob_asciirange && c1 <= UCHAR_MAX && c2 <= UCHAR_MAX)
>     394     return ((int)(c1 - c2));
>     ...
>     399   return (wcscoll (s1, s2));
>
> So, in fact, locale-aware collation is disabled only if the range start
> and end codepoints are both in the range U+0001..U+00FF.  This doesn't
> make much sense for codepoints in the range U+0080..U+00FF.
>
> We should either:
>
>  * Remove the <= UCHAR_MAX checks (which would make the behavior match
>    the documentation)
>  * Replace the <= UCHAR_MAX checks with <= 0x7f checks (and update the
>    documentation to note that C locale-style comparisons are done only
>    if both ends of the range are ASCII characters)
>
> [1]
> https://cgit.git.savannah.gnu.org/cgit/bash.git/tree/lib/glob/smatch.c?h=bash-5.3#n385
>
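
To make the quoted condition on line 393 concrete, here is a minimal sketch of just that test, pulled out into a standalone function. The name uses_codepoint_compare is hypothetical and is not part of bash; forcecoll and glob_asciirange are passed in as plain ints here rather than read from bash's own state. It only reports which branch a pair of code points would take and does not reproduce the rest of charcmp_wc.

```c
#include <limits.h>
#include <wchar.h>

/* Hypothetical helper, not from the bash sources.
   Returns 1 when (c1, c2) satisfies the condition quoted at line 393,
   i.e. the pair would be compared by plain code point subtraction.
   Returns 0 when the pair would fall through to the wcscoll() call.  */
static int
uses_codepoint_compare (wint_t c1, wint_t c2, int forcecoll, int glob_asciirange)
{
  return forcecoll == 0 && glob_asciirange
         && c1 <= UCHAR_MAX && c2 <= UCHAR_MAX;
}
```

Assuming UCHAR_MAX is 255 and that one of the two arguments is the character being tested rather than a range endpoint (an assumption on my part, not something the excerpt shows), uses_codepoint_compare (0x00B9, '5', 0, 1) is 1 for '¹', while uses_codepoint_compare (0x2070, '5', 0, 1) for '⁰' and uses_codepoint_compare (0x2071, 'j', 0, 1) for 'ⁱ' are both 0, so those two would go to wcscoll().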
As I just replied to Oğuz, `ls -1 [i-j]*` lists ⁱ.txt with globasciiranges
on, and 'i' and 'j' are certainly in the range U+0001..U+00FF.

Cheers ... Duncan.