On 9/8/25 4:59 AM, Grisha Levit wrote:
> So, in fact, locale-aware collation is disabled only if the range boundary and the character being tested are both codepoints in the range U+0001..U+00FF. This doesn't make much sense for codepoints in the range U+0080..U+00FF, so the <= UCHAR_MAX check should be <= 0x7f. (Note that invalid byte sequences that do not form valid characters do not hit this code path.)
No, it's perfectly ok to have range expressions with endpoints in that range, if uncommon.
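A minimal C sketch of the guard as described in the quoted paragraph: integer (codepoint) order is used only when the option is on and both characters are <= UCHAR_MAX; otherwise the comparison falls back to locale collation via wcscoll. This is not bash's source. The names glob_asciiranges_sketch, range_cmp_sketch and in_range_sketch are hypothetical, and the sketch assumes wchar_t values are Unicode codepoints (true with glibc, not guaranteed by the C standard).

```c
#include <limits.h>   /* UCHAR_MAX */
#include <wchar.h>    /* wchar_t, wcscoll */

/* Hypothetical stand-in for the globasciiranges shell option. */
static int glob_asciiranges_sketch = 1;

/* Compare two wide characters: plain integer order only when the option
   is on and BOTH characters are <= UCHAR_MAX; otherwise use the current
   locale's collation order. */
static int range_cmp_sketch(wchar_t c1, wchar_t c2)
{
    wchar_t s1[2] = { c1, L'\0' };
    wchar_t s2[2] = { c2, L'\0' };

    if (c1 == c2)
        return 0;
    if (glob_asciiranges_sketch && c1 <= UCHAR_MAX && c2 <= UCHAR_MAX)
        return c1 < c2 ? -1 : 1;   /* integer (codepoint) order */
    return wcscoll(s1, s2);        /* locale-aware collation */
}

/* Does C fall inside the bracket range LO-HI?  Each comparison pairs
   one range boundary with the character being tested. */
static int in_range_sketch(wchar_t lo, wchar_t hi, wchar_t c)
{
    return range_cmp_sketch(lo, c) <= 0 && range_cmp_sketch(c, hi) <= 0;
}
```

With the flag on, a character such as U+00E9 is judged against [a-z] by integer value (0xE9 is above 'z', so it does not match), while anything above UCHAR_MAX is handed to wcscoll and so depends on the locale's collation tables. Lowering the cutoff to 0x7f would only shrink the region where integer order applies.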
> Also, I'm not sure it makes much sense that with globasciiranges on, an ASCII-only range like [0-5] still matches characters like U+2074 (as in the OP's example).
That's why it should fail -- it's outside the range *and* greater than UCHAR_MAX.
> Also, the documentation suggests that C locale-style collation applies to all ranges in globs, though the presence of "ascii" in the name makes the intended effect unclear.
The `ascii' was intended to indicate that the range in the bracket expression behaves like the days when all characters were ascii and everything was done by integer value. Read the gawk manual chapter for more insight there.
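A minimal sketch of what "done by integer value" means for a bracket range: membership is decided purely by comparing codepoints, with no locale collation and no UCHAR_MAX cutoff, which is roughly the behavior that dropping the <= UCHAR_MAX checks would produce. The helper names rational_in_range and count_in_range are hypothetical, and the sketch again assumes wchar_t holds Unicode codepoints.

```c
#include <stddef.h>   /* size_t */
#include <wchar.h>    /* wchar_t */

/* Integer-value range test: LO <= C <= HI by codepoint, regardless of
   locale or of how large the codepoints are. */
static int rational_in_range(wchar_t lo, wchar_t hi, wchar_t c)
{
    return lo <= c && c <= hi;
}

/* Count the characters of S that a bracket expression [LO-HI] would
   accept under integer-value semantics. */
static size_t count_in_range(const wchar_t *s, wchar_t lo, wchar_t hi)
{
    size_t n = 0;

    for (; *s != L'\0'; s++)
        if (rational_in_range(lo, hi, *s))
            n++;
    return n;
}
```

Under these semantics [0-5] accepts exactly the six digits 0 through 5: U+2074 (SUPERSCRIPT FOUR) has codepoint 0x2074, far above '5', so it never matches, whatever the locale says about its collation position.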
> We could probably just remove the <= UCHAR_MAX checks (though this would make the option more like "globcranges").
I should have named it `rationalglobranges' but that ship sailed years ago.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/