On 9/8/25 4:59 AM, Grisha Levit wrote:

> So, in fact, locale-aware collation is disabled only if the range boundary
> and character being tested are both codepoints in the range U+0001..U+00FF.
>
> This doesn't make much sense for codepoints in the range U+0080..U+00FF, so
> the <= UCHAR_MAX check should be <= 0x7f. (Note that invalid byte sequences
> that do not form valid characters do not hit this code path.)

No, it's perfectly ok to have range expressions with endpoints in that
range, if uncommon.
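
For readers following along, the condition being debated can be sketched
roughly as below. This is only an illustration built from the description
in this thread, not the actual bash source; the function name
uses_codepoint_order and its limit parameter are hypothetical, added so
the current UCHAR_MAX bound and the proposed 0x7f bound can be compared.

```c
#include <limits.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper (not from bash): returns true when comparing a
   range endpoint against the character being tested would skip
   locale-aware collation and use plain integer order instead.
   `limit' is UCHAR_MAX for the check as described in this thread,
   or 0x7f for the proposed ASCII-only variant. */
static bool
uses_codepoint_order (wint_t endpoint, wint_t c, bool ascii_ranges,
                      wint_t limit)
{
  return ascii_ranges && endpoint <= limit && c <= limit;
}
```

With limit set to UCHAR_MAX, an endpoint such as U+00E0 compared against
U+00FF stays on the integer path; with 0x7f it would fall back to locale
collation. U+2074 is above both limits, so it takes the locale path in
either case.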

> Also, I'm not sure it makes much sense that with globasciiranges on, an
> ASCII-only range like [0-5] still matches characters like U+2074 (as in
> OP's example).

That's why it should fail -- it's outside the range *and* greater than
UCHAR_MAX.
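
A minimal sketch of how a range test such as [0-5] could end up on the
locale path for U+2074, assuming the comparison works as described
earlier in this thread. The names range_cmp and in_range are hypothetical
and the code is not taken from bash; wcscoll() is the standard C function
and its result depends on the LC_COLLATE locale in effect.

```c
#include <limits.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical comparison: integer order when both characters are
   <= UCHAR_MAX and ascii_ranges is set, locale collation otherwise. */
static int
range_cmp (wint_t c1, wint_t c2, bool ascii_ranges)
{
  wchar_t s1[2] = { (wchar_t) c1, L'\0' };
  wchar_t s2[2] = { (wchar_t) c2, L'\0' };

  if (c1 == c2)
    return 0;
  if (ascii_ranges && c1 <= UCHAR_MAX && c2 <= UCHAR_MAX)
    return (c1 < c2) ? -1 : 1;
  /* Either character is above UCHAR_MAX, or ascii_ranges is off. */
  return wcscoll (s1, s2);
}

/* Hypothetical range test: is c within [lo-hi] under range_cmp? */
static bool
in_range (wint_t lo, wint_t hi, wint_t c, bool ascii_ranges)
{
  return range_cmp (c, lo, ascii_ranges) >= 0
         && range_cmp (c, hi, ascii_ranges) <= 0;
}
```

Under this sketch, in_range (L'0', L'5', 0x2074, true) is decided by
wcscoll(), so whether it matches depends on where the active locale
sorts U+2074 relative to '0' and '5'.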

> Also, the documentation suggests that C locale-style collation applies
> to all ranges in globs, though the presence of "ascii" in the name makes
> the intended effect unclear.

The `ascii' was intended to indicate that the range in the bracket
expression behaves like the days when all characters were ascii and
everything was done by integer value. Read the gawk manual chapter for
more insight there.

> We could probably just remove the <= UCHAR_MAX checks (though this would
> make the option more like "globcranges").

I should have named it `rationalglobranges' but that ship sailed years ago.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
