Ok, this is reasonable.  Mea culpa.  I was ignorant.  But purely waxing 
philosophical (which I'm entitled to do at my age), escapes should be necessary 
in order to make intention explicit so that errors can be caught.

What I left out was how I stumbled onto this.  I decided to test Microsoft 
Copilot by asking it to provide a regex for the format of an IP address in CIDR 
notation.  It provided this:

^((25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|1?[0-9]{1,2])\/(3[0-2]|[12]?[0-9])$

Note the right bracket in the final octet's "{1,2]" (bolded in my original 
message), which is in error: it should be a right brace.  What I find 
interesting is that it got right what I considered the more challenging task 
of expressing that each octet can only be an 8-bit value, but couldn't follow 
a simple grammatical rule of regex syntax.
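
For anyone who wants to poke at this outside of grep, here is a minimal sketch 
in Python's re module, which is only a stand-in here and makes its own choices 
about stray braces and brackets.  The names FIXED, BROKEN and is_cidr are mine, 
made up for illustration; FIXED is the pattern above with "{1,2]" changed to 
"{1,2}", which is my assumption about what was intended.

```python
import re

# The pattern above with the one-character fix: "{1,2]" becomes "{1,2}".
FIXED = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\.){3}"
    r"(25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})"
    r"\/(3[0-2]|[12]?[0-9])$"
)

# The pattern exactly as it was given to me, typo included.
BROKEN = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\.){3}"
    r"(25[0-5]|2[0-4][0-9]|1?[0-9]{1,2])"
    r"\/(3[0-2]|[12]?[0-9])$"
)


def is_cidr(pattern, text):
    """Return True if the whole string matches the given compiled pattern."""
    return pattern.match(text) is not None


if __name__ == "__main__":
    samples = ["192.168.0.1/24", "256.1.1.1/24", "10.0.0.1/33",
               "192.168.0.1{1,2]/24"]
    for s in samples:
        print(s, "fixed:", is_cidr(FIXED, s), "broken:", is_cidr(BROKEN, s))
```

Python's re accepts the typo'd pattern without complaint, because a "{" that 
does not start a well-formed interval is taken as an ordinary character.  So 
the error is silent there too, and the broken pattern ends up accepting an 
address with a literal "{1,2]" stuck on the last octet.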
________________________________
From: Paul Eggert <[email protected]>
Sent: Wednesday, October 1, 2025 4:04 PM
To: Seth David Schoen <[email protected]>
Cc: [email protected] <[email protected]>; Bob Peraino <[email protected]>
Subject: Re: bug#79550: Grammar bug in grep

On 2025-10-01 12:16, Seth David Schoen wrote:
> $ echo 'hello{' | egrep '{'
> hello{
> $ echo 'hello[' | egrep '['
> grep: Invalid regular expression

You're right that it's inconsistent. However, it's what AT&T/Sun egrep
does. I just now confirmed this with Solaris 10 /usr/bin/egrep:

$ echo 'hello{' | egrep '{'
hello{
$ echo 'hello[' | egrep '['
egrep: syntax error

7th Edition Unix egrep did not treat '{' as a metacharacter, and I
suspect that when AT&T (or Sun?) added support for '{...}' they did not
want to break existing scripts that used '{' as an ordinary character.

When GNU grep was written, its developers didn't want to break existing
scripts that assumed AT&T/Sun behavior, so they copied this
inconsistency. And changing GNU grep's behavior now might break things.

At least GNU grep's behavior is documented for these corner cases. You
probably won't be so lucky with non-GNU grep.
