OK, this is reasonable. Mea culpa; I was ignorant. But purely waxing
philosophical (which I'm entitled to do at my age), I think escapes should be
required for literal metacharacters, so that intent is explicit and errors can
be caught.
What I left out was how I stumbled onto this. I decided to test Microsoft
Copilot by asking it for a regex that matches an IP address in CIDR
notation. It provided this:
^((25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\.){3}(25[0-5]|2[0-4][0-9]|1?[0-9]{1,2])\/(3[0-2]|[12]?[0-9])$
Note the right bracket that closes the last "{1,2]", which is in error: it
should be a right brace. What I find interesting is that it got right what I
considered the more challenging task, restricting each octet to an 8-bit
value, but couldn't follow a simple rule of regex syntax.
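A minimal sketch of why this kind of typo is easy to miss, using Python's re module purely as a stand-in (an assumption: Copilot's answer was not tied to any particular engine, and grep may behave differently). The names BAD, FIXED and matches are made up for illustration, and FIXED is only the presumed intent, with the stray "]" replaced by "}". As far as I can tell, Python, like the lenient '{' handling discussed in this thread, treats the malformed "{1,2]" as literal text instead of reporting an error.

```python
import re

# Pattern exactly as Copilot produced it: the last octet ends in "{1,2]".
BAD = (r"^((25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\.){3}"
       r"(25[0-5]|2[0-4][0-9]|1?[0-9]{1,2])\/(3[0-2]|[12]?[0-9])$")

# Presumed intent (an assumption): the stray "]" replaced by "}".
FIXED = (r"^((25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\.){3}"
         r"(25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\/(3[0-2]|[12]?[0-9])$")


def matches(pattern, text):
    """Return True if the pattern matches the whole of text."""
    return re.search(pattern, text) is not None


# The malformed pattern compiles without complaint, so nothing flags the typo.
re.compile(BAD)

for candidate in ("192.168.1.0/24", "192.168.1.0{1,2]/24", "256.1.1.1/24"):
    print(candidate, "BAD:", matches(BAD, candidate),
          "FIXED:", matches(FIXED, candidate))
```

With the malformed pattern, an ordinary address such as 192.168.1.0/24 is rejected, while the odd string 192.168.1.0{1,2]/24 is accepted, because the last octet must now be followed by the literal text "{1,2]". The only symptom is a pattern that quietly matches the wrong thing.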
________________________________
From: Paul Eggert <[email protected]>
Sent: Wednesday, October 1, 2025 4:04 PM
To: Seth David Schoen <[email protected]>
Cc: [email protected] <[email protected]>; Bob Peraino <[email protected]>
Subject: Re: bug#79550: Grammar bug in grep
On 2025-10-01 12:16, Seth David Schoen wrote:
> $ echo 'hello{' | egrep '{'
> hello{
> $ echo 'hello[' | egrep '['
> grep: Invalid regular expression
You're right that it's inconsistent. However, it's what AT&T/Sun egrep
does. I just now confirmed this with Solaris 10 /usr/bin/egrep:
$ echo 'hello{' | egrep '{'
hello{
$ echo 'hello[' | egrep '['
egrep: syntax error
7th Edition Unix egrep did not treat '{' as a metacharacter, and I
suspect that when AT&T (or Sun?) added support for '{...}' they did not
want to break existing scripts that used '{' as an ordinary character.
When GNU grep was written, its developers didn't want to break existing
scripts that assumed AT&T/Sun behavior, so they copied this
inconsistency. And changing GNU grep's behavior now might break things.
At least GNU grep's behavior is documented for these corner cases. You
probably won't be so lucky with non-GNU grep.