Re: sed and delimiters that are also special characters to REs

Christoph Anton Mitterer via austin-group-l at The Open Group Tue, 11 Jan 2022 08:37:25 -0800

Hey Don.

On Mon, 2022-01-10 at 22:27 -0800, Don Cragun wrote:
>       * In a context address, the construction "\cREc",
>         where c is any character other than <backslash>
>         or <newline>, shall be identical to "/RE/". If
>         the character designated by c appears following
>         a <backslash>, then it shall be considered to be
>         that literal character, which shall not terminate
>         the RE. For example, in the context address
>         "\xabc\xdefx", the second x stands for itself, so
>         that the RE is "abcxdef".
> 
> Doesn't this answer your questions?

But that only applies to my "question" (1), which was actually less a
question and more a suggestion: "it might be helpful if, even at the
first mention, it already excluded <backslash> and <newline>".

OTOH, since replacing the delimiter there isn't just about replacing
it, but also escaping the first one with \ ... it's probably better to
just refer to the "Regular Expressions in sed" section as it's already
done now.

Yet still, I don't see how this would explain my questions 2a and 2b
(2c was actually not a question and should have been 3 instead).

The section you quote just refers to context addresses, whereas my
question is about s/y commands.

But even apart from that, it doesn't seem to mandate how the string is
parsed (my question 2a).

It says "If the character designated by c appears following a
<backslash>, then it shall be considered to be that literal character,
which shall not terminate the RE." but what does "appears following"
mean?

If one simply parsed from left to right, then the context address:
\.\\..
would AFAIU be read as:
\. = context address delimiter #1
\\ = literal \
.  = context address delimiter #2
.  = error

But I couldn't find where the standard mandates doing it like that.
What prevents parsing it like this instead:
\. = context address starts, so:
1st: look for the next unquoted . which marks its end
     resulting in \\.
2nd: unescape any escaped delimiter, as that is to be taken literally
     (with respect to the RE)
     resulting in \.
RE = \.
i.e. the literal . and not the special character

I'm not saying the 2nd parsing variant would make sense, I just don't
understand what rules it out.
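
To make the two readings concrete, here is a rough Python sketch of
both. It is only a model of the parsing question, not sed and not
anything the standard specifies; the function names are made up for
illustration, and how an escaped delimiter is emitted in the first
variant is itself just one possible reading.

```python
def parse_left_to_right(addr):
    """Variant 1: single pass; a backslash always pairs with the next char."""
    assert addr[0] == "\\"
    delim = addr[1]
    out = []
    i = 2
    while i < len(addr):
        ch = addr[i]
        if ch == "\\" and i + 1 < len(addr):
            nxt = addr[i + 1]
            # Assumption: an escaped delimiter becomes the bare character,
            # any other escape sequence is passed through unchanged.
            out.append(nxt if nxt == delim else ch + nxt)
            i += 2
        elif ch == delim:
            # First delimiter that is not the second half of an escape pair.
            return "".join(out), addr[i + 1:]
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated context address")


def parse_two_pass(addr):
    """Variant 2: find a delimiter not preceded by a backslash, then unescape."""
    assert addr[0] == "\\"
    delim = addr[1]
    for i in range(2, len(addr)):
        if addr[i] == delim and addr[i - 1] != "\\":
            raw = addr[2:i]
            return raw.replace("\\" + delim, delim), addr[i + 1:]
    raise ValueError("unterminated context address")


# Each call returns (RE, leftover text after the closing delimiter).
print(parse_left_to_right(r"\.\\.."))
print(parse_two_pass(r"\.\\.."))
```

For \.\\.. the first variant yields the RE \\ with one . left over
(the error case above), while the second yields the RE \. with nothing
left over.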

And your quoted section also doesn't seem to say anything with respect
to my 2b, i.e. what if a special character is used as delimiter.
E.g. it doesn't say something like:
You can do:
\...
but that will always be the RE = . (matching any character) and you
cannot get the literal . as if the RE were \.
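
Just to illustrate the difference between those two REs, a small
sketch using Python's re module as a stand-in. Python's syntax is not
POSIX BRE, but it treats an unescaped . as "any character" and \. as a
literal period in the same way; the sample lines are made up.

```python
import re

lines = ["a", ".", "ab"]

# RE "." selects every non-empty line, RE "\." only lines containing a period.
any_char = [s for s in lines if re.search(r".", s)]
literal_dot = [s for s in lines if re.search(r"\.", s)]

print(any_char)
print(literal_dot)
```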

Neither does it seem to explain (or I just don't understand why ^^)
what should happen with e.g. the context address:
\.[.].

Is that parsed as:
\.   .    = context address
  [.]     = RE
and thus matching the literal . character (and not any character)
or as:
\.[.      = context address (there's a 2nd . not quoted by a \, so it
            should end there)
    ].    = error
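
The two readings can again be sketched in Python. This is a
hypothetical model only: split_address and its bracket_aware flag are
made up for illustration, and the bracket handling is deliberately
simplified (no [:class:], [=x=] or [.x.] inside the brackets).

```python
def split_address(addr, bracket_aware):
    """Return (raw RE text, leftover) for a context address of the form \\cREc."""
    assert addr[0] == "\\"
    delim = addr[1]
    i = 2
    while i < len(addr):
        ch = addr[i]
        if ch == "\\" and i + 1 < len(addr):
            i += 2  # skip an escape pair as a unit
        elif bracket_aware and ch == "[":
            # Simplification: the bracket expression ends at the first ']'
            # that is not directly after the '['.
            close = addr.find("]", i + 2)
            if close == -1:
                raise ValueError("unterminated bracket expression")
            i = close + 1
        elif ch == delim:
            return addr[2:i], addr[i + 1:]
        else:
            i += 1
    raise ValueError("unterminated context address")


print(split_address(r"\.[.].", bracket_aware=True))
print(split_address(r"\.[.].", bracket_aware=False))
```

With bracket_aware=True the RE is [.] and nothing is left over; with
bracket_aware=False the RE is just [ and ]. is left over, which is the
error case.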

Thanks,
Chris
