Issue 154408
Summary [libc++] <regex>: Unmatched backrefs should always succeed in ECMAScript mode.
Labels libc++
Assignees
Reporter SainoNamkho
ECMA-262 (1999), section 15.10.2.9, states:
> An escape sequence of the form `\` followed by a nonzero decimal number $n$
> matches the result of the $n$th set of capturing parentheses (see 15.10.2.11). It is an error if the regular
> expression has fewer than $n$ capturing parentheses. If the regular expression has $n$ or more capturing
> parentheses but the $n$th one is **undefined** because it hasn't captured anything, then the backreference
> always succeeds.

So this JavaScript code
```js
for (let pattern of [/(\1)a/, /\1(a)/, /(a()|)\2a/, /(b()|)\2a/, /(b()|)\2/])
    if (pattern.test('a'))
        console.log(`${pattern} matches "a".`)
    else
        console.log(`${pattern} does not match "a".`)
```
prints
```
/(\1)a/ matches "a".
/\1(a)/ matches "a".
/(a()|)\2a/ matches "a".
/(b()|)\2a/ matches "a".
/(b()|)\2/ matches "a".
```

C++ standard library implementations diverge (https://godbolt.org/z/r51f4Wsas):
```c++
#include <regex>
#include <print>

int main()
{
    for (auto pattern : {
        "(\\1)a",     "\\1(a)",
        "(a()|)\\2a", "(b()|)\\2",
    }) {
        try {
            if (std::regex_search("a", std::regex{pattern}))
                std::println("\x1b[32m/{}/ matches.\x1b[m", pattern);
            else
                std::println("\x1b[31m/{}/ does not match.\x1b[m", pattern);
        }
        catch (const std::regex_error& e) {
            if (e.code() == std::regex_constants::error_backref)
                std::println("\x1b[34m/{}/: {}\x1b[m", pattern, e.what());
            else
                throw;
        }
    }
}
```

libstdc++ and MSVC STL reject `(\1)a` and `\1(a)` as invalid regexes, while libc++ accepts `(\1)a`. I believe neither pattern contains an invalid backreference in the ECMAScript flavor, but that can be discussed in a separate issue.
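
For context, the quoted rule makes `\n` an error only when the pattern as a whole has fewer than $n$ capturing parentheses; where the group sits relative to the backreference does not enter into it. A rough sketch of that check follows. `count_capture_groups` and `backref_is_valid` are hypothetical helpers, and the scanner is deliberately simplified: it handles `\` escapes, `[...]` classes, and `(?` groups, and nothing else.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: count capturing groups in an ECMAScript (1999) pattern.
// Simplified scanner; ES3 has no named groups, so every "(" not followed by "?"
// outside a character class opens a capturing group.
std::size_t count_capture_groups(std::string_view pattern) {
    std::size_t count = 0;
    bool in_class = false;  // inside [...], '(' is an ordinary character
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == '\\') {
            ++i;  // skip the escaped character, e.g. "\(" or "\1"
        } else if (in_class) {
            if (c == ']') in_class = false;
        } else if (c == '[') {
            in_class = true;
        } else if (c == '(') {
            // "(?:", "(?=" and "(?!" do not capture
            if (i + 1 >= pattern.size() || pattern[i + 1] != '?') ++count;
        }
    }
    return count;
}

// Per 15.10.2.9, \n is an error only if the whole pattern has fewer than n
// capturing groups, regardless of where \n appears.
bool backref_is_valid(std::string_view pattern, std::size_t n) {
    return n != 0 && n <= count_capture_groups(pattern);
}
```

By this check, `\1` is valid in both `(\1)a` and `\1(a)`, since each pattern has exactly one capturing group.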

What I'm interested in here is that `(a()|)\2a` and `(b()|)\2` should both match `a`.
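
Why both should match: in `(a()|)\2a`, the first alternative consumes `a` and sets group 2 to the empty string, so `\2` matches and the trailing `a` then fails at end of input. Backtracking into the empty alternative has to discard that capture, leaving group 2 undefined, so `\2` succeeds and the trailing `a` matches. In `(b()|)\2`, the `b` alternative fails immediately, group 2 never participates, and `\2` succeeds. The sketch below is a small continuation-passing backtracking matcher over hand-built ASTs that illustrates these semantics; the AST, `match`, and `search` are hypothetical and make no claim about how libc++'s `<regex>` is structured.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST for a tiny regex subset; not libc++'s internal representation.
struct Node;
using NodePtr = std::shared_ptr<Node>;
struct Node {
    enum Kind { Lit, Seq, Alt, Group, Backref } kind;
    char ch = 0;                // Lit: character to match
    std::size_t n = 0;          // Group: capture index; Backref: referenced group
    std::vector<NodePtr> kids;  // Seq/Alt: operands; Group: exactly one child
};

NodePtr lit(char c) { return std::make_shared<Node>(Node{Node::Lit, c, 0, {}}); }
NodePtr seq(std::vector<NodePtr> k) { return std::make_shared<Node>(Node{Node::Seq, 0, 0, std::move(k)}); }
NodePtr alt(std::vector<NodePtr> k) { return std::make_shared<Node>(Node{Node::Alt, 0, 0, std::move(k)}); }
NodePtr group(std::size_t n, NodePtr k) { return std::make_shared<Node>(Node{Node::Group, 0, n, {std::move(k)}}); }
NodePtr backref(std::size_t n) { return std::make_shared<Node>(Node{Node::Backref, 0, n, {}}); }

using Capture = std::optional<std::pair<std::size_t, std::size_t>>;  // nullopt = undefined
using Caps = std::vector<Capture>;
using Cont = std::function<bool(std::size_t, const Caps&)>;

// Captures are passed by value, so captures set inside a failed alternative
// are discarded automatically when the matcher backtracks.
bool match(const NodePtr& node, const std::string& s, std::size_t pos, Caps caps, const Cont& k) {
    switch (node->kind) {
    case Node::Lit:
        return pos < s.size() && s[pos] == node->ch && k(pos + 1, caps);
    case Node::Seq: {
        // Match operands left to right by chaining continuations.
        std::function<bool(std::size_t, std::size_t, const Caps&)> step =
            [&](std::size_t i, std::size_t p, const Caps& c) -> bool {
                if (i == node->kids.size()) return k(p, c);
                return match(node->kids[i], s, p, c,
                             [&, i](std::size_t q, const Caps& c2) { return step(i + 1, q, c2); });
            };
        return step(0, pos, caps);
    }
    case Node::Alt:
        for (const NodePtr& a : node->kids)  // every alternative starts from the same captures
            if (match(a, s, pos, caps, k)) return true;
        return false;
    case Node::Group:
        return match(node->kids[0], s, pos, caps, [&, pos](std::size_t q, const Caps& c) {
            Caps updated = c;
            updated[node->n] = std::make_pair(pos, q);
            return k(q, updated);
        });
    case Node::Backref: {
        const Capture& cap = caps[node->n];
        if (!cap) return k(pos, caps);  // ECMA-262 15.10.2.9: undefined capture always succeeds
        std::size_t len = cap->second - cap->first;
        if (pos + len > s.size() || s.compare(pos, len, s, cap->first, len) != 0) return false;
        return k(pos + len, caps);
    }
    }
    return false;
}

// Like RegExp.prototype.test / std::regex_search: try every start position.
bool search(const NodePtr& re, std::size_t groups, const std::string& s) {
    for (std::size_t start = 0; start <= s.size(); ++start)
        if (match(re, s, start, Caps(groups + 1), [](std::size_t, const Caps&) { return true; }))
            return true;
    return false;
}
```

For example, `(a()|)\2a` is built as `seq({group(1, alt({seq({lit('a'), group(2, seq({}))}), seq({})})), backref(2), lit('a')})`, and `search` on it with 2 groups and input `"a"` returns `true`. Passing `Caps` by value is the design choice that matters: each alternative starts from the captures that existed before it, which mirrors the way ECMA-262 hands the same match state to each alternative.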

[A proof-of-concept fix](https://godbolt.org/z/x6xnKjx8f)
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
