* doc/regex.texi (Back-reference Operator): Mention bugs etc.
---
 ChangeLog      |  5 +++++
 doc/regex.texi | 12 ++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 6d7237d17..dccdf7a1a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2019-12-30  Paul Eggert  <[email protected]>
+
+	doc: document trouble with back-references
+	* doc/regex.texi (Back-reference Operator): Mention bugs etc.
+
 2019-12-29  Paul Eggert  <[email protected]>
 
 	doc: use “back-reference” for \1 etc.
diff --git a/doc/regex.texi b/doc/regex.texi
index 7b83cdd8e..4e0da9b39 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -1144,6 +1144,18 @@ example, @samp{(a(b))\2*} matches @samp{a} followed by two or more
 If there is no preceding @w{@var{digit}-th} subexpression, the regular
 expression is invalid.
 
+Back-references can greatly slow down matching, as they can generate
+exponentially many matching possibilities that can consume both time
+and memory to explore.  Also, the POSIX specification for
+back-references is at times unclear.  Furthermore, many regular
+expression implementations have back-reference bugs that can cause
+programs to return incorrect answers or even crash, and fixing these
+bugs has often been low-priority---for example, as of 2019 the GNU C
+library bug database contained back-reference bugs 52, 10844, 11053,
+and 23522, with little sign of forthcoming fixes.  Luckily,
+back-references are rarely useful and it should be little trouble to
+avoid them in practical applications.
+
 @node Anchoring Operators
 @section Anchoring Operators
 
-- 
2.17.1
