CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 23/03/22 22:55:22
Index: html_node/Problematic-Expressions.html
===================================================================
RCS file: /webcvs/grep/grep/manual/html_node/Problematic-Expressions.html,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -b -r1.1 -r1.2
--- html_node/Problematic-Expressions.html 3 Sep 2022 19:33:14 -0000 1.1
+++ html_node/Problematic-Expressions.html 23 Mar 2023 02:55:21 -0000 1.2
@@ -1,11 +1,11 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<!DOCTYPE html>
 <html>
-<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
+<!-- Created by GNU Texinfo 7.0dev, https://www.gnu.org/software/texinfo/ -->
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <!-- This manual is for grep, a pattern matching engine.
 
-Copyright (C) 1999-2002, 2005, 2008-2022 Free Software Foundation,
+Copyright © 1999-2002, 2005, 2008-2023 Free Software Foundation,
 Inc.
 
 Permission is granted to copy, distribute and/or modify this document
@@ -14,10 +14,10 @@
 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
 Texts. A copy of the license is included in the section entitled
 "GNU Free Documentation License". -->
-<title>Problematic Expressions (GNU Grep 3.8)</title>
+<title>Problematic Expressions (GNU Grep 3.10)</title>
 
-<meta name="description" content="Problematic Expressions (GNU Grep 3.8)">
-<meta name="keywords" content="Problematic Expressions (GNU Grep 3.8)">
+<meta name="description" content="Problematic Expressions (GNU Grep 3.10)">
+<meta name="keywords" content="Problematic Expressions (GNU Grep 3.10)">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
@@ -31,21 +31,9 @@
 <link href="Basic-vs-Extended.html" rel="prev" title="Basic vs Extended">
 <style type="text/css">
 <!--
-a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
-a.summary-letter {text-decoration: none}
-blockquote.indentedblock {margin-right: 0em}
-div.display {margin-left: 3.2em}
-div.example {margin-left: 3.2em}
-kbd {font-style: oblique}
-pre.display {font-family: inherit}
-pre.format {font-family: inherit}
-pre.menu-comment {font-family: serif}
-pre.menu-preformatted {font-family: serif}
-span.nolinebreak {white-space: nowrap}
-span.roman {font-family: initial; font-weight: normal}
-span.sansserif {font-family: sans-serif; font-weight: normal}
-span:hover a.copiable-anchor {visibility: visible}
-ul.no-bullet {list-style: none}
+a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
+span:hover a.copiable-link {visibility: visible}
+ul.mark-bullet {list-style-type: disc}
 -->
 </style>
 <link rel="stylesheet" type="text/css" href="https://www.gnu.org/software/gnulib/manual.css">
@@ -54,139 +42,139 @@
 </head>
 
 <body lang="en">
-<div class="section" id="Problematic-Expressions">
-<div class="header">
+<div class="section-level-extent" id="Problematic-Expressions">
+<div class="nav-panel">
 <p>
 Next: <a href="Character-Encoding.html" accesskey="n" rel="next">Character Encoding</a>, Previous: <a href="Basic-vs-Extended.html" accesskey="p" rel="prev">Basic vs Extended Regular Expressions</a>, Up: <a href="Regular-Expressions.html" accesskey="u" rel="up">Regular Expressions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
 </div>
 <hr>
-<span id="Problematic-Regular-Expressions"></span><h3 class="section">3.7 Problematic Regular Expressions</h3>
+<h3 class="section" id="Problematic-Regular-Expressions"><span>3.7 Problematic Regular Expressions<a class="copiable-link" href="#Problematic-Regular-Expressions"> ¶</a></span></h3>
 
-<span id="index-invalid-regular-expressions"></span>
-<span id="index-unspecified-behavior-in-regular-expressions"></span>
-<p>Some strings are <em>invalid regular expressions</em> and cause
-<code>grep</code> to issue a diagnostic and fail. For example, ‘<samp>xy\1</samp>’
+<a class="index-entry-id" id="index-invalid-regular-expressions"></a>
+<a class="index-entry-id" id="index-unspecified-behavior-in-regular-expressions"></a>
+<p>Some strings are <em class="dfn">invalid regular expressions</em> and cause
+<code class="command">grep</code> to issue a diagnostic and fail. For example, ‘<samp class="samp">xy\1</samp>’
 is invalid because there is no parenthesized subexpression for the
-back-reference ‘<samp>\1</samp>’ to refer to.
+back-reference ‘<samp class="samp">\1</samp>’ to refer to.
 </p>
-<p>Also, some regular expressions have <em>unspecified behavior</em> and
-should be avoided even if <code>grep</code> does not currently diagnose
-them. For example, ‘<samp>xy\0</samp>’ has unspecified behavior because
-‘<samp>0</samp>’ is not a special character and ‘<samp>\0</samp>’ is not a special
-backslash expression (see <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>).
+<p>Also, some regular expressions have <em class="dfn">unspecified behavior</em> and
+should be avoided even if <code class="command">grep</code> does not currently diagnose
+them. For example, ‘<samp class="samp">xy\0</samp>’ has unspecified behavior because
+‘<samp class="samp">0</samp>’ is not a special character and ‘<samp class="samp">\0</samp>’ is not a special
+backslash expression (see <a class="pxref" href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>).
 Unspecified behavior can be particularly problematic because the set
 of matched strings might be only partially specified, or not be
 specified at all, or the expression might even be invalid.
 </p>
 <p>The following regular expression constructs are invalid on all
 platforms conforming to POSIX, so portable scripts can assume that
-<code>grep</code> rejects these constructs:
+<code class="command">grep</code> rejects these constructs:
 </p>
-<ul>
-<li> A basic regular expression containing a back-reference ‘<samp>\<var>n</var></samp>’
-preceded by fewer than <var>n</var> closing parentheses. For example,
-‘<samp>\(a\)\2</samp>’ is invalid.
-
-</li><li> A bracket expression containing ‘<samp>[:</samp>’ that does not start a
-character class; and similarly for ‘<samp>[=</samp>’ and ‘<samp>[.</samp>’. For
-example, ‘<samp>[a[:b]</samp>’ and ‘<samp>[a[:ouch:]b]</samp>’ are invalid.
+<ul class="itemize mark-bullet">
+<li>A basic regular expression containing a back-reference ‘<samp class="samp">\<var class="var">n</var></samp>’
+preceded by fewer than <var class="var">n</var> closing parentheses. For example,
+‘<samp class="samp">\(a\)\2</samp>’ is invalid.
+
+</li><li>A bracket expression containing ‘<samp class="samp">[:</samp>’ that does not start a
+character class; and similarly for ‘<samp class="samp">[=</samp>’ and ‘<samp class="samp">[.</samp>’. For
+example, ‘<samp class="samp">[a[:b]</samp>’ and ‘<samp class="samp">[a[:ouch:]b]</samp>’ are invalid.
 </li></ul>
 
-<p>GNU <code>grep</code> treats the following constructs as invalid.
-However, other <code>grep</code> implementations might allow them, so
+<p>GNU <code class="command">grep</code> treats the following constructs as invalid.
+However, other <code class="command">grep</code> implementations might allow them, so
 portable scripts should not rely on their being invalid:
 </p>
-<ul>
-<li> Unescaped ‘<samp>\</samp>’ at the end of a regular expression.
+<ul class="itemize mark-bullet">
+<li>Unescaped ‘<samp class="samp">\</samp>’ at the end of a regular expression.
 
-</li><li> Unescaped ‘<samp>[</samp>’ that does not start a bracket expression.
+</li><li>Unescaped ‘<samp class="samp">[</samp>’ that does not start a bracket expression.
 
-</li><li> A ‘<samp>\{</samp>’ in a basic regular expression that does not start an
+</li><li>A ‘<samp class="samp">\{</samp>’ in a basic regular expression that does not start an
 interval expression.
 
-</li><li> A basic regular expression with unbalanced ‘<samp>\(</samp>’ or ‘<samp>\)</samp>’,
-or an extended regular expression with unbalanced ‘<samp>(</samp>’.
+</li><li>A basic regular expression with unbalanced ‘<samp class="samp">\(</samp>’ or ‘<samp class="samp">\)</samp>’,
+or an extended regular expression with unbalanced ‘<samp class="samp">(</samp>’.
 
-</li><li> In the POSIX locale, a range expression like ‘<samp>z-a</samp>’ that
-represents zero elements. A non-GNU <code>grep</code> might treat it as
+</li><li>In the POSIX locale, a range expression like ‘<samp class="samp">z-a</samp>’ that
+represents zero elements. A non-GNU <code class="command">grep</code> might treat it as
 a valid range that never matches.
 
-</li><li> An interval expression with a repetition count greater than 32767.
+</li><li>An interval expression with a repetition count greater than 32767.
 (The portable POSIX limit is 255, and even interval expressions with
 smaller counts can be impractically slow on all known implementations.)
 
-</li><li> A bracket expression that contains at least three elements, the first
-and last of which are both ‘<samp>:</samp>’, or both ‘<samp>.</samp>’, or both
-‘<samp>=</samp>’. For example, a non-GNU <code>grep</code> might treat
-‘<samp>[:alpha:]</samp>’ like ‘<samp>[[:alpha:]]</samp>’, or like ‘<samp>[:ahlp]</samp>’.
+</li><li>A bracket expression that contains at least three elements, the first
+and last of which are both ‘<samp class="samp">:</samp>’, or both ‘<samp class="samp">.</samp>’, or both
+‘<samp class="samp">=</samp>’. For example, a non-GNU <code class="command">grep</code> might treat
+‘<samp class="samp">[:alpha:]</samp>’ like ‘<samp class="samp">[[:alpha:]]</samp>’, or like ‘<samp class="samp">[:ahlp]</samp>’.
 </li></ul>
 
 <p>The following constructs have well-defined behavior in GNU
-<code>grep</code>. However, they have unspecified behavior elsewhere, so
+<code class="command">grep</code>. However, they have unspecified behavior elsewhere, so
 portable scripts should avoid them:
 </p>
-<ul>
-<li> Special backslash expressions like ‘<samp>\b</samp>’, ‘<samp>\<</samp>’, and ‘<samp>\]</samp>’.
-See <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>.
+<ul class="itemize mark-bullet">
+<li>Special backslash expressions like ‘<samp class="samp">\b</samp>’, ‘<samp class="samp">\<</samp>’, and ‘<samp class="samp">\]</samp>’.
+See <a class="xref" href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>.
 
-</li><li> A basic regular expression that uses ‘<samp>\?</samp>’, ‘<samp>\+</samp>’, or ‘<samp>\|</samp>’.
+</li><li>A basic regular expression that uses ‘<samp class="samp">\?</samp>’, ‘<samp class="samp">\+</samp>’, or ‘<samp class="samp">\|</samp>’.
 
-</li><li> An extended regular expression that uses back-references.
+</li><li>An extended regular expression that uses back-references.
 
-</li><li> An empty regular expression, subexpression, or alternative. For
-example, ‘<samp>(a|bc|)</samp>’ is not portable; a portable equivalent is
-‘<samp>(a|bc)?</samp>’.
+</li><li>An empty regular expression, subexpression, or alternative. For
+example, ‘<samp class="samp">(a|bc|)</samp>’ is not portable; a portable equivalent is
+‘<samp class="samp">(a|bc)?</samp>’.
 
-</li><li> In a basic regular expression, an anchoring ‘<samp>^</samp>’ that appears
-directly after ‘<samp>\(</samp>’, or an anchoring ‘<samp>$</samp>’ that appears
-directly before ‘<samp>\)</samp>’.
+</li><li>In a basic regular expression, an anchoring ‘<samp class="samp">^</samp>’ that appears
+directly after ‘<samp class="samp">\(</samp>’, or an anchoring ‘<samp class="samp">$</samp>’ that appears
+directly before ‘<samp class="samp">\)</samp>’.
 
-</li><li> In a basic regular expression, a repetition operator that
+</li><li>In a basic regular expression, a repetition operator that
 directly follows another repetition operator.
 
-</li><li> In an extended regular expression, unescaped ‘<samp>{</samp>’
+</li><li>In an extended regular expression, unescaped ‘<samp class="samp">{</samp>’
 that does not begin a valid interval expression.
-GNU <code>grep</code> treats the ‘<samp>{</samp>’ as an ordinary character.
+GNU <code class="command">grep</code> treats the ‘<samp class="samp">{</samp>’ as an ordinary character.
 
-</li><li> A null character or an encoding error in either pattern or input data.
-See <a href="Character-Encoding.html">Character Encoding</a>.
+</li><li>A null character or an encoding error in either pattern or input data.
+See <a class="xref" href="Character-Encoding.html">Character Encoding</a>.
 
-</li><li> An input file that ends in a non-newline character,
-where GNU <code>grep</code> silently supplies a newline.
+</li><li>An input file that ends in a non-newline character,
+where GNU <code class="command">grep</code> silently supplies a newline.
 </li></ul>
 
 <p>The following constructs have unspecified behavior, in both GNU
-and other <code>grep</code> implementations. Scripts should avoid
+and other <code class="command">grep</code> implementations. Scripts should avoid
 them whenever possible.
 </p>
-<ul>
-<li> A backslash escaping an ordinary character, unless it is a
-back-reference like ‘<samp>\1</samp>’ or a special backslash expression like
-‘<samp>\<</samp>’ or ‘<samp>\b</samp>’. See <a href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>. For
-example, ‘<samp>\x</samp>’ has unspecified behavior now, and a future version
-of <code>grep</code> might specify ‘<samp>\x</samp>’ to have a new behavior.
+<ul class="itemize mark-bullet">
+<li>A backslash escaping an ordinary character, unless it is a
+back-reference like ‘<samp class="samp">\1</samp>’ or a special backslash expression like
+‘<samp class="samp">\<</samp>’ or ‘<samp class="samp">\b</samp>’. See <a class="xref" href="Special-Backslash-Expressions.html">Special Backslash Expressions</a>. For
+example, ‘<samp class="samp">\x</samp>’ has unspecified behavior now, and a future version
+of <code class="command">grep</code> might specify ‘<samp class="samp">\x</samp>’ to have a new behavior.
 
-</li><li> A repetition operator that appears directly after an anchor, or at the
+</li><li>A repetition operator that appears directly after an anchor, or at the
 start of a complete regular expression, parenthesized subexpression,
-or alternative. For example, ‘<samp>+|^*(+a|?-b)</samp>’ has unspecified
-behavior, whereas ‘<samp>\+|^\*(\+a|\?-b)</samp>’ is portable.
+or alternative. For example, ‘<samp class="samp">+|^*(+a|?-b)</samp>’ has unspecified
+behavior, whereas ‘<samp class="samp">\+|^\*(\+a|\?-b)</samp>’ is portable.
 
-</li><li> A range expression outside the POSIX locale. For example, in some
-locales ‘<samp>[a-z]</samp>’ might match some characters that are not
+</li><li>A range expression outside the POSIX locale. For example, in some
+locales ‘<samp class="samp">[a-z]</samp>’ might match some characters that are not
 lowercase letters, or might not match some lowercase letters, or might
-be invalid. With GNU <code>grep</code> it is not documented whether
+be invalid. With GNU <code class="command">grep</code> it is not documented whether
 these range expressions use native code points, or use the collating
-sequence specified by the <code>LC_COLLATE</code> category, or have some
+sequence specified by the <code class="env">LC_COLLATE</code> category, or have some
 other interpretation. Outside the POSIX locale, it is portable to use
-‘<samp>[[:lower:]]</samp>’ to match a lower-case letter, or
-‘<samp>[abcdefghijklmnopqrstuvwxyz]</samp>’ to match an ASCII lower-case
+‘<samp class="samp">[[:lower:]]</samp>’ to match a lower-case letter, or
+‘<samp class="samp">[abcdefghijklmnopqrstuvwxyz]</samp>’ to match an ASCII lower-case
 letter.
 </li></ul>
 
 
 </div>
 <hr>
-<div class="header">
+<div class="nav-panel">
 <p>
 Next: <a href="Character-Encoding.html">Character Encoding</a>, Previous: <a href="Basic-vs-Extended.html">Basic vs Extended Regular Expressions</a>, Up: <a href="Regular-Expressions.html">Regular Expressions</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
 </div>
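The manual text carried in this diff says the empty alternative in `(a|bc|)` is not portable and offers `(a|bc)?` as a portable equivalent. One quick way to sanity-check a rewrite like that is to compare which strings each pattern matches in full. The sketch below is only illustrative and makes two assumptions: it uses Python's `re` module as a stand-in engine (its syntax is not POSIX ERE, although it accepts both patterns shown), and `same_full_matches` is a hypothetical helper that checks every string up to a fixed length, so a `True` result is evidence, not a proof of equivalence.

```python
import re
from itertools import product


def same_full_matches(p1: str, p2: str, alphabet: str = "abc", max_len: int = 3) -> bool:
    """Hypothetical helper: return True if p1 and p2 fully match exactly
    the same strings among all strings over `alphabet` of length <= max_len.

    This is a bounded brute-force comparison, not a proof of equivalence.
    """
    r1, r2 = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, so "" is checked too.
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (r1.fullmatch(s) is None) != (r2.fullmatch(s) is None):
                return False
    return True


if __name__ == "__main__":
    # Non-portable empty alternative vs. the suggested portable form.
    print(same_full_matches(r"(a|bc|)", r"(a|bc)?"))  # True
    # Simply dropping the empty alternative changes the set: "" no longer matches.
    print(same_full_matches(r"(a|bc)", r"(a|bc)?"))   # False
```

Both patterns accept exactly "", "a", and "bc", which is why the `?` form can replace the empty alternative without changing what `grep` selects.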
