Thanks for reporting this bug. The attached patches, which I plan to submit shortly, fix the problem.
On Thu, Oct 26, 2017 at 9:17 AM, Reuben Thomas <r...@sc3d.org> wrote: > The documentation as generated for findutils refers to "ed" syntax, > but does not define it. ("posix-basic" is given as a synonym for > "ed".) > > -- > https://rrt.sc3d.org > -- -- This email is intended solely for the use of its addressee, sender, and any readers of a mailing list archive in which it happens to appear. If you have received this email in error, please say or type three times, "I believe in the utility of email disclaimers," and then reply to the author correcting any spellings (and, optionally, any incorrect spellings), accompanying these with humorous jests about the author's parentage. If you are not the addressee, you are nevertheless permitted to both copy and forward this email since without such permissions email systems are unable to transmit email to anybody, intended recipient or not. To those still reading by this point, the author would like to apologise for being unable to maintain a consistent level of humour throughout this disclaimer. Contents may settle during transit. Do not feed the animals.
From e2c673cbcdc325a3a2e9dd02169bb4a42c61bc48 Mon Sep 17 00:00:00 2001 From: James Youngman <j...@gnu.org> Date: Mon, 13 Nov 2017 22:37:55 +0000 Subject: [PATCH 1/2] regexprops: fix dangling reference to the `ed' regular expression dialect. To: findutils-patc...@gnu.org * lib/regextype.c (regex_map): Permute the entries to list POSIX dialects before other ones, so that we don't end up with a dangling reference to `ed' regular expressions when context=findutils. Remove trailing white space from the output. * doc/regexprops.texi: Regenerate this file, so that we no longer have a dangling reference to the `ed' dialect. * doc/find.texi (Regular Expressions): Point out the difference between Emacs regular expressions and findutils regular expressions: in findutils "." will match newline. * find/find.1: Likewise. * locate/locate.1: Likewise. Also document the --regextype option. --- doc/find.texi | 7 +- doc/regexprops.texi | 376 +++++++++++++++++++++++++++++++++------------------- find/find.1 | 4 +- lib/regexprops.c | 74 +++++------ lib/regextype.c | 14 +- locate/locate.1 | 14 +- 6 files changed, 306 insertions(+), 183 deletions(-) diff --git a/doc/find.texi b/doc/find.texi index 2731f0af..5573d29b 100644 --- a/doc/find.texi +++ b/doc/find.texi @@ -3917,8 +3917,11 @@ your locale setup affects the interpretation of regular expressions. There are also several different types of regular expression, and these are interpreted differently. Normally, the type of regular -expression used by @code{find} and @code{locate} is the same as is -used in GNU Emacs. Both programs provide an option which allows you +expression used by @code{find} and @code{locate} is almost identical to +that used in GNU Emacs. The single difference is that in @code{find} +and @code{locate}, a @samp{.} will match a newline character. + +Both @code{find} and @code{locate} provide an option which allows you to select an alternative regular expression syntax; for @code{find} this is the @samp{-regextype} option, and for @code{locate} this is the @samp{--regextype} option. diff --git a/doc/regexprops.texi b/doc/regexprops.texi index 8fee88ae..0229460e 100644 --- a/doc/regexprops.texi +++ b/doc/regexprops.texi @@ -11,15 +11,15 @@ @menu * findutils-default regular expression syntax:: +* posix-awk regular expression syntax:: +* posix-basic regular expression syntax:: +* posix-egrep regular expression syntax:: +* posix-extended regular expression syntax:: * awk regular expression syntax:: * egrep regular expression syntax:: * emacs regular expression syntax:: * gnu-awk regular expression syntax:: * grep regular expression syntax:: -* posix-awk regular expression syntax:: -* posix-basic regular expression syntax:: -* posix-egrep regular expression syntax:: -* posix-extended regular expression syntax:: @end menu @node findutils-default regular expression syntax @@ -44,6 +44,7 @@ matches a @samp{?}. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}. + GNU extensions are supported: @enumerate @@ -73,11 +74,10 @@ The alternation operator is @samp{\|}. The character @samp{^} only represents the beginning of a string when it appears: @enumerate -@item -At the beginning of a regular expression +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} -@item After an open-group, signified by -@samp{\(} @item After the alternation operator @samp{\|} @@ -89,8 +89,8 @@ The character @samp{$} only represents the end of a string when it appears: @item At the end of a regular expression -@item Before a close-group, signified by -@samp{\)} +@item Before a close-group, signified by @samp{\)} + @item Before the alternation operator @samp{\|} @end enumerate @@ -101,8 +101,8 @@ The character @samp{$} only represents the end of a string when it appears: @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{\(} +@item After an open-group, signified by @samp{\(} + @item After the alternation operator @samp{\|} @end enumerate @@ -113,8 +113,8 @@ The character @samp{$} only represents the end of a string when it appears: The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node awk regular expression syntax -@subsection @samp{awk} regular expression syntax +@node posix-awk regular expression syntax +@subsection @samp{posix-awk} regular expression syntax The character @samp{.} matches any single character except the null character. @@ -135,53 +135,57 @@ matches a @samp{?}. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. -Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit matches that digit. + +Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. -@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: + +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{(} +@item After an open-group, signified by @samp{(} + @item After the alternation operator @samp{|} @end enumerate - +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node egrep regular expression syntax -@subsection @samp{egrep} regular expression syntax +@node posix-basic regular expression syntax +@subsection @samp{posix-basic} regular expression syntax -The character @samp{.} matches any single character. +The character @samp{.} matches any single character except the null character. @table @samp -@item + -indicates that the regular expression should match one or more occurrences of the previous atom or regexp. -@item ? -indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. @item \+ -matches a @samp{+} +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. @item \? -matches a @samp{?}. +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item + and ? +match themselves. + @end table Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -204,24 +208,59 @@ GNU extensions are supported: @end enumerate -Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. +Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. -The alternation operator is @samp{|}. +The alternation operator is @samp{\|}. -The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. +The character @samp{^} only represents the beginning of a string when it appears: +@enumerate -The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} + + +@item After the alternation operator @samp{\|} + +@end enumerate + + +The character @samp{$} only represents the end of a string when it appears: +@enumerate + +@item At the end of a regular expression + +@item Before a close-group, signified by @samp{\)} + +@item Before the alternation operator @samp{\|} + +@end enumerate + + +@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} + +@item After the alternation operator @samp{\|} + +@end enumerate + + +Intervals are specified by @samp{\@{} and @samp{\@}}. +Invalid intervals such as @samp{a\@{1z} are not accepted. -Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node emacs regular expression syntax -@subsection @samp{emacs} regular expression syntax +@node posix-egrep regular expression syntax +@subsection @samp{posix-egrep} regular expression syntax -The character @samp{.} matches any single character except newline. +The character @samp{.} matches any single character. @table @samp @@ -237,7 +276,8 @@ matches a @samp{?}. @end table -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}. +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -261,58 +301,27 @@ GNU extensions are supported: @end enumerate -Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. - -The alternation operator is @samp{\|}. - -The character @samp{^} only represents the beginning of a string when it appears: -@enumerate - -@item -At the beginning of a regular expression - -@item After an open-group, signified by -@samp{\(} - -@item After the alternation operator @samp{\|} - -@end enumerate - - -The character @samp{$} only represents the end of a string when it appears: -@enumerate - -@item At the end of a regular expression - -@item Before a close-group, signified by -@samp{\)} -@item Before the alternation operator @samp{\|} - -@end enumerate - - -@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: -@enumerate +Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. -@item At the beginning of a regular expression +The alternation operator is @samp{|}. -@item After an open-group, signified by -@samp{\(} -@item After the alternation operator @samp{\|} +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. -@end enumerate +The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node gnu-awk regular expression syntax -@subsection @samp{gnu-awk} regular expression syntax +@node posix-extended regular expression syntax +@subsection @samp{posix-extended} regular expression syntax -The character @samp{.} matches any single character. +The character @samp{.} matches any single character except the null character. @table @samp @@ -328,7 +337,8 @@ matches a @samp{?}. @end table -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -358,42 +368,101 @@ The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. -@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: + +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{(} +@item After an open-group, signified by @samp{(} + @item After the alternation operator @samp{|} @end enumerate -Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals such as @samp{a@{1z} are not accepted. + The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node grep regular expression syntax -@subsection @samp{grep} regular expression syntax +@node awk regular expression syntax +@subsection @samp{awk} regular expression syntax -The character @samp{.} matches any single character. +The character @samp{.} matches any single character except the null character. @table @samp -@item \+ +@item + indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+} @item \? +matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + + +GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. + + +Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit matches that digit. + +The alternation operator is @samp{|}. + +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. + + +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{(} + +@item After the alternation operator @samp{|} + +@end enumerate + + + + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node egrep regular expression syntax +@subsection @samp{egrep} regular expression syntax +This is a synonym for posix-egrep. +@node emacs regular expression syntax +@subsection @samp{emacs} regular expression syntax + + +The character @samp{.} matches any single character except newline. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. -@item + and ? -match themselves. +@item \+ +matches a @samp{+} +@item \? +matches a @samp{?}. @end table -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}. + GNU extensions are supported: @enumerate @@ -424,13 +493,10 @@ The alternation operator is @samp{\|}. The character @samp{^} only represents the beginning of a string when it appears: @enumerate -@item -At the beginning of a regular expression +@item At the beginning of a regular expression -@item After an open-group, signified by -@samp{\(} +@item After an open-group, signified by @samp{\(} -@item After a newline @item After the alternation operator @samp{\|} @@ -442,39 +508,35 @@ The character @samp{$} only represents the end of a string when it appears: @item At the end of a regular expression -@item Before a close-group, signified by -@samp{\)} -@item Before a newline +@item Before a close-group, signified by @samp{\)} @item Before the alternation operator @samp{\|} @end enumerate -@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{\(} -@item After a newline +@item After an open-group, signified by @samp{\(} @item After the alternation operator @samp{\|} @end enumerate -Intervals are specified by @samp{\@{} and @samp{\@}}. Invalid intervals such as @samp{a\@{1z} are not accepted. + The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node posix-awk regular expression syntax -@subsection @samp{posix-awk} regular expression syntax +@node gnu-awk regular expression syntax +@subsection @samp{gnu-awk} regular expression syntax -The character @samp{.} matches any single character except the null character. +The character @samp{.} matches any single character. @table @samp @@ -492,7 +554,28 @@ matches a @samp{?}. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. -GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. + +GNU extensions are supported: +@enumerate + +@item @samp{\w} matches a character within a word + +@item @samp{\W} matches a character which is not within a word + +@item @samp{\<} matches the beginning of a word + +@item @samp{\>} matches the end of a word + +@item @samp{\b} matches a word boundary + +@item @samp{\B} matches characters which are not a word boundary + +@item @samp{\`} matches the beginning of the whole input + +@item @samp{\'} matches the end of the whole input + +@end enumerate + Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. @@ -500,51 +583,47 @@ The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. -@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed: + +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{(} +@item After an open-group, signified by @samp{(} + @item After the alternation operator @samp{|} @end enumerate -Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node posix-basic regular expression syntax -@subsection @samp{posix-basic} regular expression syntax -This is a synonym for ed. -@node posix-egrep regular expression syntax -@subsection @samp{posix-egrep} regular expression syntax -This is a synonym for egrep. -@node posix-extended regular expression syntax -@subsection @samp{posix-extended} regular expression syntax +@node grep regular expression syntax +@subsection @samp{grep} regular expression syntax -The character @samp{.} matches any single character except the null character. +The character @samp{.} matches any single character. @table @samp -@item + -indicates that the regular expression should match one or more occurrences of the previous atom or regexp. -@item ? -indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. @item \+ -matches a @samp{+} +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. @item \? -matches a @samp{?}. +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item + and ? +match themselves. + @end table Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + GNU extensions are supported: @enumerate @@ -567,25 +646,56 @@ GNU extensions are supported: @end enumerate -Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. +Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. -The alternation operator is @samp{|}. +The alternation operator is @samp{\|}. -The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. +The character @samp{^} only represents the beginning of a string when it appears: +@enumerate -@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed: +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} + + +@item After a newline + +@item After the alternation operator @samp{\|} + +@end enumerate + + +The character @samp{$} only represents the end of a string when it appears: +@enumerate + +@item At the end of a regular expression + +@item Before a close-group, signified by @samp{\)} + +@item Before a newline + +@item Before the alternation operator @samp{\|} + +@end enumerate + + +@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by -@samp{(} -@item After the alternation operator @samp{|} +@item After an open-group, signified by @samp{\(} + +@item After a newline + +@item After the alternation operator @samp{\|} @end enumerate -Intervals are specified by @samp{@{} and @samp{@}}. Invalid intervals such as @samp{a@{1z} are not accepted. +Intervals are specified by @samp{\@{} and @samp{\@}}. +Invalid intervals such as @samp{a\@{1z} are not accepted. + The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. diff --git a/find/find.1 b/find/find.1 index 06ddfa5b..8b1320c1 100644 --- a/find/find.1 +++ b/find/find.1 @@ -879,8 +879,8 @@ on the whole path, not a search. For example, to match a file named `./fubar3', you can use the regular expression `.*bar.' or `.*b.*3', but not `f.*r3'. The regular expressions understood by .B find -are by default Emacs Regular Expressions, but this can be -changed with the +are by default Emacs Regular Expressions (except that `.' matches +newline), but this can be changed with the .B \-regextype option. diff --git a/lib/regexprops.c b/lib/regexprops.c index fcbdd5db..b20b4a38 100644 --- a/lib/regexprops.c +++ b/lib/regexprops.c @@ -78,8 +78,12 @@ directive (const char *s) static void comment (const char *s) { - directive ("@c "); - literal (s); + directive ("@c"); + if (s[0]) + { + literal (" "); + literal (s); + } newline (); } @@ -175,7 +179,7 @@ describe_regex_syntax (int options) content (" the null character"); } - content (". "); + content ("."); newpara (); if (!(options & RE_LIMITED_OPS)) @@ -185,25 +189,25 @@ describe_regex_syntax (int options) { enum_item ("\\+"); content ("indicates that the regular expression should match one" - " or more occurrences of the previous atom or regexp. "); + " or more occurrences of the previous atom or regexp."); enum_item ("\\?"); content ("indicates that the regular expression should match zero" - " or one occurrence of the previous atom or regexp. "); - enum_item ("+ and ? "); - content ("match themselves. "); + " or one occurrence of the previous atom or regexp."); + enum_item ("+ and ?"); + content ("match themselves.\n"); } else { enum_item ("+"); content ("indicates that the regular expression should match one" - " or more occurrences of the previous atom or regexp. "); + " or more occurrences of the previous atom or regexp."); enum_item ("?"); content ("indicates that the regular expression should match zero" - " or one occurrence of the previous atom or regexp. "); + " or one occurrence of the previous atom or regexp."); enum_item ("\\+"); literal ("matches a @samp{+}"); enum_item ("\\?"); - literal ("matches a @samp{?}. "); + literal ("matches a @samp{?}."); } endtable (); } @@ -226,15 +230,15 @@ describe_regex_syntax (int options) if (options & RE_CHAR_CLASSES) content ("Character classes are supported; for example " - "@samp{[[:digit:]]} will match a single decimal digit. "); + "@samp{[[:digit:]]} will match a single decimal digit.\n"); else literal ("Character classes are not supported, so for example " "you would need to use @samp{[0-9]} " - "instead of @samp{[[:digit:]]}. "); + "instead of @samp{[[:digit:]]}.\n"); if (options & RE_HAT_LISTS_NOT_NEWLINE) { - literal ("Non-matching lists @samp{[^@dots{}]} do not ever match newline. "); + literal ("Non-matching lists @samp{[^@dots{}]} do not ever match newline.\n"); } newpara (); if (options & RE_NO_GNU_OPS) @@ -242,7 +246,7 @@ describe_regex_syntax (int options) content ("GNU extensions are not supported and so " "@samp{\\w}, @samp{\\W}, @samp{\\<}, @samp{\\>}, @samp{\\b}, @samp{\\B}, @samp{\\`}, and @samp{\\'} " "match " - "@samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. "); + "@samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.\n"); } else { @@ -276,7 +280,7 @@ describe_regex_syntax (int options) if (options & RE_NO_BK_REFS) { - content ("A backslash followed by a digit matches that digit. "); + content ("A backslash followed by a digit matches that digit."); } else { @@ -285,7 +289,7 @@ describe_regex_syntax (int options) literal ("@samp{(}"); else literal ("@samp{\\(}"); - content (". "); + content ("."); } @@ -293,29 +297,28 @@ describe_regex_syntax (int options) if (!(options & RE_LIMITED_OPS)) { if (options & RE_NO_BK_VBAR) - literal ("The alternation operator is @samp{|}. "); + literal ("The alternation operator is @samp{|}."); else - literal ("The alternation operator is @samp{\\|}. "); + literal ("The alternation operator is @samp{\\|}."); } newpara (); if (options & RE_CONTEXT_INDEP_ANCHORS) { - literal ("The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. "); + literal ("The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.\n"); } else { literal ("The character @samp{^} only represents the beginning of a string when it appears:"); beginenum (); - enum_item ("\nAt the beginning of a regular expression"); - enum_item ("After an open-group, signified by "); + enum_item ("At the beginning of a regular expression"); if (options & RE_NO_BK_PARENS) { - literal ("@samp{(}"); + enum_item ("After an open-group, signified by @samp{(}"); } else { - literal ("@samp{\\(}"); + enum_item ("After an open-group, signified by @samp{\\(}"); } newline (); if (!(options & RE_LIMITED_OPS)) @@ -334,14 +337,13 @@ describe_regex_syntax (int options) literal ("The character @samp{$} only represents the end of a string when it appears:"); beginenum (); enum_item ("At the end of a regular expression"); - enum_item ("Before a close-group, signified by "); if (options & RE_NO_BK_PARENS) { - literal ("@samp{)}"); + enum_item ("Before a close-group, signified by @samp{)}"); } else { - literal ("@samp{\\)}"); + enum_item ("Before a close-group, signified by @samp{\\)}"); } if (!(options & RE_LIMITED_OPS)) { @@ -361,7 +363,7 @@ describe_regex_syntax (int options) if ((options & RE_CONTEXT_INDEP_OPS) && !(options & RE_CONTEXT_INVALID_OPS)) { - literal ("The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. "); + literal ("The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.\n"); } else { @@ -381,14 +383,13 @@ describe_regex_syntax (int options) beginenum (); enum_item ("At the beginning of a regular expression"); - enum_item ("After an open-group, signified by "); if (options & RE_NO_BK_PARENS) { - literal ("@samp{(}"); + enum_item ("After an open-group, signified by @samp{(}"); } else { - literal ("@samp{\\(}"); + enum_item ("After an open-group, signified by @samp{\\(}"); } if (!(options & RE_LIMITED_OPS)) { @@ -410,39 +411,38 @@ describe_regex_syntax (int options) { if (options & RE_NO_BK_BRACES) { - literal ("Intervals are specified by @samp{@{} and @samp{@}}. "); + literal ("Intervals are specified by @samp{@{} and @samp{@}}.\n"); if (options & RE_INVALID_INTERVAL_ORD) { literal ("Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\\@{1}"); } else { - literal ("Invalid intervals such as @samp{a@{1z} are not accepted. "); + literal ("Invalid intervals such as @samp{a@{1z} are not accepted.\n"); } } else { - literal ("Intervals are specified by @samp{\\@{} and @samp{\\@}}. "); + literal ("Intervals are specified by @samp{\\@{} and @samp{\\@}}.\n"); if (options & RE_INVALID_INTERVAL_ORD) { literal ("Invalid intervals are treated as literals, for example @samp{a\\@{1} is treated as @samp{a@{1}"); } else { - literal ("Invalid intervals such as @samp{a\\@{1z} are not accepted. "); + literal ("Invalid intervals such as @samp{a\\@{1z} are not accepted.\n"); } } - } newpara (); if (options & RE_NO_POSIX_BACKTRACKING) { - content ("Matching succeeds as soon as the whole pattern is matched, meaning that the result may not be the longest possible match. "); + content ("Matching succeeds as soon as the whole pattern is matched, meaning that the result may not be the longest possible match."); } else { - content ("The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. "); + content ("The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups."); } newpara (); } diff --git a/lib/regextype.c b/lib/regextype.c index 8a7347dc..89416ebd 100644 --- a/lib/regextype.c +++ b/lib/regextype.c @@ -56,17 +56,19 @@ struct tagRegexTypeMap struct tagRegexTypeMap regex_map[] = { { "findutils-default", CONTEXT_FINDUTILS, RE_SYNTAX_EMACS|RE_DOT_NEWLINE }, + + { "posix-awk", CONTEXT_ALL, RE_SYNTAX_POSIX_AWK }, + { "posix-basic", CONTEXT_ALL, RE_SYNTAX_POSIX_BASIC }, + { "posix-egrep", CONTEXT_ALL, RE_SYNTAX_POSIX_EGREP }, + { "posix-extended", CONTEXT_ALL, RE_SYNTAX_POSIX_EXTENDED }, + { "posix-minimal-basic", CONTEXT_GENERIC, RE_SYNTAX_POSIX_MINIMAL_BASIC }, + { "awk", CONTEXT_ALL, RE_SYNTAX_AWK }, - { "egrep", CONTEXT_ALL, RE_SYNTAX_EGREP }, { "ed", CONTEXT_GENERIC, RE_SYNTAX_ED }, + { "egrep", CONTEXT_ALL, RE_SYNTAX_EGREP }, { "emacs", CONTEXT_ALL, RE_SYNTAX_EMACS }, { "gnu-awk", CONTEXT_ALL, RE_SYNTAX_GNU_AWK }, { "grep", CONTEXT_ALL, RE_SYNTAX_GREP }, - { "posix-awk", CONTEXT_ALL, RE_SYNTAX_POSIX_AWK }, - { "posix-basic", CONTEXT_ALL, RE_SYNTAX_POSIX_BASIC }, - { "posix-egrep", CONTEXT_ALL, RE_SYNTAX_POSIX_EGREP }, - { "posix-extended", CONTEXT_ALL, RE_SYNTAX_POSIX_EXTENDED }, - { "posix-minimal-basic", CONTEXT_GENERIC, RE_SYNTAX_POSIX_MINIMAL_BASIC }, { "sed", CONTEXT_GENERIC, RE_SYNTAX_SED }, /* ,{ "posix-common", CONTEXT_GENERIC, _RE_SYNTAX_POSIX_COMMON } */ }; diff --git a/locate/locate.1 b/locate/locate.1 index 82c8bc5a..7a1a0bea 100644 --- a/locate/locate.1 +++ b/locate/locate.1 @@ -13,6 +13,7 @@ locate \- list files in databases that match a pattern [\-l N | \-\-limit=N] [\-S | \-\-statistics] [\-r | \-\-regex ] +[\-\-regextype R] [\-\-max-database-age D] [\-P | \-H | \-\-nofollow] [\-L | \-\-follow] @@ -168,15 +169,22 @@ The pattern specified on the command line is understood to be a regular expression, as opposed to a glob pattern. The Regular expressions work in the same was as in .B emacs -and -.BR find , -except for the fact that "." will match a newline. +except for the fact that "." will match a newline. GNU +.B find +uses the same regular expressions. Filenames whose full paths match the specified regular expression are printed (or, in the case of the \-c option, counted). If you wish to anchor your regular expression at the ends of the full path name, then as is usual with regular expressions, you should use the characters ^ and $ to signify this. .TP +.I "\-\-regextype R" +Use regular expression dialect R. Supported dialects +include `findutils-default', `posix-awk', `posix-basic', +`posix-egrep', `posix-extended', `posix-minimal-basic', `awk', `ed', +`egrep', `emacs', `gnu-awk', `grep' and `sed'. See the Texinfo +documentation for a detailed explanation of these dialects. +.TP .I "\-s, \-\-stdio" Accepted but does nothing, for compatibility with BSD .BR locate . -- 2.11.0
From 1b53838ddf91013af970fee3ca19c12535a4ca91 Mon Sep 17 00:00:00 2001 From: James Youngman <j...@gnu.org> Date: Tue, 14 Nov 2017 22:21:49 +0000 Subject: [PATCH 2/2] regexprops: don't mention regex dialects we're not going to document. To: findutils-patc...@gnu.org * lib/regextype.c (get_regex_type_synonym): don't return regex dialect Y as a synonym of dialect X, if we're not in fact going to include X. Accept a CONTEXT parameter in order to identify this situation. This ensures that the bug fixed in commit e2c673cbcdc325a3a2e9dd02169bb4a42c61bc48 stays fixed for any permutation of regex_map. * lib/regextype.h: update prototype of get_regex_type_synonym. * lib/regexprops.c (describe_all): Pass the new context parameter. * doc/regexprops.texi: regenerate this file. --- doc/regexprops.texi | 296 ++++++++++++++++++++++++++-------------------------- lib/regexprops.c | 2 +- lib/regextype.c | 28 +++-- lib/regextype.h | 7 +- 4 files changed, 170 insertions(+), 163 deletions(-) diff --git a/doc/regexprops.texi b/doc/regexprops.texi index 0229460e..94c1e2e7 100644 --- a/doc/regexprops.texi +++ b/doc/regexprops.texi @@ -11,15 +11,15 @@ @menu * findutils-default regular expression syntax:: +* emacs regular expression syntax:: +* gnu-awk regular expression syntax:: +* grep regular expression syntax:: * posix-awk regular expression syntax:: +* awk regular expression syntax:: * posix-basic regular expression syntax:: * posix-egrep regular expression syntax:: -* posix-extended regular expression syntax:: -* awk regular expression syntax:: * egrep regular expression syntax:: -* emacs regular expression syntax:: -* gnu-awk regular expression syntax:: -* grep regular expression syntax:: +* posix-extended regular expression syntax:: @end menu @node findutils-default regular expression syntax @@ -113,11 +113,11 @@ The character @samp{$} only represents the end of a string when it appears: The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node posix-awk regular expression syntax -@subsection @samp{posix-awk} regular expression syntax +@node emacs regular expression syntax +@subsection @samp{emacs} regular expression syntax -The character @samp{.} matches any single character except the null character. +The character @samp{.} matches any single character except newline. @table @samp @@ -133,57 +133,7 @@ matches a @samp{?}. @end table -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. - - -GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. - - -Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. - -The alternation operator is @samp{|}. - -The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. - - -@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed: -@enumerate - -@item At the beginning of a regular expression - -@item After an open-group, signified by @samp{(} - -@item After the alternation operator @samp{|} - -@end enumerate - - -Intervals are specified by @samp{@{} and @samp{@}}. -Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} - -The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. - - -@node posix-basic regular expression syntax -@subsection @samp{posix-basic} regular expression syntax - - -The character @samp{.} matches any single character except the null character. - - -@table @samp - -@item \+ -indicates that the regular expression should match one or more occurrences of the previous atom or regexp. -@item \? -indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. -@item + and ? -match themselves. - -@end table - - -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}. GNU extensions are supported: @@ -237,7 +187,7 @@ The character @samp{$} only represents the end of a string when it appears: @end enumerate -@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: @enumerate @item At the beginning of a regular expression @@ -249,15 +199,13 @@ The character @samp{$} only represents the end of a string when it appears: @end enumerate -Intervals are specified by @samp{\@{} and @samp{\@}}. -Invalid intervals such as @samp{a\@{1z} are not accepted. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node posix-egrep regular expression syntax -@subsection @samp{posix-egrep} regular expression syntax +@node gnu-awk regular expression syntax +@subsection @samp{gnu-awk} regular expression syntax The character @samp{.} matches any single character. @@ -276,7 +224,7 @@ matches a @samp{?}. @end table -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. GNU extensions are supported: @@ -308,7 +256,16 @@ The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. -The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{(} + +@item After the alternation operator @samp{|} + +@end enumerate Intervals are specified by @samp{@{} and @samp{@}}. @@ -317,23 +274,22 @@ Invalid intervals are treated as literals, for example @samp{a@{1} is treated as The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node posix-extended regular expression syntax -@subsection @samp{posix-extended} regular expression syntax +@node grep regular expression syntax +@subsection @samp{grep} regular expression syntax -The character @samp{.} matches any single character except the null character. +The character @samp{.} matches any single character. @table @samp -@item + -indicates that the regular expression should match one or more occurrences of the previous atom or regexp. -@item ? -indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. @item \+ -matches a @samp{+} +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. @item \? -matches a @samp{?}. +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item + and ? +match themselves. + @end table @@ -362,6 +318,86 @@ GNU extensions are supported: @end enumerate +Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. + +The alternation operator is @samp{\|}. + +The character @samp{^} only represents the beginning of a string when it appears: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} + + +@item After a newline + +@item After the alternation operator @samp{\|} + +@end enumerate + + +The character @samp{$} only represents the end of a string when it appears: +@enumerate + +@item At the end of a regular expression + +@item Before a close-group, signified by @samp{\)} + +@item Before a newline + +@item Before the alternation operator @samp{\|} + +@end enumerate + + +@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: +@enumerate + +@item At the beginning of a regular expression + +@item After an open-group, signified by @samp{\(} + +@item After a newline + +@item After the alternation operator @samp{\|} + +@end enumerate + + +Intervals are specified by @samp{\@{} and @samp{\@}}. +Invalid intervals such as @samp{a\@{1z} are not accepted. + + +The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. + + +@node posix-awk regular expression syntax +@subsection @samp{posix-awk} regular expression syntax + + +The character @samp{.} matches any single character except the null character. + + +@table @samp + +@item + +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. +@item ? +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item \+ +matches a @samp{+} +@item \? +matches a @samp{?}. +@end table + + +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. + + +GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively. + + Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. The alternation operator is @samp{|}. @@ -382,8 +418,7 @@ The characters @samp{^} and @samp{$} always represent the beginning and end of a Intervals are specified by @samp{@{} and @samp{@}}. -Invalid intervals such as @samp{a@{1z} are not accepted. - +Invalid intervals are treated as literals, for example @samp{a@{1} is treated as @samp{a\@{1} The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. @@ -438,30 +473,26 @@ The characters @samp{^} and @samp{$} always represent the beginning and end of a The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node egrep regular expression syntax -@subsection @samp{egrep} regular expression syntax -This is a synonym for posix-egrep. -@node emacs regular expression syntax -@subsection @samp{emacs} regular expression syntax +@node posix-basic regular expression syntax +@subsection @samp{posix-basic} regular expression syntax -The character @samp{.} matches any single character except newline. +The character @samp{.} matches any single character except the null character. @table @samp -@item + -indicates that the regular expression should match one or more occurrences of the previous atom or regexp. -@item ? -indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. @item \+ -matches a @samp{+} +indicates that the regular expression should match one or more occurrences of the previous atom or regexp. @item \? -matches a @samp{?}. +indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. +@item + and ? +match themselves. + @end table -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}. +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. GNU extensions are supported: @@ -515,7 +546,7 @@ The character @samp{$} only represents the end of a string when it appears: @end enumerate -@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: +@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: @enumerate @item At the beginning of a regular expression @@ -527,13 +558,15 @@ The character @samp{$} only represents the end of a string when it appears: @end enumerate +Intervals are specified by @samp{\@{} and @samp{\@}}. +Invalid intervals such as @samp{a\@{1z} are not accepted. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node gnu-awk regular expression syntax -@subsection @samp{gnu-awk} regular expression syntax +@node posix-egrep regular expression syntax +@subsection @samp{posix-egrep} regular expression syntax The character @samp{.} matches any single character. @@ -552,7 +585,7 @@ matches a @samp{?}. @end table -Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. +Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit. GNU extensions are supported: @@ -584,16 +617,7 @@ The alternation operator is @samp{|}. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. -@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except: -@enumerate - -@item At the beginning of a regular expression - -@item After an open-group, signified by @samp{(} - -@item After the alternation operator @samp{|} - -@end enumerate +The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression. Intervals are specified by @samp{@{} and @samp{@}}. @@ -602,22 +626,26 @@ Invalid intervals are treated as literals, for example @samp{a@{1} is treated as The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. -@node grep regular expression syntax -@subsection @samp{grep} regular expression syntax +@node egrep regular expression syntax +@subsection @samp{egrep} regular expression syntax +This is a synonym for posix-egrep. +@node posix-extended regular expression syntax +@subsection @samp{posix-extended} regular expression syntax -The character @samp{.} matches any single character. +The character @samp{.} matches any single character except the null character. @table @samp -@item \+ +@item + indicates that the regular expression should match one or more occurrences of the previous atom or regexp. -@item \? +@item ? indicates that the regular expression should match zero or one occurrence of the previous atom or regexp. -@item + and ? -match themselves. - +@item \+ +matches a @samp{+} +@item \? +matches a @samp{?}. @end table @@ -646,55 +674,27 @@ GNU extensions are supported: @end enumerate -Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}. - -The alternation operator is @samp{\|}. - -The character @samp{^} only represents the beginning of a string when it appears: -@enumerate - -@item At the beginning of a regular expression - -@item After an open-group, signified by @samp{\(} - - -@item After a newline - -@item After the alternation operator @samp{\|} - -@end enumerate - - -The character @samp{$} only represents the end of a string when it appears: -@enumerate - -@item At the end of a regular expression - -@item Before a close-group, signified by @samp{\)} - -@item Before a newline +Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}. -@item Before the alternation operator @samp{\|} +The alternation operator is @samp{|}. -@end enumerate +The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified. -@samp{\*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except: +@samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed: @enumerate @item At the beginning of a regular expression -@item After an open-group, signified by @samp{\(} - -@item After a newline +@item After an open-group, signified by @samp{(} -@item After the alternation operator @samp{\|} +@item After the alternation operator @samp{|} @end enumerate -Intervals are specified by @samp{\@{} and @samp{\@}}. -Invalid intervals such as @samp{a\@{1z} are not accepted. +Intervals are specified by @samp{@{} and @samp{@}}. +Invalid intervals such as @samp{a@{1z} are not accepted. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups. diff --git a/lib/regexprops.c b/lib/regexprops.c index b20b4a38..6794e09f 100644 --- a/lib/regexprops.c +++ b/lib/regexprops.c @@ -558,7 +558,7 @@ describe_all (const char *contextname, if (NULL == next) next = ""; begin_subsection (name, next, previous, up); - parent = get_regex_type_synonym (i); + parent = get_regex_type_synonym (i, context); if (parent >= 0) { content ("This is a synonym for "); diff --git a/lib/regextype.c b/lib/regextype.c index 89416ebd..5b9a1d51 100644 --- a/lib/regextype.c +++ b/lib/regextype.c @@ -56,19 +56,17 @@ struct tagRegexTypeMap struct tagRegexTypeMap regex_map[] = { { "findutils-default", CONTEXT_FINDUTILS, RE_SYNTAX_EMACS|RE_DOT_NEWLINE }, - + { "ed", CONTEXT_GENERIC, RE_SYNTAX_ED }, + { "emacs", CONTEXT_ALL, RE_SYNTAX_EMACS }, + { "gnu-awk", CONTEXT_ALL, RE_SYNTAX_GNU_AWK }, + { "grep", CONTEXT_ALL, RE_SYNTAX_GREP }, { "posix-awk", CONTEXT_ALL, RE_SYNTAX_POSIX_AWK }, + { "awk", CONTEXT_ALL, RE_SYNTAX_AWK }, { "posix-basic", CONTEXT_ALL, RE_SYNTAX_POSIX_BASIC }, { "posix-egrep", CONTEXT_ALL, RE_SYNTAX_POSIX_EGREP }, + { "egrep", CONTEXT_ALL, RE_SYNTAX_EGREP }, { "posix-extended", CONTEXT_ALL, RE_SYNTAX_POSIX_EXTENDED }, { "posix-minimal-basic", CONTEXT_GENERIC, RE_SYNTAX_POSIX_MINIMAL_BASIC }, - - { "awk", CONTEXT_ALL, RE_SYNTAX_AWK }, - { "ed", CONTEXT_GENERIC, RE_SYNTAX_ED }, - { "egrep", CONTEXT_ALL, RE_SYNTAX_EGREP }, - { "emacs", CONTEXT_ALL, RE_SYNTAX_EMACS }, - { "gnu-awk", CONTEXT_ALL, RE_SYNTAX_GNU_AWK }, - { "grep", CONTEXT_ALL, RE_SYNTAX_GREP }, { "sed", CONTEXT_GENERIC, RE_SYNTAX_SED }, /* ,{ "posix-common", CONTEXT_GENERIC, _RE_SYNTAX_POSIX_COMMON } */ }; @@ -140,18 +138,26 @@ unsigned int get_regex_type_context (unsigned int ix) } int -get_regex_type_synonym (unsigned int ix) +get_regex_type_synonym (unsigned int ix, unsigned int context) { unsigned i; int flags; if (ix >= N_REGEX_MAP_ENTRIES) return -1; - flags = regex_map[ix].option_val; + /* Terminate the loop before we get to IX, so that we always + consistently choose the same entry as a synonym (rather than + stating that x and y are synonyms of each other). */ for (i=0u; i<ix; ++i) { - if (flags == regex_map[i].option_val) + if ((regex_map[i].context & context) == 0) + { + /* It is pointless to state that "x is a synonym of y" if we + are not in fact going to include y. */ + continue; + } + else if (flags == regex_map[i].option_val) { return i; } diff --git a/lib/regextype.h b/lib/regextype.h index 6b44ff66..4695ff98 100644 --- a/lib/regextype.h +++ b/lib/regextype.h @@ -44,10 +44,11 @@ const char * get_regex_type_name(unsigned int ix); */ int get_regex_type_flags(unsigned int ix); -/* If regular expression type IX (which is a regular expression type index) has - * one or more synonyms, return the index of one of them. Otherwise, return -1. +/* If regular expression type IX (which is a regular expression type + * index) has one or more synonyms which is interesting in context + * CONTEXT, return the index of one of them. Otherwise, return -1. */ -int get_regex_type_synonym(unsigned int ix); +int get_regex_type_synonym(unsigned int ix, unsigned int context); /* Returns one of CONTEXT_FINDUTILS, CONTEXT_GENERIC or CONTEXT_ALL. * This identifies whether this regular expression type index is relevant for, -- 2.11.0