[1003.1(2016/18)/Issue7+TC2 0001546]: BREs: reserve \? \+ and \|

Austin Group Bug Tracker via austin-group-l at The Open Group Fri, 18 Mar 2022 02:34:26 -0700


A NOTE has been added to this issue. 
====================================================================== 
https://austingroupbugs.net/view.php?id=1546 
====================================================================== 
Reported By:                calestyo
Assigned To:                
====================================================================== 
Project:                    1003.1(2016/18)/Issue7+TC2
Issue ID:                   1546
Category:                   Base Definitions and Headers
Type:                       Enhancement Request
Severity:                   Editorial
Priority:                   normal
Status:                     Under Review
Name:                       Christoph Anton Mitterer 
Organization:                
User Reference:              
Section:                    9.3 Basic Regular Expressions 
Page Number:                N/A 
Line Number:                N/A 
Interp Status:              --- 
Final Accepted Text:        https://austingroupbugs.net/view.php?id=1546#c5738 
====================================================================== 
Date Submitted:             2022-01-08 03:48 UTC
Last Modified:              2022-03-18 09:32 UTC
====================================================================== 
Summary:                    BREs: reserve \? \+ and \|
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0000773 Summary: Add \+, \?, and \| to Basic Re...
======================================================================


---------------------------------------------------------------------- 
 (0005755) geoffclare (manager) - 2022-03-18 09:32
 https://austingroupbugs.net/view.php?id=1546#c5755 
---------------------------------------------------------------------- 
New proposed resolution that adds the grammar changes and changes to the
anchoring description in 9.3.8. I also took the opportunity to improve
consistency between the BRE grammar and ERE grammar.

Page and line numbers are for Issue 8 draft 2.1

After page 165 line 5720 section 9.1, add:

<b>escape sequence</b><blockquote>An escape sequence, when not in a bracket
expression, is defined as the escape character <backslash> ('\\') followed
by any single character.</blockquote>
After page 167 line 5806 section 9.3.2, add:<blockquote><ul><li>The '?',
'+', and '|' characters; it is implementation-defined whether "\?", "\+",
and "\|" each match the literal character '?', '+', or '|', respectively,
or behave as described for the ERE special characters '?', '+', and '|',
respectively (see [xref to 9.4.3]).</li></ul></blockquote>
After page 168 line 5818 section 9.3.3, add a new list
item:<blockquote>Immediately following a "\|" escape sequence (after an
initial '^', if any), if the implementation does not match the escape
sequence "\|" to the literal character '|'.</blockquote>
On page 172 line 6015 section 9.3.8, change:<blockquote>A <circumflex>
('^') shall be an anchor when used as the first character of an entire BRE.
The implementation may treat the <circumflex> as an anchor when used as the
first character of a subexpression.</blockquote>to:<blockquote>A
<circumflex> ('^') shall be an anchor when used as the first character of
an entire BRE and, if the implementation does not match the escape sequence
"\|" to the literal character '|', when used immediately following a "\|"
escape sequence that is not inside a subexpression.  The implementation may
also treat a <circumflex> as an anchor when used inside a subexpression; in
this case it shall be an anchor only when either of the following is
true:<ul>
<li>It is the first character of the subexpression.</li>
<li>It immediately follows a "\|" escape sequence and the implementation
does not match the escape sequence "\|" to the literal character
'|'.</li></ul></blockquote>
On page 172 line 6023 section 9.3.8, change:<blockquote>A <dollar-sign>
('$') shall be an anchor when used as the last character of an entire BRE.
The implementation may treat a <dollar-sign> as an anchor when used as the
last character of a subexpression.</blockquote>to:<blockquote>A
<dollar-sign> ('$') shall be an anchor when used as the last character of
an entire BRE and, if the implementation does not match the escape sequence
"\|" to the literal character '|', when used immediately preceding a "\|"
escape sequence that is not inside a subexpression.  The implementation may
also treat a <dollar-sign> as an anchor when used inside a subexpression;
in this case it shall be an anchor only when either of the following is
true:<ul>
<li>It is the last character of the subexpression.</li>
<li>It immediately precedes a "\|" escape sequence and the implementation
does not match the escape sequence "\|" to the literal character
'|'.</li></ul></blockquote>
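As an illustration of the two 9.3.8 changes above, here is a minimal Python sketch that lists the positions where '^' and '$' would be required ("shall") to be anchors outside subexpressions. The function names `skip_bracket` and `required_anchors` and the flag `bar_is_alternation` are hypothetical; the flag stands for the implementation-defined choice of whether "\|" behaves like the ERE '|'. The optional ("may") cases inside "\(" and "\)" are deliberately not modelled.

```python
from typing import List


def skip_bracket(bre: str, start: int) -> int:
    """Return the index just past the bracket expression that opens at start."""
    i = start + 1
    if bre[i:i + 1] == "^":
        i += 1
    if bre[i:i + 1] == "]":
        i += 1  # a leading ']' is an ordinary list member
    while i < len(bre) and bre[i] != "]":
        if bre[i] == "[" and bre[i + 1:i + 2] in (".", ":", "="):
            end = bre.find(bre[i + 1] + "]", i + 2)  # skip [.x.] [:c:] [=x=]
            i = len(bre) if end == -1 else end + 2
        else:
            i += 1
    return min(i + 1, len(bre))


def required_anchors(bre: str, bar_is_alternation: bool) -> List[int]:
    """Indices of '^' and '$' that shall be anchors outside subexpressions."""
    anchors = []
    depth = 0         # nesting level of back-slashed parentheses
    prev_dollar = -1  # index of an unescaped '$' that was the previous token
    i, n = 0, len(bre)
    while i < n:
        ch = bre[i]
        if ch == "[":
            i = skip_bracket(bre, i)
            prev_dollar = -1
            continue
        if ch == "\\" and i + 1 < n:
            nxt = bre[i + 1]
            if nxt == "(":
                depth += 1
            elif nxt == ")":
                depth = max(depth - 1, 0)
            elif nxt == "|" and bar_is_alternation and depth == 0:
                if prev_dollar != -1:
                    anchors.append(prev_dollar)  # '$' immediately before \|
                if i + 2 < n and bre[i + 2] == "^":
                    anchors.append(i + 2)        # '^' immediately after \|
            prev_dollar = -1
            i += 2
            continue
        if ch == "^" and i == 0:
            anchors.append(i)  # first character of the entire BRE
        if ch == "$" and i == n - 1:
            anchors.append(i)  # last character of the entire BRE
        prev_dollar = i if ch == "$" else -1
        i += 1
    return sorted(set(anchors))
```

Under these assumptions, `required_anchors(r"^a$\|^b$", True)` returns `[0, 2, 5, 7]`, while the same BRE with `bar_is_alternation=False` returns `[0, 7]`, since the inner '$' and '^' are then ordinary characters around a literal '|'.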
On page 176 line 6191 section 9.5.1, change:<blockquote>The character '^'
when it appears as the first character of a basic regular expression and
when not <b>QUOTED_CHAR</b>.</blockquote>to:<blockquote>The character '^'
when it appears either as the first character of a basic regular expression
or, if the implementation does not match the escape sequence "\|" to the
literal character '|', when used immediately following a "\|" escape
sequence that is not inside a subexpression, and when not
<b>QUOTED_CHAR</b>.</blockquote>
After page 176 line 6197 section 9.5.1, add:<blockquote>On implementations
where the escape sequences "\?", "\+", and "\|" match the literal
characters '?', '+', and '|', respectively, <b>QUOTED_CHAR</b> shall also
include:<pre>\?    \+    \|</pre></blockquote>
On page 177 line 6201 section 9.5.1, change:<blockquote>The character '$'
when it appears as the last character of a basic regular expression and
when not <b>QUOTED_CHAR</b>.</blockquote>to:<blockquote>The character '$'
when it appears either as the last character of a basic regular expression
or, if the implementation does not match the escape sequence "\|" to the
literal character '|', when used immediately preceding a "\|" escape
sequence that is not inside a subexpression, and when not
<b>QUOTED_CHAR</b>.</blockquote>
After page 177 line 6228 section 9.5.2, add:<blockquote><pre>/* The following shall be tokens on implementations where
   \?, \+, and \| are not included in QUOTED_CHAR */

%token Back_qm Back_plus Back_bar
/*       \?      \+        \|     */</pre></blockquote>
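The lexical split between <b>QUOTED_CHAR</b> and the new tokens can be sketched in Python as follows. This is only an illustration: the function names `skip_bracket` and `classify_escapes` and the flag `extended_escapes` are hypothetical, and the flag models the implementation-defined choice described in the 9.3.2 addition. Only the token names Back_qm, Back_plus, and Back_bar are taken from the proposed text.

```python
from typing import List, Tuple

# Token names from the proposed %token line, keyed by the escaped character.
BACK_TOKENS = {"?": "Back_qm", "+": "Back_plus", "|": "Back_bar"}


def skip_bracket(bre: str, start: int) -> int:
    """Return the index just past the bracket expression that opens at start."""
    i = start + 1
    if bre[i:i + 1] == "^":
        i += 1
    if bre[i:i + 1] == "]":
        i += 1  # a leading ']' is an ordinary list member
    while i < len(bre) and bre[i] != "]":
        if bre[i] == "[" and bre[i + 1:i + 2] in (".", ":", "="):
            end = bre.find(bre[i + 1] + "]", i + 2)  # skip [.x.] [:c:] [=x=]
            i = len(bre) if end == -1 else end + 2
        else:
            i += 1
    return min(i + 1, len(bre))


def classify_escapes(bre: str, extended_escapes: bool) -> List[Tuple[str, str]]:
    """Return (lexeme, token) pairs for each escaped '?', '+' or '|' in bre.

    extended_escapes=True models an implementation where the escapes behave
    like the ERE special characters; False models one where they match the
    literal character and so fall under QUOTED_CHAR.
    """
    found = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "[":
            # Escape sequences are only defined outside bracket expressions.
            i = skip_bracket(bre, i)
            continue
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            if nxt in BACK_TOKENS:
                token = BACK_TOKENS[nxt] if extended_escapes else "QUOTED_CHAR"
                found.append((bre[i:i + 2], token))
            i += 2
            continue
        i += 1
    return found
```

For example, `classify_escapes(r"a\|b\?", True)` yields Back_bar and Back_qm tokens, whereas with `extended_escapes=False` both lexemes are reported as QUOTED_CHAR.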
On page 178 line 6244 section 9.5.2, change:<blockquote><pre>basic_reg_exp  :          RE_expression
               | L_ANCHOR
               |                        R_ANCHOR
               | L_ANCHOR               R_ANCHOR
               | L_ANCHOR RE_expression
               |          RE_expression R_ANCHOR
               | L_ANCHOR RE_expression R_ANCHOR
               ;
RE_expression  :               simple_RE
               | RE_expression simple_RE
               ;
simple_RE      : nondupl_RE
               | nondupl_RE RE_dupl_symbol
               ;
nondupl_RE     : one_char_or_coll_elem_RE
               | Back_open_paren RE_expression Back_close_paren
               | BACKREF
               ;
one_char_or_coll_elem_RE : ORD_CHAR
               | QUOTED_CHAR
               | '.'
               | bracket_expression
               ;
RE_dupl_symbol : '*'
               | Back_open_brace DUP_COUNT               Back_close_brace
               | Back_open_brace DUP_COUNT ','           Back_close_brace
               | Back_open_brace DUP_COUNT ',' DUP_COUNT Back_close_brace
               ;</pre></blockquote>to:<blockquote><pre>basic_reg_exp   :                        BRE_branch
                | basic_reg_exp Back_bar BRE_branch /* if Back_bar is a token */
                ;
BRE_branch      :            BRE_expression
                | BRE_branch BRE_expression 
                ;
BRE_expression  :          simple_BRE
                | L_ANCHOR
                |                     R_ANCHOR
                | L_ANCHOR            R_ANCHOR
                | L_ANCHOR simple_BRE
                |          simple_BRE R_ANCHOR
                | L_ANCHOR simple_BRE R_ANCHOR
                ;
simple_BRE      : nondupl_BRE
                | nondupl_BRE BRE_dupl_symbol
                ;
nondupl_BRE     : one_char_or_coll_elem_BRE
                | Back_open_paren basic_reg_exp Back_close_paren
                | BACKREF
                ;
one_char_or_coll_elem_BRE  : ORD_CHAR
                | QUOTED_CHAR
                | '.'
                | bracket_expression
                ;
BRE_dupl_symbol : '*'
                | Back_qm   /* if Back_qm is a token */
                | Back_plus /* if Back_plus is a token */
                | Back_open_brace DUP_COUNT               Back_close_brace
                | Back_open_brace DUP_COUNT ','           Back_close_brace
                | Back_open_brace DUP_COUNT ',' DUP_COUNT Back_close_brace
                ;</pre></blockquote>
On page 179 line 6313 section 9.5.2, change:<blockquote>The BRE grammar
does not permit <b>L_ANCHOR</b> or <b>R_ANCHOR</b> inside "\(" and "\)"
(which implies that '^' and '$' are ordinary characters). This reflects the
semantic limits on the application, as noted in [xref to 9.3.8].
Implementations are permitted to extend the language to interpret '^' and
'$' as anchors in these locations, and as such, conforming applications
cannot use unescaped '^' and '$' in positions inside "\(" and "\)" that
might be interpreted as anchors.</blockquote>to:<blockquote>Note that
although the BRE grammar appears always to permit <b>L_ANCHOR</b> or
<b>R_ANCHOR</b> inside "\(" and "\)", the lexical conventions (see [xref to
9.5.1]) imply that '^' and '$' may be ordinary characters there. This
reflects the semantic limits on the application, as noted in [xref to
9.3.8]. Since it is an implementation option whether to interpret '^' and
'$' as anchors in these locations, conforming applications cannot use
unescaped '^' and '$' in positions inside "\(" and "\)" that might be
interpreted as anchors.</blockquote> 
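To make the portability consequence concrete, the following Python sketch flags unescaped '^' and '$' inside "\(" and "\)" at the positions the revised 9.3.8 text says an implementation may treat as anchors: first or last character of the subexpression, or adjacent to a "\|" escape sequence. The function names `skip_bracket` and `ambiguous_anchor_positions` are hypothetical, and the "\|" positions are flagged unconditionally because a portable application cannot know which behaviour the implementation chose.

```python
from typing import List


def skip_bracket(bre: str, start: int) -> int:
    """Return the index just past the bracket expression that opens at start."""
    i = start + 1
    if bre[i:i + 1] == "^":
        i += 1
    if bre[i:i + 1] == "]":
        i += 1  # a leading ']' is an ordinary list member
    while i < len(bre) and bre[i] != "]":
        if bre[i] == "[" and bre[i + 1:i + 2] in (".", ":", "="):
            end = bre.find(bre[i + 1] + "]", i + 2)  # skip [.x.] [:c:] [=x=]
            i = len(bre) if end == -1 else end + 2
        else:
            i += 1
    return min(i + 1, len(bre))


def ambiguous_anchor_positions(bre: str) -> List[int]:
    """Indices of unescaped '^'/'$' in subexpressions that might be anchors."""
    flagged = []
    depth = 0                  # nesting level of back-slashed parentheses
    prev_dollar = -1           # unescaped '$' in a subexpression, previous token
    after_open_or_bar = False  # previous token opened or split a subexpression
    i, n = 0, len(bre)
    while i < n:
        ch = bre[i]
        if ch == "[":
            i = skip_bracket(bre, i)
            prev_dollar, after_open_or_bar = -1, False
            continue
        if ch == "\\" and i + 1 < n:
            nxt = bre[i + 1]
            if nxt in (")", "|") and depth > 0 and prev_dollar != -1:
                flagged.append(prev_dollar)  # '$' right before \) or \|
            if nxt == "(":
                depth += 1
            elif nxt == ")":
                depth = max(depth - 1, 0)
            after_open_or_bar = nxt in ("(", "|") and depth > 0
            prev_dollar = -1
            i += 2
            continue
        if ch == "^" and after_open_or_bar:
            flagged.append(i)  # '^' right after \( or \|
        prev_dollar = i if (ch == "$" and depth > 0) else -1
        after_open_or_bar = False
        i += 1
    return flagged
```

With this sketch, `ambiguous_anchor_positions(r"\(^a$\)")` returns `[2, 4]`, and a BRE such as `r"\(a^b$c\)"` returns an empty list because neither character sits in a position that might be an anchor.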

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2022-01-08 03:48 calestyo       New Issue                                    
2022-01-08 03:48 calestyo       Name                      => Christoph Anton Mitterer
2022-01-08 03:48 calestyo       Section                   => 9.3 Basic Regular Expressions
2022-01-08 03:48 calestyo       Page Number               => N/A             
2022-01-08 03:48 calestyo       Line Number               => N/A             
2022-01-28 11:21 mirabilos      Note Added: 0005636                          
2022-01-28 23:10 calestyo       Note Added: 0005639                          
2022-03-03 06:56 Don Cragun     Note Added: 0005731                          
2022-03-03 07:03 Don Cragun     Note Edited: 0005731                         
2022-03-10 17:09 geoffclare     Note Added: 0005738                          
2022-03-10 17:09 geoffclare     Interp Status             => ---             
2022-03-10 17:09 geoffclare     Final Accepted Text       => https://austingroupbugs.net/view.php?id=1546#c5738
2022-03-10 17:09 geoffclare     Status                   New => Resolved     
2022-03-10 17:09 geoffclare     Resolution               Open => Accepted As Marked
2022-03-10 17:10 geoffclare     Tag Attached: issue8                         
2022-03-10 17:44 calestyo       Note Added: 0005740                          
2022-03-12 21:01 Don Cragun     Relationship added       related to 0000773  
2022-03-12 21:12 calestyo       Note Added: 0005744                          
2022-03-12 21:13 calestyo       Note Edited: 0005744                         
2022-03-14 10:08 geoffclare     Note Added: 0005748                          
2022-03-14 10:08 geoffclare     Status                   Resolved => Under Review
2022-03-14 10:08 geoffclare     Resolution               Accepted As Marked => Reopened
2022-03-14 13:50 calestyo       Note Added: 0005749                          
2022-03-14 14:31 geoffclare     Note Added: 0005750                          
2022-03-18 09:32 geoffclare     Note Added: 0005755                          
======================================================================
