A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=1546
======================================================================
Reported By: calestyo
Assigned To:
======================================================================
Project: 1003.1(2016/18)/Issue7+TC2
Issue ID: 1546
Category: Base Definitions and Headers
Type: Enhancement Request
Severity: Editorial
Priority: normal
Status: Under Review
Name: Christoph Anton Mitterer
Organization:
User Reference:
Section: 9.3 Basic Regular Expressions
Page Number: N/A
Line Number: N/A
Interp Status: ---
Final Accepted Text: https://austingroupbugs.net/view.php?id=1546#c5738
======================================================================
Date Submitted: 2022-01-08 03:48 UTC
Last Modified: 2022-03-18 09:32 UTC
======================================================================
Summary: BREs: reserve \? \+ and \|
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0000773 Summary: Add \+, \?, and \| to Basic Re...
======================================================================
----------------------------------------------------------------------
 (0005755) geoffclare (manager) - 2022-03-18 09:32
 https://austingroupbugs.net/view.php?id=1546#c5755
----------------------------------------------------------------------
New proposed resolution that adds the grammar changes and changes to the anchoring description in 9.3.8. I also took the opportunity to improve consistency between the BRE grammar and ERE grammar.

Page and line numbers are for Issue 8 draft 2.1.

After page 165 line 5720 section 9.1, add:
<b>escape sequence</b><blockquote>An escape sequence, when not in a bracket expression, is defined as the escape character <backslash> ('\\') followed by any single character.</blockquote>

After page 167 line 5806 section 9.3.2, add:<blockquote><ul><li>The '?', '+', and '|' characters; it is implementation-defined whether "\?", "\+", and "\|" each match the literal character '?', '+', or '|', respectively, or behave as described for the ERE special characters '?', '+', and '|', respectively (see [xref to 9.4.3]).</li></ul></blockquote>

After page 168 line 5818 section 9.3.3, add a new list item:<blockquote>Immediately following a "\|" escape sequence (after an initial '^', if any), if the implementation does not match the escape sequence "\|" to the literal character '|'.</blockquote>

On page 172 line 6015 section 9.3.8, change:<blockquote>A <circumflex> ('^') shall be an anchor when used as the first character of an entire BRE. The implementation may treat the <circumflex> as an anchor when used as the first character of a subexpression.</blockquote>to:<blockquote>A <circumflex> ('^') shall be an anchor when used as the first character of an entire BRE and, if the implementation does not match the escape sequence "\|" to the literal character '|', when used immediately following a "\|" escape sequence that is not inside a subexpression.
The implementation may also treat a <circumflex> as an anchor when used inside a subexpression; in this case it shall be an anchor only when either of the following is true:<ul>
<li>It is the first character of the subexpression.</li>
<li>It immediately follows a "\|" escape sequence and the implementation does not match the escape sequence "\|" to the literal character '|'.</li></ul></blockquote>

On page 172 line 6023 section 9.3.8, change:<blockquote>A <dollar-sign> ('$') shall be an anchor when used as the last character of an entire BRE. The implementation may treat a <dollar-sign> as an anchor when used as the last character of a subexpression.</blockquote>to:<blockquote>A <dollar-sign> ('$') shall be an anchor when used as the last character of an entire BRE and, if the implementation does not match the escape sequence "\|" to the literal character '|', when used immediately preceding a "\|" escape sequence that is not inside a subexpression. The implementation may also treat a <dollar-sign> as an anchor when used inside a subexpression; in this case it shall be an anchor only when either of the following is true:<ul>
<li>It is the last character of the subexpression.</li>
<li>It immediately precedes a "\|" escape sequence and the implementation does not match the escape sequence "\|" to the literal character '|'.</li></ul></blockquote>

On page 176 line 6191 section 9.5.1, change:<blockquote>The character '^' when it appears as the first character of a basic regular expression and when not <b>QUOTED_CHAR</b>.</blockquote>to:<blockquote>The character '^' when it appears either as the first character of a basic regular expression or, if the implementation does not match the escape sequence "\|" to the literal character '|', when used immediately following a "\|" escape sequence that is not inside a subexpression, and when not <b>QUOTED_CHAR</b>.</blockquote>

After page 176 line 6197 section 9.5.1, add:<blockquote>On implementations where the escape sequences
"\?", "\+", and "\|" match the literal characters '?', '+', and '|', respectively, <b>QUOTED_CHAR</b> shall also include:<pre>\?  \+  \|</pre></blockquote>

On page 177 line 6201 section 9.5.1, change:<blockquote>The character '$' when it appears as the last character of a basic regular expression and when not <b>QUOTED_CHAR</b>.</blockquote>to:<blockquote>The character '$' when it appears either as the last character of a basic regular expression or, if the implementation does not match the escape sequence "\|" to the literal character '|', when used immediately preceding a "\|" escape sequence that is not inside a subexpression, and when not <b>QUOTED_CHAR</b>.</blockquote>

After page 177 line 6228 section 9.5.2, add:<blockquote><pre>
/* The following shall be tokens on implementations where
   \?, \+, and \| are not included in QUOTED_CHAR */
%token  Back_qm  Back_plus  Back_bar
/*      \?       \+         \|      */</pre></blockquote>

On page 178 line 6244 section 9.5.2, change:<blockquote><pre>
basic_reg_exp  :          RE_expression
               | L_ANCHOR
               |                        R_ANCHOR
               | L_ANCHOR               R_ANCHOR
               | L_ANCHOR RE_expression
               |          RE_expression R_ANCHOR
               | L_ANCHOR RE_expression R_ANCHOR
               ;
RE_expression  :               simple_RE
               | RE_expression simple_RE
               ;
simple_RE      : nondupl_RE
               | nondupl_RE RE_dupl_symbol
               ;
nondupl_RE     : one_char_or_coll_elem_RE
               | Back_open_paren RE_expression Back_close_paren
               | BACKREF
               ;
one_char_or_coll_elem_RE : ORD_CHAR
               | QUOTED_CHAR
               | '.'
               | bracket_expression
               ;
RE_dupl_symbol : '*'
               | Back_open_brace DUP_COUNT               Back_close_brace
               | Back_open_brace DUP_COUNT ','           Back_close_brace
               | Back_open_brace DUP_COUNT ',' DUP_COUNT Back_close_brace
               ;</pre></blockquote>to:<blockquote><pre>
basic_reg_exp  :                        BRE_branch
               | basic_reg_exp Back_bar BRE_branch
                                        /* if Back_bar is a token */
               ;
BRE_branch     :            BRE_expression
               | BRE_branch BRE_expression
               ;
BRE_expression :          simple_BRE
               | L_ANCHOR
               |                     R_ANCHOR
               | L_ANCHOR            R_ANCHOR
               | L_ANCHOR simple_BRE
               |          simple_BRE R_ANCHOR
               | L_ANCHOR simple_BRE R_ANCHOR
               ;
simple_BRE     : nondupl_BRE
               | nondupl_BRE BRE_dupl_symbol
               ;
nondupl_BRE    : one_char_or_coll_elem_BRE
               | Back_open_paren basic_reg_exp Back_close_paren
               | BACKREF
               ;
one_char_or_coll_elem_BRE : ORD_CHAR
               | QUOTED_CHAR
               | '.'
               | bracket_expression
               ;
BRE_dupl_symbol : '*'
               | Back_qm    /* if Back_qm is a token */
               | Back_plus  /* if Back_plus is a token */
               | Back_open_brace DUP_COUNT               Back_close_brace
               | Back_open_brace DUP_COUNT ','           Back_close_brace
               | Back_open_brace DUP_COUNT ',' DUP_COUNT Back_close_brace
               ;</pre></blockquote>

On page 179 line 6313 section 9.5.2, change:<blockquote>The BRE grammar does not permit <b>L_ANCHOR</b> or <b>R_ANCHOR</b> inside "\(" and "\)" (which implies that '^' and '$' are ordinary characters). This reflects the semantic limits on the application, as noted in [xref to 9.3.8]. Implementations are permitted to extend the language to interpret '^' and '$' as anchors in these locations, and as such, conforming applications cannot use unescaped '^' and '$' in positions inside "\(" and "\)" that might be interpreted as anchors.</blockquote>to:<blockquote>Note that although the BRE grammar appears always to permit <b>L_ANCHOR</b> or <b>R_ANCHOR</b> inside "\(" and "\)", the lexical conventions (see [xref to 9.5.1]) imply that '^' and '$' may be ordinary characters there. This reflects the semantic limits on the application, as noted in [xref to 9.3.8].
Since it is an implementation option whether to interpret '^' and '$' as anchors in these locations, conforming applications cannot use unescaped '^' and '$' in positions inside "\(" and "\)" that might be interpreted as anchors.</blockquote>

Issue History
Date Modified    Username       Field                    Change
======================================================================
2022-01-08 03:48 calestyo       New Issue
2022-01-08 03:48 calestyo       Name                     => Christoph Anton Mitterer
2022-01-08 03:48 calestyo       Section                  => 9.3 Basic Regular Expressions
2022-01-08 03:48 calestyo       Page Number              => N/A
2022-01-08 03:48 calestyo       Line Number              => N/A
2022-01-28 11:21 mirabilos      Note Added: 0005636
2022-01-28 23:10 calestyo       Note Added: 0005639
2022-03-03 06:56 Don Cragun     Note Added: 0005731
2022-03-03 07:03 Don Cragun     Note Edited: 0005731
2022-03-10 17:09 geoffclare     Note Added: 0005738
2022-03-10 17:09 geoffclare     Interp Status            => ---
2022-03-10 17:09 geoffclare     Final Accepted Text      => https://austingroupbugs.net/view.php?id=1546#c5738
2022-03-10 17:09 geoffclare     Status                   New => Resolved
2022-03-10 17:09 geoffclare     Resolution               Open => Accepted As Marked
2022-03-10 17:10 geoffclare     Tag Attached: issue8
2022-03-10 17:44 calestyo       Note Added: 0005740
2022-03-12 21:01 Don Cragun     Relationship added       related to 0000773
2022-03-12 21:12 calestyo       Note Added: 0005744
2022-03-12 21:13 calestyo       Note Edited: 0005744
2022-03-14 10:08 geoffclare     Note Added: 0005748
2022-03-14 10:08 geoffclare     Status                   Resolved => Under Review
2022-03-14 10:08 geoffclare     Resolution               Accepted As Marked => Reopened
2022-03-14 13:50 calestyo       Note Added: 0005749
2022-03-14 14:31 geoffclare     Note Added: 0005750
2022-03-18 09:32 geoffclare     Note Added: 0005755
======================================================================
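The Python sketch below is an informal illustration only. It is not part of note 0005755 and is not taken from any real implementation. It tokenizes a BRE following the lexical conventions that the proposed resolution gives for 9.5.1 and 9.5.2.

The names tokenize_bre and bracket_end are hypothetical, as is the parameter alt_escapes. That parameter models the implementation-defined choice in 9.3.2:
- True: "\?", "\+", and "\|" become the Back_qm, Back_plus, and Back_bar tokens.
- False: they are QUOTED_CHAR and match a literal '?', '+', or '|'.

The sketch also makes several simplifying assumptions:
- It behaves like an implementation that never treats '^' or '$' inside "\(" and "\)" as anchors.
- It does not tokenize the contents of interval expressions.
- It ignores the rules that make a leading '*' an ordinary character.
- It rejects escape sequences whose meaning the standard leaves undefined.

```python
from typing import List, Tuple

# Hypothetical sketch of the BRE lexical conventions proposed in note 0005755.
# alt_escapes models the implementation-defined choice in 9.3.2; it is an
# assumption of this sketch, not an API of any real regex library.

ESCAPE_TOKENS = {"?": "Back_qm", "+": "Back_plus", "|": "Back_bar"}
GROUP_TOKENS = {"(": "Back_open_paren", ")": "Back_close_paren",
                "{": "Back_open_brace", "}": "Back_close_brace"}
SPECIAL_CHARS = set(".[\\*^$")  # a preceding backslash makes these QUOTED_CHAR


def bracket_end(bre: str, i: int) -> int:
    """Return the index just past the bracket expression starting at bre[i] == '['.

    Inside a bracket expression a backslash is an ordinary character, so no
    escape sequences are recognized here.
    """
    j = i + 1
    if j < len(bre) and bre[j] == "^":
        j += 1
    if j < len(bre) and bre[j] == "]":  # a leading ']' is a literal member
        j += 1
    while j < len(bre) and bre[j] != "]":
        if bre[j] == "[" and j + 1 < len(bre) and bre[j + 1] in ":=.":
            # Skip [:class:], [=equiv=] and [.coll.] as single units.
            close = bre.find(bre[j + 1] + "]", j + 2)
            if close < 0:
                raise ValueError("unterminated [: [= or [. term")
            j = close + 2
        else:
            j += 1
    if j >= len(bre):
        raise ValueError("unterminated bracket expression")
    return j + 1


def tokenize_bre(bre: str, alt_escapes: bool) -> List[Tuple[str, str]]:
    """Split a BRE into (token_kind, text) pairs."""
    tokens: List[Tuple[str, str]] = []
    depth = 0  # current nesting depth of \( \)
    i, n = 0, len(bre)
    while i < n:
        c = bre[i]
        if c == "\\":
            if i + 1 == n:
                raise ValueError("trailing backslash")
            e, seq = bre[i + 1], bre[i:i + 2]
            if e in ESCAPE_TOKENS:
                # Implementation-defined: ERE-like operator or literal character.
                kind = ESCAPE_TOKENS[e] if alt_escapes else "QUOTED_CHAR"
            elif e in GROUP_TOKENS:
                kind = GROUP_TOKENS[e]
                if e == "(":
                    depth += 1
                elif e == ")":
                    depth -= 1
            elif e in "123456789":
                kind = "BACKREF"
            elif e in SPECIAL_CHARS:
                kind = "QUOTED_CHAR"
            else:
                # Other escaped characters are undefined; this sketch rejects them.
                raise ValueError(f"undefined escape sequence {seq!r}")
            tokens.append((kind, seq))
            i += 2
        elif c == "[":
            end = bracket_end(bre, i)
            tokens.append(("BRACKET", bre[i:end]))
            i = end
        elif c == "^" and (i == 0 or (depth == 0 and tokens
                                      and tokens[-1][0] == "Back_bar")):
            # First character of the BRE, or right after a top-level \| token.
            tokens.append(("L_ANCHOR", c))
            i += 1
        elif c == "$" and (i == n - 1 or (alt_escapes and depth == 0
                                          and bre.startswith("\\|", i + 1))):
            # Last character of the BRE, or right before a top-level \| token.
            tokens.append(("R_ANCHOR", c))
            i += 1
        else:
            kind = "SPEC_CHAR" if c in ".*" else "ORD_CHAR"
            tokens.append((kind, c))
            i += 1
    return tokens
```

The following calls show the effect of the choice:
- tokenize_bre(r"^a\|^b$", True) yields two L_ANCHOR tokens separated by a Back_bar token.
- With alt_escapes=False, "\|" is a QUOTED_CHAR, so the second '^' comes out as an ORD_CHAR.
- The same '^' inside "\(" and "\)" stays an ORD_CHAR in either mode. This reflects the conservative reading assumed here.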