A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=249
======================================================================
Reported By:                dwheeler
Assigned To:                ajosey
======================================================================
Project:                    1003.1(2008)/Issue 7
Issue ID:                   249
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Objection
Priority:                   normal
Status:                     Under Review
Name:                       David A. Wheeler
Organization:
User Reference:
Section:                    2.2 Quoting
Page Number:                2298-2299
Line Number:                72348-72401
Interp Status:              ---
Final Accepted Text:        Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
======================================================================
Date Submitted:             2010-04-30 21:42 UTC
Last Modified:              2022-10-20 15:12 UTC
======================================================================
Summary:                    Add standard support for $'...' in shell
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
parent of           0001413 incorrect description of how a hexadeci...
related to          0000322 Defect in XCU File Format Notation
related to          0000985 quote removal missing from case stateme...
======================================================================
----------------------------------------------------------------------
(0006006) geoffclare (manager) - 2022-10-20 15:12
https://austingroupbugs.net/view.php?id=249#c6006
----------------------------------------------------------------------
These are the agreed changes from https://posix.rhansen.org/p/bug249 (omitting \uXXXX and \UXXXXXXXX). Page and line numbers are for the 2013 edition (C138.pdf). At page 2319 line 73573 (XCU section 2.1, Shell Introduction, item 4) change:<blockquote>The shell performs various expansions (separately) on different parts of each command, resulting in a list of pathnames and fields to be treated as a command and arguments; see [xref to 2.6].</blockquote>to:<blockquote>For each word within a command, the shell processes backslash escape sequences inside dollar-single-quotes (see [xref to 2.2.4]) and then performs various word expansions (see [xref to 2.6]). In the case of a simple command, the results usually include a list of pathnames and fields to be treated as a command name and arguments; see [xref to 2.9].</blockquote> At page 2320 line 73594 (XCU section 2.2, Quoting) change:<blockquote>The various quoting mechanisms are the escape character, single-quotes, and double-quotes.</blockquote>to:<blockquote>The various quoting mechanisms are the escape character, single-quotes, double-quotes, and dollar-single-quotes.</blockquote> At page 2320 lines 73609-73611 (XCU 2.2.3, Double-Quotes), change:<blockquote>$ The <dollar-sign> shall retain its special meaning introducing parameter expansion (see Section 2.6.2), a form of command substitution (see Section 2.6.3), and arithmetic expansion (see Section 2.6.4).</blockquote>to:<blockquote>$ The <dollar-sign> shall retain its special meaning introducing parameter expansion (see Section 2.6.2), a form of command substitution (see Section 2.6.3), and arithmetic expansion (see Section 2.6.4), but shall not retain its special meaning introducing the dollar-single-quotes form of quoting 
(see [xref to 2.2.4]).</blockquote> At page 2321 lines 73626-73627 (XCU 2.2.3, Double-Quotes), change:<blockquote><ul><li>A single-quoted or double-quoted string that begins, but does not end, within the "`...`" sequence</li></ul></blockquote>to:<blockquote><ul><li>A quoted (single-quoted, double-quoted, or dollar-single-quoted) string that begins, but does not end, within the "`...`" sequence</li></ul></blockquote> After page 2321 line 73635 (end of XCU section 2.2), insert a new subsection:<blockquote><b>2.2.4 Dollar-Single-Quotes</b> A sequence of characters starting with a <dollar-sign> immediately followed by a single-quote (<tt>$'</tt>) shall preserve the literal value of all characters up to an unescaped terminating single-quote (<tt>'</tt>), with the exception of certain backslash escape sequences, as follows:<ul> <li><tt>\"</tt> yields a <quotation-mark> (double-quote) character, but note that <quotation-mark> can be included unescaped.</li> <li><tt>\'</tt> yields an <apostrophe> (single-quote) character.</li> <li><tt>\\</tt> yields a <backslash> character.</li> <li><tt>\a</tt> yields an <alert> character.</li> <li><tt>\b</tt> yields a <backspace> character.</li> <li><tt>\e</tt> yields an <ESC> character.</li> <li><tt>\f</tt> yields a <form-feed> character.</li> <li><tt>\n</tt> yields a <newline> character.</li> <li><tt>\r</tt> yields a <carriage-return> character.</li> <li><tt>\t</tt> yields a <tab> character.</li> <li><tt>\v</tt> yields a <vertical-tab> character.</li> <li><tt>\cX</tt> yields the control character listed in the Value column of [xref to XCU Table 4.21] in the Operands section of the stty utility when X is one of the characters listed in the ^c column of the same table, except that \c\\ yields the <FS> control character since the <backslash> character must be escaped.</li> <li><tt>\xXX</tt> yields the byte whose value is the hexadecimal value XX (one or more hex digits). 
If more than two hex digits follow \x, the results are unspecified.</li> <li><tt>\ddd</tt> yields the byte whose value is the octal value <i>ddd</i> (one to three octal digits).</li> <li>The behavior of a <backslash> immediately followed by any other character, including <newline>, is unspecified.</li></ul> In cases where a variable number of characters can be used to specify an escape sequence (\xXX and \ddd), the escape sequence shall be terminated by the first character that is not of the expected type or, for \ddd sequences, when the maximum number of characters specified has been found, whichever occurs first. These backslash escape sequences shall be processed (replaced with the bytes or characters they yield) immediately prior to word expansion (see [xref to 2.6]) of the word in which the dollar-single-quotes sequence occurs. If a \xXX or \ddd escape sequence yields a byte whose value is 0, it is unspecified whether that null byte is included in the result or if that byte and any following regular characters and escape sequences up to the terminating unescaped single-quote are evaluated and discarded. If the octal value specified by \ddd will not fit in a byte, the results are unspecified. If a \e or \cX escape sequence specifies a character that does not have an encoding in the locale in effect when these backslash escape sequences are processed, the result is implementation-defined. However, implementations shall not replace an unsupported character with bytes that do not form valid characters in that locale's character set. If a backslash escape sequence represents a single-quote character (for example \'), that sequence shall not terminate the dollar-single-quote sequence.</blockquote> At page 2321 lines 73658-73664 (XCU section 2.3 (Token Recognition) point 4), change:<blockquote>4. 
If the current character is <backslash>, single-quote, or double-quote and it is not quoted, it shall affect quoting for subsequent characters up to the end of the quoted text. The rules for quoting are as described in Section 2.2 (on page 2298). During token recognition no substitutions shall be actually performed, and the result token shall contain exactly the characters that appear in the input (except for <newline> joining), unmodified, including any embedded or enclosing quotes or substitution operators, between the <quotationmark> and the end of the quoted text. The token shall not be delimited by the end of the quoted field.</blockquote>to:<blockquote>4. If the current character is an unquoted <backslash>, single-quote, or double-quote or is the first character of an unquoted <dollar-sign> single-quote sequence, it shall affect quoting for subsequent characters up to the end of the quoted text. The rules for quoting are as described in [xref to Section 2.2]. During token recognition no substitutions shall be actually performed, and the result token shall contain exactly the characters that appear in the input unmodified, including any embedded or enclosing quotes or substitution operators, between the start and the end of the quoted text. 
The token shall not be delimited by the end of the quoted field.</blockquote> After page 2327 line 73900 (XCU section 2.6, Word Expansions), insert a new bullet point:<blockquote><ul><li>a <single-quote></li></ul></blockquote> At page 2331 lines 74071-74073 (XCU 2.6.3, Command Substitution), change:<blockquote>A single-quoted or double-quoted string that begins, but does not end, within the <tt>"`...`"</tt> sequence produces undefined results.</blockquote>to:<blockquote>A quoted string that begins, but does not end, within the <tt>"`...`"</tt> sequence produces undefined results.</blockquote> At page 2333 lines 74157-74158 (XCU section 2.6.7, Quote Removal), change:<blockquote>The quote characters (<backslash>, single-quote, and double-quote) that were present in the original word shall be removed unless they have themselves been quoted.</blockquote>to:<blockquote>The quote character sequence <dollar-sign> single-quote and the single-character quote characters (<backslash>, single-quote, and double-quote) that were present in the original word shall be removed unless they have themselves been quoted. Note that the single-quote character that terminates a <dollar-sign> single-quote sequence is itself a single-character quote character. Note that after quote removal the shell still remembers which characters were quoted. 
This is necessary for purposes such as matching patterns in a <b>case</b> conditional construct (see [xref to 2.9.4.3] and [xref to 2.13]).</blockquote> At page 2348 lines 74718-74719 (the Note in XCU section 2.10.2 (Shell Grammar Rules) rule 1), change:<blockquote>Because at this point <quotation-mark> characters are retained in the token, quoted strings cannot be recognized as reserved words.</blockquote>to:<blockquote>Because at this point quoting characters (<backslash>, single-quote, <quotation-mark>, and the <dollar-sign> single-quote sequence) are retained in the token, quoted strings cannot be recognized as reserved words.</blockquote> After page 3677 line 125685 (end of XRAT C.2.2.3), insert a new paragraph:<blockquote>The $'...' construct does not retain its special meaning inside double quotes. This was discussed by the standard developers and rejected. Note that $'...' is a quoting mechanism and not an expansion. Losing the special meaning inside double quotes is consistent with other quoting mechanisms losing their special meaning when quoted.</blockquote> After the above insertion and before page 3678 line 125686 (XRAT C.2.3), insert a new subsection:<blockquote><i>C.2.2.4 Dollar-Single-Quotes</i> The $'...' quoting construct has been implemented in several recent shells. It is similar to character string literals ("...") in the ISO C standard with the following exceptions:<ul> <li>The \x escape sequence in C can be followed by an arbitrary number of hexadecimal digits. The ksh93 implementation of $'...' also consumes an arbitrary number of hexadecimal digits; bash consumes at most two hexadecimal digits in this case. This standard leaves the result unspecified if more than two hexadecimal digits follow \x. (Note that a hexadecimal escape followed by a literal hexadecimal character can always be represented as $'\xXX'X.)</li> <li>The \c escape sequence is not included in the ISO C standard. 
There was also some disagreement in shells that historically supported \c escape sequences in $'...'. These include:<ul> <li>whether \cA through \cZ produced the byte values 1 through 26, respectively, or supported the codeset independent control character as specified by the <i>stty</i> utility. This standard requires codeset independence.</li> <li>whether \c[, \c\\, \c], \c^, \c_, and \c? could be used to yield the <ESC>, <FS>, <GS>, <RS>, <US>, and <DEL> control characters, respectively. This standard requires support for all of the control characters except NULL (matching what is done in the <i>stty</i> utility).</li> <li>whether \c\\ or \c\ was used to represent <FS>. This standard requires \c\\ to make backslash escape processing consistent.</li></ul> The implementors of the most common shells that implement $'\cX' agreed to convert to the behavior specified in this standard. Some shells also allow \c<arbitrary_control_character> to act as an inverse function to \cX (i.e., \cm and \cM yield <CR> and \c<CR> yields m or M). This standard leaves this behavior implementation-defined.</li> <li>The \e escape sequence is not included in the ISO C standard, but was provided by all historical shells that supported $'...'. Some also supported \E as a synonym. One member of the group objected to adding \e because the <ESC> control character is not required to be in the portable character set. The \e sequence is included because many historical users of $'...' expect it to be there. The \E sequence is not included in this standard because <backslash> escape sequences that start with <backslash> followed by an uppercase letter (except \U) are reserved by the C Standard for implementation use.</li> <li>The \ddd octal escape sequence and the \xXX hexadecimal escape sequence can be used to insert a null byte into a C Standard character string literal and into a $'...' quoted word in this standard. 
In C, any characters specified after that null byte (including escape sequences) continue to be processed and added to the character string literal. In $'...' in the shell, this standard allows the equivalent behavior but also allows the null byte and all remaining characters up to the terminating unescaped single-quote to be evaluated and discarded. The latter (which was historic practice in bash, but not in ksh93) allows an escape sequence producing a null byte to terminate the dollar-single-quoted expansion, but not terminate the token in which it appears if there are characters remaining in the token. For example:<pre>printf a$'b\0c\''d</pre>is required by this standard to produce:<pre>abd</pre>while historic versions of ksh93 produced:<pre>ab</pre></li> <li>The ISO C standard specifies \uXXXX and \UXXXXXXXX escape sequences. These need not be supported by $'...' in the shell. They were omitted because current shell implementations that support them differ in behavior. In particular, some shells always convert them to the UTF-8 encoding for the named character, even if the current locale's character set does not have UTF-8 encoding.</li> <li>The double-quote (") character can be used literally, while the single-quote (') character must be represented as an escape sequence. In C, single-quote can be used literally, while double-quote requires an escape sequence.</li> <li>A <backslash> immediately followed by a <newline> has unspecified behavior. In C, this sequence is used for line continuations, where both the <backslash> and <newline> are deleted and a diagnostic is required if a closing quote is not encountered before a <newline> that is not preceded by <backslash>. In current shell implementations, three different behaviors have been observed.</li> <li>Backslash escape sequences not described in the standard result in unspecified behavior. In C, the result is not a token and a diagnostic is required. 
This allows shells to recognize other backslash escape sequences in other ways as extensions to the standard's requirements. Furthermore, existing implementations already had different behaviors for some backslash escape sequences when $'...' processing was added to the standard.</li></ul> This standard makes the results implementation-defined if \e or \cX specifies a character that is not present in the current locale. Application authors should note that implementations are permitted to have a wide range of behaviors when encountering an unsupported character. For example:<ul> <li>the shell might produce an error, possibly causing the shell to terminate</li> <li>the unsupported character might be silently discarded</li> <li>the unsupported character might be replaced with another character of a different character class</li> <li>the unsupported character might be replaced with a shell-special character (e.g., '?')</li> <li>the unsupported character might be replaced with multiple characters, shell-special or regular (e.g., if <ESC> is not supported, $'\e' may be replaced by "???", "XXX" or "<ESC>")</li></ul> However, implementations must document their behavior, and they are prohibited from replacing an unsupported character with bytes that do not form valid characters in the current locale's character set (e.g., encoding in UTF-8 when the locale has a 7-bit character set). This standard does not specify a way for script authors to determine beforehand whether a particular \cX sequence specifies a character that exists in the current locale. At the time this feature was standardized, no known implementations provided such a capability. 
Note that the escape sequences recognized by $'...', file format notation (see [xref to Table 5-1]), XSI-conforming implementations of the <i>echo</i> utility (see the utility's operands section on [xref to echo]), and the <i>printf</i> utility's format operand (see the utility's extended description on [xref to printf]) are not the same. Some escape sequences are not recognized by all of the above; the \c escape sequence in <i>echo</i> is not at all like the \c escape sequence in $'...'; and octal escape sequences in some of the above accept one to four octal digits and require a leading zero while others accept one to three octal digits and do not require a leading zero.</blockquote>

Issue History
Date Modified Username Field Change
======================================================================
2010-04-30 21:42 dwheeler New Issue
2010-04-30 21:42 dwheeler Status New => Under Review
2010-04-30 21:42 dwheeler Assigned To => ajosey
2010-04-30 21:42 dwheeler Name => David A. Wheeler
2010-04-30 21:42 dwheeler Section => 2.2 Quoting
2010-04-30 21:42 dwheeler Page Number => 2298-2299
2010-04-30 21:42 dwheeler Line Number => 72348-72401
2010-09-16 16:17 nick Note Added: 0000548
2010-09-18 18:12 Don Cragun Relationship added related to 0000322
2010-10-01 12:48 geoffclare Note Added: 0000560
2010-10-06 01:26 nick Tag Attached: c99
2010-10-08 17:28 mirabilos Note Added: 0000565
2010-10-25 06:17 Don Cragun Note Added: 0000590
2010-10-25 14:51 Don Cragun Note Edited: 0000590
2010-10-25 15:55 Don Cragun Note Edited: 0000590
2010-10-26 06:44 Don Cragun Note Edited: 0000590
2010-10-26 20:39 Don Cragun Note Edited: 0000590
2010-10-26 20:40 Don Cragun Note Edited: 0000590
2010-10-26 20:40 Don Cragun Note Edited: 0000590
2010-10-26 20:45 Don Cragun Note Edited: 0000590
2010-10-26 21:04 Don Cragun Note Edited: 0000590
2010-10-27 03:29 Don Cragun Note Edited: 0000590
2010-11-04 16:07 nick Note Added: 0000599
2010-11-05 02:34 Don Cragun Note Edited: 0000590
2010-11-05 03:00 Don Cragun Note Added: 0000601
2010-11-05 03:04 Don Cragun Note Edited: 0000601
2010-11-05 14:52 nick Note Added: 0000609
2010-11-05 14:54 nick File Added: n1534.htm
2010-11-05 14:56 nick File Added: n1534_original.htm
2010-11-11 16:40 Don Cragun Note Edited: 0000590
2010-11-11 16:42 Don Cragun Note Edited: 0000590
2010-11-11 16:44 Don Cragun Interp Status => ---
2010-11-11 16:44 Don Cragun Final Accepted Text => See https://austingroupbugs.net/view.php?id=249#c590
2010-11-11 16:44 Don Cragun Status Under Review => Resolved
2010-11-11 16:44 Don Cragun Resolution Open => Accepted As Marked
2010-11-11 16:44 Don Cragun Tag Attached: issue8
2010-12-09 16:12 Don Cragun Note Edited: 0000590
2015-07-31 15:59 stephane Issue Monitored: stephane
2015-08-20 15:24 nick Note Edited: 0000590
2015-08-20 15:25 nick Note Edited: 0000590
2015-08-20 15:31 nick Note Added: 0002793
2015-08-20 15:32 nick Note Edited: 0002793
2015-08-20 15:33 nick Note Edited: 0002793
2015-09-03 16:40 rhansen Note Added: 0002809
2015-09-03 16:45 rhansen Note Edited: 0002809
2015-09-03 16:48 rhansen Note Edited: 0002809
2015-09-03 17:00 rhansen Note Edited: 0002809
2015-09-03 17:05 rhansen Note Edited: 0002809
2015-09-03 17:10 rhansen Tag Attached: UTF-8_Locale
2015-09-10 20:25 rhansen Note Added: 0002824
2015-09-10 20:26 rhansen Note Edited: 0002824
2015-09-10 20:26 rhansen Note Edited: 0002824
2015-09-10 20:27 rhansen Note Edited: 0002809
2015-10-08 06:43 Don Cragun Final Accepted Text See https://austingroupbugs.net/view.php?id=249#c590 => Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
2015-10-08 06:43 Don Cragun Resolution Accepted As Marked => Reopened
2015-10-08 16:32 Don Cragun Status Resolved => Under Review
2015-10-08 16:41 rhansen Relationship added related to 0000985
2015-11-09 21:10 steffen Note Added: 0002893
2015-11-13 19:19 shware_systems Note Added: 0002922
2015-11-13 21:22 steffen Note Added: 0002923
2015-11-13 21:24 steffen Note Added: 0002925
2015-11-13 23:23 shware_systems Note Added: 0002929
2015-11-14 14:26 steffen Note Added: 0002942
2016-06-06 21:07 steffen Note Added: 0003247
2016-06-07 19:50 shware_systems Note Added: 0003248
2016-06-07 19:53 shware_systems Note Edited: 0003248
2016-06-07 20:25 steffen Note Added: 0003249
2016-06-07 20:27 steffen Note Added: 0003250
2016-06-08 06:51 shware_systems Note Added: 0003251
2016-06-08 10:43 steffen Note Added: 0003252
2016-06-08 19:55 shware_systems Note Added: 0003254
2016-06-09 13:55 steffen Note Added: 0003256
2016-06-09 14:10 stephane Note Added: 0003257
2016-06-09 14:15 shware_systems Note Added: 0003258
2016-06-09 14:18 stephane Note Edited: 0003257
2016-06-09 14:26 stephane Note Edited: 0003257
2016-06-09 15:07 steffen Note Added: 0003259
2016-06-09 18:42 stephane Note Added: 0003262
2021-01-31 05:37 calestyo Issue Monitored: calestyo
2021-01-31 05:38 calestyo Note Added: 0005221
2021-01-31 06:48 stephane Note Added: 0005222
2021-01-31 17:08 calestyo Note Added: 0005223
2021-01-31 18:31 stephane Note Edited: 0005222
2021-01-31 18:34 stephane Note Edited: 0005222
2021-02-04 21:38 dwheeler Note Added: 0005226
2021-02-05 16:02 dwheeler Note Added: 0005227
2021-02-05 16:08 dwheeler Note Added: 0005228
2021-02-05 16:42 calestyo Note Added: 0005229
2021-02-18 17:11 nick Relationship added parent of 0001413
2021-03-15 20:38 mirabilos Note Added: 0005273
2021-03-16 09:40 geoffclare Note Added: 0005275
2022-01-14 20:20 calestyo Issue End Monitor: calestyo
2022-01-14 20:20 calestyo Issue Monitored: calestyo
2022-02-25 02:56 calestyo Note Added: 0005714
2022-03-14 00:52 calestyo Note Added: 0005746
2022-03-14 01:22 calestyo Note Added: 0005747
2022-03-14 21:22 shware_systems Note Added: 0005751
2022-10-18 10:42 geoffclare Note Added: 0005995
2022-10-18 10:50 geoffclare Note Edited: 0005995
2022-10-19 11:50 kre Note Added: 0005997
2022-10-19 11:52 kre Note Edited: 0005997
2022-10-19 11:54 kre Note Edited: 0005997
2022-10-19 11:56 kre Note Edited: 0005997
2022-10-19 12:22 hvd Note Added: 0005998
2022-10-19 13:32 hvd Note Edited: 0005998
2022-10-19 14:04 kre Note Added: 0005999
2022-10-19 20:53 steffen Note Added: 0006000
2022-10-19 21:29 steffen Note Added: 0006001
2022-10-20 08:49 geoffclare Note Added: 0006002
2022-10-20 08:49 geoffclare Note Edited: 0006002
2022-10-20 09:03 geoffclare Note Added: 0006003
2022-10-20 10:08 geoffclare Note Added: 0006004
2022-10-20 15:12 geoffclare Note Added: 0006006
======================================================================
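
The escape sequences listed in the proposed 2.2.4 wording can be checked against a shell that already implements $'...'. The following is a minimal sketch, assuming bash (ksh93 and zsh are expected to behave the same for these cases, but that is not verified here). The helper name hex_of is hypothetical and not part of the proposed text; it dumps the bytes of its argument as hex pairs using od. Only cases whose result is fully specified by the wording are shown: named escapes, \xXX with exactly two hex digits, \ddd, and the loss of special meaning inside double-quotes.

```shell
# hex_of: hypothetical helper, prints the bytes of "$1" as lowercase hex pairs.
hex_of() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Named escapes: \t is a <tab>, \\ is a <backslash>, \' is a single-quote.
hex_of $'a\tb';   echo    # 610962
hex_of $'\\';     echo    # 5c
hex_of $'it\'s';  echo    # 69742773

# \xXX (two hex digits) and \ddd (one to three octal digits) yield single bytes.
hex_of $'\x41\101'; echo  # 4141

# A hex escape followed by a literal hex digit: close the quotes first,
# as suggested in the rationale ($'\xXX'X).
hex_of $'\x41'1;  echo    # 4131

# Inside double-quotes, $' is not a quoting mechanism; all four
# characters between the double-quotes are kept literally.
hex_of "$'x'";    echo    # 24277827
```

The od output is squeezed with tr so that the result does not depend on how a particular od implementation pads its columns.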