Hello,
Recently I started implementing support for $'...' strings in my shell
(a dash-based personal hobby project), based on the proposed text in
http://austingroupbugs.net/view.php?id=249 and the later comments there,
which has left me with a few questions. I understand that the text is
not final and that any answers I receive now may be invalidated by
future changes to it.
1. Are the rules for determining the end of a dollar-quoted string
intended to be fully specified, especially when taking into account
strings that are never expanded, such as:
: || : $'\c' #1
: || : $'\c\'' #2
Unlike other shells, mksh takes the second ' in #1 as the operand to
\c, and the backslash by itself in #2, so it complains about a missing
closing quote for both. Is any particular behaviour intended?
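For reference, the divergence can be probed mechanically. This is only a
hypothetical test script (which shells are installed is an assumption);
-n makes each shell parse the command without executing it, so a nonzero
exit status can only come from tokenisation:

```shell
# Hypothetical probe: ask each shell to merely parse (-n) the two
# candidate strings and report whether it finds a closing quote.
# Which shells are installed is an assumption; missing ones are skipped.
for sh in bash mksh zsh; do
  command -v "$sh" >/dev/null 2>&1 || continue
  n=1
  for s in ": || : \$'\\c'" ": || : \$'\\c\\''"; do
    if "$sh" -n -c "$s" 2>/dev/null; then
      printf '%s accepts #%d\n' "$sh" "$n"
    else
      printf '%s rejects #%d\n' "$sh" "$n"
    fi
    n=$((n + 1))
  done
done
```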
2. I am not able to fully understand the permitted unspecified effects
of null byte handling. The proposed text reads:
If a \xXX or \XXX escape sequence yields a byte whose value is 0,
it is unspecified whether that nul byte is included in the result
or if that byte and any following regular characters and escape
sequences up to the terminating unescaped single-quote are evaluated
and discarded.
a. As mentioned in a comment already, \u0000 can be used to produce a
null byte as well, which is missing from the text, but is this also
intended to apply to implementation-defined or unspecified forms of
producing a null byte? Examples are \400, which produces a null byte on
most implementations but in mksh does not terminate the string the way
\0 does, and \c@.
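As an illustration, here is a hypothetical probe for the \400 case.
Which shells are installed, and what bytes they print, are assumptions;
the variation in output is precisely the point:

```shell
# Hypothetical probe: what byte, if any, does \400 (octal 256) yield?
# Shell availability is an assumption; od shows the resulting bytes.
for sh in bash mksh ksh; do
  command -v "$sh" >/dev/null 2>&1 || continue
  printf '%s: ' "$sh"
  "$sh" -c "printf '%s' \$'\\400'" 2>/dev/null | od -An -to1
done
```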
b. If the implementation includes that null byte in the result, how
does it follow that
printf a$'b\0c\''d
is required by this standard to produce:
abd
while historic versions of ksh93 produced:
ab
If the null byte is included in the string, would passing the complete
string, including the null byte, to the printf utility not cause the
output to be "ab"?
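The mechanics behind the question: a null byte can exist in a data
stream, but arguments are handed to a utility as NUL-terminated C
strings, so anything after an embedded NUL in an argument would be cut
off. A small POSIX-sh sketch of the first half (no $'...' needed, since
the printf utility's format string supports \ddd octal escapes):

```shell
# A NUL byte is perfectly representable in a data stream, as od shows:
printf 'ab\0cd' | od -An -c
# ...but an exec'd utility such as printf could never receive "ab\0cd"
# as one argument intact: the argument string would end at the NUL.
```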
3. About the addition "during token recognition" for the handling of
\uXXXX and \UXXXXXXXX: when exactly does token recognition take place?
Does this require, allow, or forbid implementations to parse multiple
commands prior to executing them, when one command may change the
locale but the $'...' string is part of a different, later command?
Does this require functions containing $'\uXXXX' to expand them
according to the locale that was in effect when the function was
defined, or are shells allowed to (re-)tokenise the function's commands
when the function is executed, or to behave as if such re-tokenisation
takes place?
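A hypothetical illustration of the timing problem (assumes bash is
installed; which bytes get printed, and whether other shells agree, is
exactly what the question asks):

```shell
# The function body is tokenised when the definition is read; the
# assignment to LC_ALL only runs afterwards. Which locale governs the
# conversion of \u00e9? (Assumes bash; the answer may differ per shell.)
if command -v bash >/dev/null 2>&1; then
  bash <<'EOF' | od -An -to1
f() { printf '%s' $'\u00e9'; }  # tokenised here
LC_ALL=C                        # locale changed here
f                               # expanded and printed here
EOF
fi
```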
4. Does the implementation-defined handling "during token recognition"
of \uXXXX and \UXXXXXXXX sequences that denote characters not in the
current locale's character set allow implementations to issue an error
message at parse time? That is, assuming a shell decides to provide an
error message for
LC_ALL=C
: $'\u1234'
as zsh does, does this allow and/or require that same error message to
be produced for
LC_ALL=C
: || : $'\u1234'
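Again a hypothetical probe (installed shells are an assumption; whether
the two cases must behave the same is the question being asked):

```shell
# Compare the directly executed and the never-executed case under
# LC_ALL=C. Exit status 0 is reported as "no error".
for sh in bash zsh; do
  command -v "$sh" >/dev/null 2>&1 || continue
  for cmd in ": \$'\\u1234'" ": || : \$'\\u1234'"; do
    if LC_ALL=C "$sh" -c "$cmd" >/dev/null 2>&1; then
      printf '%s: no error for %s\n' "$sh" "$cmd"
    else
      printf '%s: error for %s\n' "$sh" "$cmd"
    fi
  done
done
```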
Cheers,
Harald van Dijk