A NOTE has been added to this issue.
======================================================================
http://austingroupbugs.net/view.php?id=1084
======================================================================
Reported By:                Mark_Galeck
Assigned To:
======================================================================
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1084
Category:                   Shell and Utilities
Type:                       Error
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Mark Galeck
Organization:
User Reference:
Section:                    2.3 Token Recognition
Page Number:                2347-2348
Line Number:                74761-74780
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2016-10-12 08:56 UTC
Last Modified:              2016-10-25 13:04 UTC
======================================================================
Summary:                    rule 3, 4, 5 do not say that a token is started, if needed
======================================================================
----------------------------------------------------------------------
 (0003456) shware_systems (reporter) - 2016-10-25 13:04
 http://austingroupbugs.net/view.php?id=1084#c3456
----------------------------------------------------------------------
I am explaining how the standard can be interpreted so that what's there does match existing behaviors. I am not disputing, per your point 1, that it looks questionable when reading only the normative text. I'm more right than not here, whether that's easy to believe or not, as someone who has been involved in the more recent changes to that text. I leave it as my opinion because it's easy enough to overlook nuances intended by the original authors of those sections, some of whom (if not all) are still involved with the list, so I do not speak for them.

Note the 'same current char' and 'may terminate preceding tokens' clauses I used. The basic loop isn't:

    while (*curchar++!=EOL) {apply a single rule};

It's more like:

    if (not empty input) do {apply rules, maybe recursively, and at EOL applying the grammar to see if maybe io_here bodies need to be processed, and then possibly reapplying rules due to detected alias expansions, again potentially with recursion} until (*curchar==EOI);

It is the individual rules that say when curchar++ may be executed, possibly as part of a sub-loop specific to that rule or to do look-ahead checking, such as for on-the-fly line joining when *curchar=='\'. The standard also leaves open that *curchar may be referencing an input buffer in which line joining has already been preprocessed as far as practical.

Rule 10 does not say to use the current char as the first character of a new token and then access a new char as the current char; it says only to use it as the first char. Rule 3 does apply to terminate, per the above, the '>' token of '>foobar', but it does not necessarily start a new token.
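
As a rough illustration of the loop shape described above, here is a toy sketch in Python. It is not the standard's algorithm and not any shell's implementation. The names `tokenize`, `OPERATORS`, and `delimit` are made up for this example. It assumes a small operator set, handles only single quotes, treats a newline like a blank, and ignores backslashes, double quotes, `$` expansions, here-documents, and aliases. It also lets the quoting branch begin a word when none is in progress, which is only one possible reading. The point it tries to show is that each branch decides for itself whether the current character is consumed, and that the Rule 3 branch delimits the operator without advancing.

```python
# Toy subset of the shell operators; real shells recognize more.
OPERATORS = {">", ">>", "<", "<<", "|", "||", "&", "&&", ";", ";;"}


def tokenize(text):
    """Split `text` into tokens; each branch decides whether to advance `i`."""
    tokens = []
    current = ""          # token under construction ("" means none)
    is_operator = False   # True while `current` is an operator token
    i = 0                 # index of the current character

    def delimit():
        nonlocal current, is_operator
        if current:
            tokens.append(current)
        current, is_operator = "", False

    while True:
        if i >= len(text):                 # end of input (Rule 1 in the note)
            delimit()
            return tokens
        c = text[i]
        if is_operator:
            if current + c in OPERATORS:   # operator continues
                current += c
                i += 1
            else:                          # Rule 3: terminate the operator only;
                delimit()                  # `i` is unchanged, so `c` is re-examined
            continue
        if c == "'":                       # Rule 4 (single quotes only here)
            end = text.index("'", i + 1)   # ValueError if the quote is unterminated
            current += text[i:end + 1]
            i = end + 1
        elif c in OPERATORS:               # first character of a new operator
            delimit()
            current, is_operator = c, True
            i += 1
        elif c in " \t\n":                 # blank (newline simplified to a blank)
            delimit()
            i += 1
        elif current:                      # already inside a word: append
            current += c
            i += 1
        elif c == "#":                     # Rule 9: discard up to the newline or EOI
            while i < len(text) and text[i] != "\n":
                i += 1
        else:                              # Rule 10: start a new word
            current = c
            i += 1
```

Under these assumptions, `tokenize(">foobar")` yields `[">", "foobar"]`, while `tokenize(">#foobar")` yields only `[">"]`, because the `#` is re-examined with no token in progress and falls through to the comment branch.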
Rule 3 applies for '>#foobar' also, as terminating '>', but Rule 9 is what determines how the rest of that line is classified: with '#' as the current char, it begins a comment. The delimiting newline that Rule 9 says to look for and move past can also be overridden by Rule 1 if the comment is on the last line of a file that isn't terminated by an EOL. Rule 1 also applies if the last character is a NUL, when the source is a C string passed as a sh -c argument or in a system() interface call, or a Ctrl-Z if that's the interactive EOF control character. These are all variants of what is symbolically EOI.

For 'foo'bar'baz' the first ' gets classified by Rule 10, and then by Rule 4 as the beginning of a single-quoted string. A new current char, the 'f', is then accessed according to that rule. For the last case, Rule 10 starts the token, $$ is a valid special parameter (the shell's numeric PID after evaluation) by Rule 5 and 2.5.2, and '#' by Rule 10 followed by Rule 9 again begins a comment.

No, it's not straightforward, but it is essentially correct as is in describing how various implementations process most scripts.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2016-10-12 08:56 Mark_Galeck    New Issue
2016-10-12 08:56 Mark_Galeck    Name                      => Mark Galeck
2016-10-12 08:56 Mark_Galeck    Section                   => 2.3 Token Recognition
2016-10-12 08:56 Mark_Galeck    Page Number               => 2347-2348
2016-10-12 08:56 Mark_Galeck    Line Number               => 74761-74780
2016-10-12 23:29 shware_systems Note Added: 0003408
2016-10-13 01:44 Mark_Galeck    Note Added: 0003409
2016-10-14 22:16 shware_systems Note Added: 0003416
2016-10-15 01:31 Mark_Galeck    Note Added: 0003417
2016-10-25 13:04 shware_systems Note Added: 0003456
======================================================================