A NOTE has been added to this issue. 
====================================================================== 
http://austingroupbugs.net/view.php?id=1084 
====================================================================== 
Reported By:                Mark_Galeck
Assigned To:                
====================================================================== 
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1084
Category:                   Shell and Utilities
Type:                       Error
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Mark Galeck 
Organization:                
User Reference:              
Section:                    2.3 Token Recognition 
Page Number:                2347-2348 
Line Number:                74761-74780 
Interp Status:              --- 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2016-10-12 08:56 UTC
Last Modified:              2016-10-25 13:04 UTC
====================================================================== 
Summary:                    rule 3, 4, 5 do not say that a token is started, if
needed
====================================================================== 

---------------------------------------------------------------------- 
 (0003456) shware_systems (reporter) - 2016-10-25 13:04
 http://austingroupbugs.net/view.php?id=1084#c3456 
---------------------------------------------------------------------- 
I am explaining how the standard can be interpreted so that what's there
matches existing behaviors; I am not disputing that, per your point 1, it
looks questionable when one reads only the normative text. I'm more right
than not here, whether that's easy to believe or not, as someone who has
been involved in the more recent changes to that text. I leave it as my
opinion because it's easy enough to overlook nuances intended by the
original authors of those sections, some of whom (if not all) are still
involved with the list, so I do not speak for them.

Note the 'same current char' and 'may terminate preceding tokens' clauses I
used. The basic loop isn't:
while (*curchar++!=EOL) {apply a single rule};

it's more like:
if (input not empty) do {apply rules, possibly recursively; at EOL, apply
the grammar to see whether io_here bodies need to be processed; then
possibly reapply rules due to detected alias expansions, again potentially
with recursion} until (*curchar==EOI);

It is the individual rules that say when curchar++ may be executed,
possibly as part of a sub-loop specific to that rule or to do look-ahead
checking, such as for on-the-fly line joining when *curchar=='\'. The
standard also leaves open that *curchar may reference an input buffer in
which line joining has already been preprocessed as much as practical.

Rule 10 does not say to use the current char as the first character of a
new token and then access a new char as the current char; it says only to
use it as the first char.
 
Rule 3 does apply to terminate, per above, the '>' token of '>foobar', but
it does not necessarily start a new token. Rule 3 applies for '>#foobar'
also, as terminating '>', but Rule 9 is what determines how the rest of
that line is classified, with '#' as the current char, as beginning a
comment.

The search for the delimiting newline that Rule 9 says to look for and move
past can also be overridden by Rule 1 if the comment is on the last line of
a file that isn't terminated by an EOL. The same holds if the last
character is a NUL, when the source is a C string passed as a sh -c
argument or to a system() interface call, or a Ctrl-Z, if that's the
interactive EOF control character; all of these are variants of what is
symbolically EOI.

For 'foo'bar'baz' the first ' gets classified by Rule 10, and then by Rule
4 as the beginning of a single quoted string. A new current char, the 'f',
is then accessed according to that rule.
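
To make the loop shape described above concrete, here is a minimal sketch
in Python. It is my own illustration under stated assumptions, not the
standard's algorithm and not any shell's implementation: it knows only
single quotes, four redirection operators, blanks, comments and plain
words, and the names tokenize and OPERATORS are hypothetical. What it
tries to show is that each branch decides for itself whether the index
advances, so that after Rule 3 delimits an operator the same current char
is examined again by the later rules.

```python
# Simplified sketch only. Assumptions: one quoting form (single quotes),
# a tiny operator set, and newline treated like a blank.
OPERATORS = {"<", ">", "<<", ">>"}


def tokenize(src):
    """Split shell-like input into tokens, letting each rule decide
    whether the current character is consumed."""
    tokens = []
    tok = ""        # token in progress; "" means no token has been started
    is_op = False   # True while tok is an operator such as '>'
    i = 0           # index of the current char
    while True:
        # Rule 1: end of input delimits the current token, if any.
        if i >= len(src):
            if tok:
                tokens.append(tok)
            return tokens
        c = src[i]
        if is_op:
            # Extend the operator when the result is still an operator.
            if tok + c in OPERATORS:
                tok += c
                i += 1
                continue
            # Rule 3: delimit the operator WITHOUT advancing; the same
            # current char is re-examined on the next pass.
            tokens.append(tok)
            tok, is_op = "", False
            continue
        if c == "'":
            # Rule 4: a rule-specific sub-loop runs to the closing quote.
            # (This sketch also starts the token here if none exists.)
            end = src.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated single quote")
            tok += src[i:end + 1]
            i = end + 1
            continue
        if c in "<>":
            # Start of an operator delimits any word in progress.
            if tok:
                tokens.append(tok)
            tok, is_op = c, True
            i += 1
            continue
        if c in " \t\n":
            # A blank delimits the token and is discarded.
            if tok:
                tokens.append(tok)
            tok = ""
            i += 1
            continue
        if tok:
            # Previous char was part of a word: append to that word.
            tok += c
            i += 1
            continue
        if c == "#":
            # Rule 9: skip to the next newline (left for the blank branch),
            # or to end of input, where Rule 1 takes over.
            nl = src.find("\n", i)
            i = len(src) if nl == -1 else nl
            continue
        # Rule 10: use the current char as the first char of a new word.
        tok = c
        i += 1
```

With this sketch, tokenize(">foobar") gives ['>', 'foobar'],
tokenize(">#foobar") gives ['>'] because the '#' is seen with no token in
progress, and tokenize("'foo'bar'baz'") gives the single word
["'foo'bar'baz'"].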
 
For the last case, Rule 10 starts the token, $$ is a valid special
parameter (the shell's numeric pid after evaluation) by Rule 5 and 2.5.2,
and '#', by Rule 10 followed by Rule 9, again begins a comment.

No, it's not straightforward, but it is essentially correct as is in
describing how various implementations process most scripts. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2016-10-12 08:56 Mark_Galeck    New Issue                                    
2016-10-12 08:56 Mark_Galeck    Name                      => Mark Galeck     
2016-10-12 08:56 Mark_Galeck    Section                   => 2.3 Token Recognition
2016-10-12 08:56 Mark_Galeck    Page Number               => 2347-2348       
2016-10-12 08:56 Mark_Galeck    Line Number               => 74761-74780     
2016-10-12 23:29 shware_systems Note Added: 0003408                          
2016-10-13 01:44 Mark_Galeck    Note Added: 0003409                          
2016-10-14 22:16 shware_systems Note Added: 0003416                          
2016-10-15 01:31 Mark_Galeck    Note Added: 0003417                          
2016-10-25 13:04 shware_systems Note Added: 0003456                          
======================================================================

