[1003.1(2016)/Issue7+TC2 0001100]: Rewrite of Section 2.10 Shell Grammar, of the Shell Standard, to fix previous reports, fix new issues, and improve presentation.

Austin Group Bug Tracker Thu, 27 Oct 2016 05:42:47 -0700

The following issue has been SUBMITTED. 
====================================================================== 
http://austingroupbugs.net/view.php?id=1100 
====================================================================== 
Reported By:                Mark_Galeck
Assigned To:                
====================================================================== 
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1100
Category:                   Shell and Utilities
Type:                       Clarification Requested
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Mark Galeck 
Organization:                
User Reference:              
Section:                    2.10 Shell Grammar 
Page Number:                2375-2381 
Line Number:                75873-76150 
Interp Status:              --- 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2016-10-27 12:40 UTC
Last Modified:              2016-10-27 12:40 UTC
====================================================================== 
Summary:                    Rewrite of Section 2.10 Shell Grammar, of the Shell
Standard, to fix previous reports, fix new issues, and improve presentation.
Description: 
I recently made several reports concerning sections 2.10.1/2, and then I
saw at least one more problem of a similar kind.  If I continue making
incremental reports, then even if the changes are approved, they will
result in a bigger and bigger mess.


Therefore I decided to cancel some previous reports, add new issues and
make one summary report, which is a comprehensive rewrite of the whole
Shell Grammar section, to fix the issues I find, as well as make the whole
presentation more straightforward and less convoluted.  


Here is the list of all the specific bugs this report addresses, including
some previous reports.  I am not listing changes here that merely improve
the presentation; to see all the changes, you should probably use some
"diff" program. 



1. Previous reports 1096, 1094, 1097, 1099, 1095, 1092 are included here
and can be cancelled.  

2.  Previous reports 1098, 1093, 1091, 1088 can be cancelled.  Let's say we
classify them as bogus, and those changes are not included here. 


3. (new issue) In the current standard, cmd_word cannot be a reserved word.
The rules are very convoluted, but if you carefully trace how they apply to
each other, you will find that cmd_name and cmd_word currently follow
exactly the same semantics: neither allows reserved words.  Only cmd_name
should disallow reserved words.  


4. (new issue) 

In multiple places in the current standard, rule 1 applies to WORD, and
thus reserved words are not allowed, where all words should be allowed. 
Some of the reports above cover this.  Additionally, we have:

WORD in the case_clause production - currently it cannot be a reserved
word, but it should be allowed to be a reserved word.  

The same applies to WORD in the cmd_suffix production. 
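
As an illustration of item 4, here is a minimal sketch of reserved words
used as ordinary words.  The function names are hypothetical, and the
behaviour shown is what common shells such as bash and dash are expected
to do, not something the current grammar text guarantees.  The case
pattern is quoted only to keep the focus on the word after 'case'.

```shell
# WORD after 'case': the reserved word 'if' is used as a plain string.
classify() {
    case if in
        "if") echo "matched" ;;
        *)    echo "no match" ;;
    esac
}

# WORD in cmd_suffix: reserved words are plain arguments to echo.
suffix_words() {
    echo if then fi
}

classify
suffix_words
```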

------------------------

This rewrite is intended only to include the changes mentioned above, and
should otherwise be equivalent to the current standard.

I will be happy to answer any questions, provide clarifications, or fix any
bugs you find.  

I do not have the time to discuss the merits of the changes.  The
maintainer of this standard is free to reject any part or all of this
report, or to continue to rewrite my Section 2.10 in any way that suits
them.  I completely do not mind.  

Yes, the text I provide for the new Section 2.10 is in raw text format; it
does not have hyperlinks or different fonts.  Somebody else would have to
add those.  

Thank you!    


Desired Action: 
2.10. Shell Grammar

The following grammar defines the Shell Command Language. This formal
syntax shall take precedence over the preceding text syntax description.

The rules in Token Recognition delimit operator and word tokens.

In order to appear in the grammar as token identifiers, the tokens shall be
classified according to the following rules, applied in the following order
of precedence:


1. The token identifier for any operator occurs when the token is that
operator.


2. IO_NUMBER occurs if the token consists solely of digits and the
delimiter character is one of '<' or '>'.


3. This rule only applies in the function_body production; see below in the
grammar.

Word expansion and assignment shall never occur, even when required by the
rules below, when this production is being parsed. WORD is each token that
might either be expanded or have assignment applied to it, consisting only
of characters that are exactly described in Token Recognition.
 

4. The token identifier for any reserved word occurs when the token is
exactly that reserved word.

Note:
Because at this point <quotation-mark> characters are retained in the
token, quoted strings cannot be recognized as reserved words. Also note
that line joining is done before tokenization, as described in Escape
Character (Backslash), so escaped <newline> characters are already removed
at this point.


5. This rule only applies in simple_command and cmd_prefix productions; see
below in the grammar.

For this rule, we define "important" <equal-sign> characters in a token:
they are the <equal-sign> characters that are unquoted (as determined while
applying rule 4 from Token Recognition), are not part of an embedded
parameter expansion, command substitution, or arithmetic expansion
construct (as determined while applying rule 5 from Token Recognition), and
do not begin the token.

For the definition of a valid "name", see XBD Name.

5a. 
If the token does not contain important '=' and is not a reserved word, it
is WORD.
If there are important '=' and all the characters preceding the first such
'=' do not form a valid name, it is unspecified whether it is WORD.

5b.  
If the token does not contain important '=', it is WORD.
If there are important '=' and all the characters preceding the first such
'=' do not form a valid name, it is unspecified whether it is WORD.

5c. 
If there are important '=' and all the characters preceding the first such
'=' form a valid name, it is ASSIGNMENT_WORD.
If they do not form a valid name, it is unspecified whether it is
ASSIGNMENT_WORD.

Assignment to the name within ASSIGNMENT_WORD token shall occur as
specified in Simple Commands.


6. This rule only applies in the function_definition production; see below
in the grammar.

NAME is any word that is not reserved, and is a valid name.


7. This rule only applies in the for_clause production; see below in the
grammar.

NAME is any valid name.


8. This rule only applies in pattern_not_esac productions; see below in the
grammar.

WORD is any word except 'esac'.


9. This rule only applies in the here_end production; see below in the
grammar.

Quote removal shall be applied to the word to determine the delimiter that
is used to find the end of the here-document that begins after the next
<newline>.


10. This rule only applies in the filename production; see below in the
grammar. 

The expansions specified in Redirection shall occur. WORD occurs if, as
specified there, exactly one field results (or the result is unspecified);
there are additional requirements on pathname expansion.


11. WORD is any word.
------------------------------
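
The classification rules above can be sketched with a few small shell
functions.  This is only an illustration of rules 2, 4 and 5 as they are
expected to behave in common shells; the function names are hypothetical,
and /nonexistent is an assumed empty path used to make the command lookup
fail predictably.

```shell
# Rule 2: digits immediately before '<' or '>' form an IO_NUMBER.
io_number()   { echo a 2>/dev/null; }    # "2" names fd 2; prints: a
plain_digit() { echo a 2 >/dev/null; }   # "2" is an argument; stdout is discarded

# Rule 4 note: line joining happens before tokenization, so the two
# pieces below form the reserved word 'if'.
joined() {
    i\
f true; then echo "recognized"; fi
}

# Rule 4 note: a quoted token is never a reserved word.  It is looked up
# as a command name instead, which fails here with status 127.
quoted() {
    ( PATH=/nonexistent; "if" ) 2>/dev/null || echo "$?"
}

# Rule 5: a valid name before an important '=' gives ASSIGNMENT_WORD in
# the prefix position, but the same text in the suffix is a plain WORD.
assignment()  { v=1; echo "$v"; }        # prints: 1
suffix_word() { echo v=1; }              # prints: v=1
```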

The WORD tokens shall have the word expansion rules applied to them
immediately before the associated command is executed, not at the time the
command is parsed.
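
A minimal sketch of the timing described in the paragraph above, using a
hypothetical variable and function name: the word "$greeting" is parsed
when the function is defined, but it is expanded only when the echo
command is executed.

```shell
greeting=parsed

# "$greeting" is only parsed here; no expansion takes place yet.
show() {
    echo "$greeting"
}

greeting=executed

# Expansion happens now, immediately before echo runs.
show
```

If expansion happened at parse time, the call would print the value the
variable had when the function was defined.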


/* -------------------------------------------------------
   The grammar symbols
   ------------------------------------------------------- */
%token  WORD
%token  ASSIGNMENT_WORD
%token  NAME
%token  NEWLINE
%token  IO_NUMBER


/* The following are the operators (see XBD Operator)
   containing more than one character. */



%token  AND_IF    OR_IF    DSEMI
/*      '&&'      '||'     ';;'    */


%token  DLESS  DGREAT  LESSAND  GREATAND  LESSGREAT  DLESSDASH
/*      '<<'   '>>'    '<&'     '>&'      '<>'       '<<-'   */


%token  CLOBBER
/*      '>|'   */


/* The following are the reserved words. */


%token  If    Then    Else    Elif    Fi    Do    Done
/*      'if'  'then'  'else'  'elif'  'fi'  'do'  'done'   */


%token  Case    Esac    While    Until    For
/*      'case'  'esac'  'while'  'until'  'for'   */


/* These are reserved words, not operator tokens, and are
   recognized when reserved words are recognized. */


%token  Lbrace    Rbrace    Bang
/*      '{'       '}'       '!'   */


%token  In
/*      'in'   */


/* -------------------------------------------------------
   The Grammar
   ------------------------------------------------------- */
%start program
%%
program          : linebreak complete_commands linebreak
                 | linebreak
                 ;
complete_commands: complete_commands newline_list complete_command
                 |                                complete_command
                 ;
complete_command : list separator_op
                 | list
                 ;
list             : list separator_op and_or
                 |                   and_or
                 ;
and_or           :                         pipeline
                 | and_or AND_IF linebreak pipeline
                 | and_or OR_IF  linebreak pipeline
                 ;
pipeline         :      pipe_sequence
                 | Bang pipe_sequence
                 ;
pipe_sequence    :                             command
                 | pipe_sequence '|' linebreak command
                 ;
command          : simple_command
                 | compound_command
                 | compound_command redirect_list
                 | function_definition
                 ;
compound_command : brace_group
                 | subshell
                 | for_clause
                 | case_clause
                 | if_clause
                 | while_clause
                 | until_clause
                 ;
subshell         : '(' compound_list ')'
                 ;
compound_list    : linebreak term
                 | linebreak term separator
                 ;
term             : term separator and_or
                 |                and_or
                 ;
/* Apply rule 7:*/
for_clause       : For NAME                                      do_group
                 | For NAME                       sequential_sep do_group
                 | For NAME linebreak In          sequential_sep do_group
                 | For NAME linebreak In wordlist sequential_sep do_group
                 ;
wordlist         : wordlist WORD
                 |          WORD
                 ;
case_clause      : Case WORD linebreak In linebreak case_list    Esac
                 | Case WORD linebreak In linebreak case_list_ns Esac
                 | Case WORD linebreak In linebreak              Esac
                 ;
case_list_ns     : case_list case_item_ns
                 |           case_item_ns
                 ;
case_list        : case_list case_item
                 |           case_item
                 ;
case_item_ns     :     pattern_not_esac ')' linebreak
                 |     pattern_not_esac ')' compound_list
                 | '(' pattern ')' linebreak
                 | '(' pattern ')' compound_list
                 ;
case_item        :     pattern_not_esac ')' linebreak     DSEMI linebreak
                 |     pattern_not_esac ')' compound_list DSEMI linebreak
                 | '(' pattern ')' linebreak     DSEMI linebreak
                 | '(' pattern ')' compound_list DSEMI linebreak
                 ;
/* Apply rule 8:*/
pattern_not_esac : WORD
                 | WORD '|' pattern
                 ;
pattern          :             WORD
                 | pattern '|' WORD
                 ;
if_clause        : If compound_list Then compound_list else_part Fi
                 | If compound_list Then compound_list           Fi
                 ;
else_part        : Elif compound_list Then compound_list
                 | Elif compound_list Then compound_list else_part
                 | Else compound_list
                 ;
while_clause     : While compound_list do_group
                 ;
until_clause     : Until compound_list do_group
                 ;
/* Apply rule 6:*/               
function_definition : NAME '(' ')' linebreak function_body
                 ;
/* Apply rule 3:*/
function_body    : compound_command                
                 | compound_command redirect_list
                 ;
brace_group      : Lbrace compound_list Rbrace
                 ;
do_group         : Do compound_list Done
                 ;
simple_command   : cmd_prefix WORD cmd_suffix        /* Apply rule 5b */
                 | cmd_prefix WORD                   /* Apply rule 5b */
                 | cmd_prefix
                 | WORD cmd_suffix                   /* Apply rule 5a */
                 | WORD                              /* Apply rule 5a */
                 ;
/* Apply rule 5c:*/
cmd_prefix       :            io_redirect
                 | cmd_prefix io_redirect
                 |            ASSIGNMENT_WORD
                 | cmd_prefix ASSIGNMENT_WORD
                 ;
cmd_suffix       :            io_redirect
                 | cmd_suffix io_redirect
                 |            WORD
                 | cmd_suffix WORD
                 ;
redirect_list    :               io_redirect
                 | redirect_list io_redirect
                 ;
io_redirect      :           io_file
                 | IO_NUMBER io_file
                 |           io_here
                 | IO_NUMBER io_here
                 ;
io_file          : '<'       filename
                 | LESSAND   filename
                 | '>'       filename
                 | GREATAND  filename
                 | DGREAT    filename
                 | LESSGREAT filename
                 | CLOBBER   filename
                 ;
filename         : WORD                      /* Apply rule 10 */
                 ;
io_here          : DLESS     here_end
                 | DLESSDASH here_end
                 ;
here_end         : WORD                      /* Apply rule 9 */
                 ;
newline_list     :              NEWLINE
                 | newline_list NEWLINE
                 ;
linebreak        : newline_list
                 | /* empty */
                 ;
separator_op     : '&'
                 | ';'
                 ;
separator        : separator_op linebreak
                 | newline_list
                 ;
sequential_sep   : ';' linebreak
                 | newline_list
                 ;
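
Two of the less familiar productions in the grammar above can be
sketched as follows.  The function names are hypothetical, and the
example assumes a shell that accepts these forms, as common shells such
as bash and dash are expected to.

```shell
# for_clause without 'in' (For NAME do_group): iterates over the
# positional parameters of the function.
each_arg() {
    for arg do
        printf '%s\n' "$arg"
    done
}

# case_list_ns / case_item_ns: the last item may omit the ';;' terminator.
last_item() {
    case $1 in
        a) echo first ;;
        b) echo second
    esac
}

each_arg x y
last_item b
```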
====================================================================== 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2016-10-27 12:40 Mark_Galeck    New Issue                                    
2016-10-27 12:40 Mark_Galeck    Name                      => Mark Galeck     
2016-10-27 12:40 Mark_Galeck    Section                   => 2.10 Shell Grammar
2016-10-27 12:40 Mark_Galeck    Page Number               => 2375-2381       
2016-10-27 12:40 Mark_Galeck    Line Number               => 75873-76150     
======================================================================
