Dear Andrew

I hope that I am interpreting the EBNF notation correctly (the
expression looks like BNF rather than EBNF, though the W3C seems to
use the BNF notation as well). Doesn't
identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
indicate that any number of '_' symbols may form the beginning of an  
identifier?

Perhaps a better way of expressing the W3C BNF rule would be
identifier ::= ( letter | '_' ) ( letter | '_' | digit )*
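
To make the difference concrete, here is a rough Python sketch (my own
illustration, not part of any specification) that encodes the CellML 1.1
rule, the alternative rule above, and the prose rule as regular
expressions. The names CELLML_1_1_EBNF, PROPOSED, matches_text_rule and
compare are hypothetical, and letter and digit are assumed to be
ASCII-only, as in the quoted grammar.

```python
import re

# ASCII-only character classes, mirroring `letter` and `digit` in the quoted grammar.
LETTER = "[A-Za-z]"
DIGIT = "[0-9]"

# identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
CELLML_1_1_EBNF = re.compile(rf"_*{LETTER}(?:{LETTER}|_|{DIGIT})*")

# identifier ::= ( letter | '_' ) ( letter | '_' | digit )*
PROPOSED = re.compile(rf"(?:{LETTER}|_)(?:{LETTER}|_|{DIGIT})*")


def matches_text_rule(s: str) -> bool:
    """Prose rule: only letters, digits and underscores; at least one
    letter; must not begin with a digit."""
    return (
        re.fullmatch(r"[A-Za-z0-9_]+", s) is not None
        and re.search(LETTER, s) is not None
        and re.match(DIGIT, s) is None
    )


def compare(s: str) -> tuple:
    """Return (CellML 1.1 EBNF, alternative rule, prose rule) verdicts for s."""
    return (
        CELLML_1_1_EBNF.fullmatch(s) is not None,
        PROPOSED.fullmatch(s) is not None,
        matches_text_rule(s),
    )


if __name__ == "__main__":
    for candidate in ["foo", "__foo", "_1foo", "1foo", "_"]:
        print(candidate, compare(candidate))
```

Under these assumptions, _1foo is rejected by the first rule but
accepted by the other two, whereas a lone _ is accepted only by the
alternative rule. So the alternative is slightly more permissive than
the prose, which requires at least one letter.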

Best wishes
Poul

On 2007 Nov 22, at 09:23, Andrew Miller wrote:

> Hi all,
>
> I have been working on writing up a purely normative, unambiguous draft
> of the CellML specification to facilitate discussions of how to improve
> CellML in the future. As part of this, I have been rewriting most of the
> text of the specification to follow good practices for normative
> specifications.
>
> One thing I have noticed during this process is that CellML's current
> text defining the format for CellML identifiers contradicts itself:
>
> "
> A valid CellML identifier must consist of only letters, digits and
> underscores, must contain at least one letter, and must not begin with
> a digit. This can be written using Extended Backus-Naur Form (EBNF)
> notation as follows:
>
> letter     ::= 'a'...'z','A'...'Z'
> digit      ::= '0'...'9'
> identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
> "
>
>
> The EBNF specification does not permit an identifier like _1foo because
> a digit, rather than a letter, follows the leading underscore, while the
> text of the specification does permit it, because it contains only
> letters, digits, and underscores, contains at least one letter, and does
> not begin with a digit.
>
> One rule or the other will need to be decided for the next CellML
> specification.
>
> I have, for now, taken the rule in the text as being normative and have
> written it up. Note that I have not included an EBNF representation -
> this will belong in explanatory notes which annotate the normative
> specification.
>
> "
>
> Basic Latin alphabetic character
>
>     A Unicode character in the range U+0041 to U+005A or in the
>     range U+0061 to U+007A.
>
> European numeric character
>
>     A Unicode character in the range U+0030 to U+0039.
>
> Basic Latin alphanumeric character
>
>     A Unicode character which is either a Basic Latin alphabetic
>     character or a European numeric character.
>
> Basic Latin underscore
>
>     The Unicode character U+005F.
>
>
>     The following data representation formats are defined for use in
>     this specification:
>
>     1. CellML identifier:
>
>        1. SHALL be a sequence of Unicode characters.
>        2. SHALL NOT contain any characters except basic Latin
>           alphanumeric characters and basic Latin underscores.
>        3. SHALL contain one or more basic Latin alphabetic characters.
>        4. SHALL NOT begin with a European numeric character.
>
>
> "
>
>
> Please let me know if you have an opinion on whether we should instead
> base this off the validity rules specified in the EBNF form from
> CellML 1.1.
>
> Best regards,
> Andrew
>
> _______________________________________________
> cellml-discussion mailing list
> [email protected]
> http://www.cellml.org/mailman/listinfo/cellml-discussion

