Oops, I take that back. My expression doesn't reflect the requirement that an identifier must contain at least one letter.
    identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
seems to be correct.
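As a quick check of how the rules discussed in this thread differ, here is a minimal Python sketch. The regular expressions are an assumed translation of the two grammar rules and of the prose rule quoted below. The names EBNF_RULE, ALT_RULE, matches_ebnf, matches_alt and satisfies_prose_rule are hypothetical and do not come from the CellML specification.

```python
import re

# Assumed translation of the rule as written in the CellML 1.1 text:
#   identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
EBNF_RULE = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")

# Assumed translation of the alternative suggested earlier in the thread:
#   identifier ::= ( letter | '_' ) ( letter | '_' | digit )*
ALT_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def matches_ebnf(s: str) -> bool:
    """True if the whole string matches the CellML 1.1 grammar rule."""
    return EBNF_RULE.fullmatch(s) is not None


def matches_alt(s: str) -> bool:
    """True if the whole string matches the alternative grammar rule."""
    return ALT_RULE.fullmatch(s) is not None


def satisfies_prose_rule(s: str) -> bool:
    """Prose rule: only letters, digits and underscores; at least one
    letter; must not begin with a digit."""
    only_allowed = re.fullmatch(r"[A-Za-z0-9_]+", s) is not None
    has_letter = re.search(r"[A-Za-z]", s) is not None
    # only_allowed is False for the empty string, so s[0] is safe here.
    return only_allowed and has_letter and s[0] not in "0123456789"


if __name__ == "__main__":
    for candidate in ["foo_1", "_foo", "_1foo", "___", "1foo"]:
        print(candidate,
              matches_ebnf(candidate),
              matches_alt(candidate),
              satisfies_prose_rule(candidate))
```

Under these assumptions, "_1foo" is where the grammar rule and the prose rule disagree, and "___" shows why the alternative rule drops the "at least one letter" requirement.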

On 2007 Nov 22, at 09:52, Poul Nielsen wrote:

> Dear Andrew
>
> I hope that I am interpreting the EBNF notation correctly (although
> the expression seems to be BNF, not EBNF - although W3C seems to
> use the BNF notation). Doesn't
>     identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
> indicate that any number of '_' symbols may form the beginning of
> an identifier?
>
> Perhaps a better way of expressing the W3C BNF rule would be
>     identifier ::= ( letter | '_' ) ( letter | '_' | digit )*
>
> Best wishes
> Poul
>
> On 2007 Nov 22, at 09:23, Andrew Miller wrote:
>
>> Hi all,
>>
>> I have been working on writing up a purely normative, unambiguous draft
>> of the CellML specification to facilitate discussions of how to improve
>> CellML in the future. As part of this, I have been rewriting most of the
>> text of the specification to follow good practices for normative
>> specifications.
>>
>> One thing I have noticed during this process is that CellML's current
>> text defining the format for CellML identifiers contradicts itself:
>>
>> "
>> A valid CellML identifier must consist of only letters, digits and
>> underscores, must contain at least one letter, and must not begin with
>> a digit. This can be written using Extended Backus-Naur Form (EBNF)
>> notation as follows:
>>
>>     letter ::= 'a'...'z','A'...'Z'
>>     digit ::= '0'...'9'
>>     identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
>> "
>>
>> The EBNF specification does not permit an identifier like _1foo because
>> it does not contain a letter before an underscore, while the text of the
>> specification does, because it contains only letters, digits, and
>> underscores, contains at least one letter, and does not begin with
>> a digit.
>>
>> One rule or the other will need to be decided for the next CellML
>> specification.
>>
>> I have, for now, taken the rule in the text as being normative and have
>> written it up. Note that I have not included an EBNF representation -
>> this will belong in explanatory notes which annotate the normative
>> specification.
>>
>> "
>> Basic Latin alphabetic character
>>     A Unicode character in the range U+0041 to U+005A or in the
>>     range U+0061 to U+007A.
>>
>> European numeric character
>>     A Unicode character in the range U+0030 to U+0039.
>>
>> Basic Latin alphanumeric character
>>     A Unicode character which is either a Basic Latin alphabetic
>>     character or a European numeric character.
>>
>> Basic Latin underscore
>>     The Unicode character U+005F.
>>
>> The following data representation formats are defined for use
>> in this specification:
>>
>> 1. CellML identifier:
>>    1. SHALL be a sequence of Unicode characters.
>>    2. SHALL NOT contain any characters except basic Latin
>>       alphanumeric characters and basic Latin underscores.
>>    3. SHALL contain one or more basic Latin alphabetic characters.
>>    4. SHALL NOT begin with a European numeric character.
>> "
>>
>> Please let me know if you have an opinion on whether we should instead
>> base this off the validity rules specified in the EBNF form from
>> CellML 1.1.
>>
>> Best regards,
>> Andrew
>>
>> _______________________________________________
>> cellml-discussion mailing list
>> cellml-discussion@cellml.org
>> http://www.cellml.org/mailman/listinfo/cellml-discussion