Hi all,

I have been writing up a purely normative, unambiguous draft of the CellML specification to facilitate discussion of how to improve CellML in the future. As part of this, I have been rewriting most of the text of the specification to follow good practices for normative specifications.
One thing I have noticed during this process is that the current CellML text defining the format of CellML identifiers contradicts itself:

"A valid CellML identifier must consist of only letters, digits and underscores, must contain at least one letter, and must not begin with a digit. This can be written using Extended Backus-Naur Form (EBNF) notation as follows:

    letter ::= 'a'...'z','A'...'Z'
    digit ::= '0'...'9'
    identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
"

The EBNF does not permit an identifier like _1foo, because only underscores may precede the first letter and here a digit comes before it. The text of the specification does permit _1foo, because it consists of only letters, digits and underscores, contains at least one letter, and does not begin with a digit.

One rule or the other will need to be chosen for the next CellML specification. For now, I have taken the rule in the text as normative and have written it up. Note that I have not included an EBNF representation; that belongs in the explanatory notes which annotate the normative specification.

"Basic Latin alphabetic character
    A Unicode character in the range U+0041 to U+005A or in the range U+0061 to U+007A.

European numeric character
    A Unicode character in the range U+0030 to U+0039.

Basic Latin alphanumeric character
    A Unicode character which is either a Basic Latin alphabetic character or a European numeric character.

Basic Latin underscore
    The Unicode character U+005F.

The following data representation formats are defined for use in this specification:

1. CellML identifier:
   1. SHALL be a sequence of Unicode characters.
   2. SHALL NOT contain any characters except basic Latin alphanumeric characters and basic Latin underscores.
   3. SHALL contain one or more basic Latin alphabetic characters.
   4. SHALL NOT begin with a European numeric character.
"

Please let me know if you have an opinion on whether we should instead base this on the validity rules given by the EBNF form in CellML 1.1.
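To make the difference between the two rules concrete, here is a minimal Python sketch of both. The function names valid_by_ebnf and valid_by_text are hypothetical, and the regular expressions are an illustrative transcription of the EBNF and of the prose rule; neither appears in the specification itself.

```python
import re

# EBNF rule: any number of underscores, then one letter,
# then any mix of letters, underscores and digits.
EBNF_RULE = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")

# Prose rule, part 1: only Basic Latin letters, digits and underscores.
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_]+")


def valid_by_ebnf(name: str) -> bool:
    """Check a candidate identifier against the EBNF production."""
    return EBNF_RULE.fullmatch(name) is not None


def valid_by_text(name: str) -> bool:
    """Check a candidate identifier against the prose rule."""
    return (
        ALLOWED_CHARS.fullmatch(name) is not None      # only letters, digits, underscores
        and re.search(r"[A-Za-z]", name) is not None   # at least one letter
        and not ("0" <= name[0] <= "9")                # does not begin with a digit
    )


if __name__ == "__main__":
    for candidate in ["foo_1", "_a1", "_1foo", "1foo", "___"]:
        print(candidate, valid_by_ebnf(candidate), valid_by_text(candidate))
```

Of the candidates above, only _1foo is treated differently: the EBNF check rejects it and the prose check accepts it. Every identifier the EBNF accepts also satisfies the prose rule, so the prose rule is the more permissive of the two.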
Best regards,
Andrew