Hi all,

I have been working on writing up a purely normative, unambiguous draft 
of the CellML specification to facilitate discussions of how to improve 
CellML in the future. As part of this, I have been rewriting most of the 
text of the specification to follow good practices for normative 

One thing I have noticed during this process is that CellML's current 
text defining the format for CellML identifiers contradicts itself:


A valid CellML identifier must consist of only letters, digits and
underscores, must contain at least one letter, and must not begin with
a digit. This can be written using Extended Backus-Naur Form (EBNF)
notation as follows:

letter     ::= 'a'...'z','A'...'Z'
digit      ::= '0'...'9'
identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*


The EBNF does not permit an identifier like _1foo, because it requires 
the first character after any leading underscores to be a letter, and 
in _1foo that character is a digit. The text of the specification does 
permit it, because _1foo contains only letters, digits, and underscores, 
contains at least one letter, and does not begin with a digit.
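
To make the disagreement concrete, here is a minimal Python sketch that transcribes both rules and runs them on a few identifiers. The function names valid_by_ebnf and valid_by_text are my own, and the regular expression is my reading of the EBNF quoted above, not anything taken from the specification.

```python
import re

# Regex transcription of the quoted EBNF (an interpretation, not spec text):
#   identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
EBNF_RULE = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")


def valid_by_ebnf(s: str) -> bool:
    """True if the whole string matches the EBNF production."""
    return EBNF_RULE.fullmatch(s) is not None


def valid_by_text(s: str) -> bool:
    """True if the string satisfies the three conditions in the prose rule."""
    only_allowed = re.fullmatch(r"[A-Za-z0-9_]+", s) is not None
    has_letter = re.search(r"[A-Za-z]", s) is not None
    # only_allowed guarantees s is non-empty before s[0] is inspected
    return only_allowed and has_letter and s[0] not in "0123456789"


if __name__ == "__main__":
    for candidate in ["foo", "_foo1", "_1foo", "1foo", "___"]:
        print(candidate, valid_by_ebnf(candidate), valid_by_text(candidate))
```

Of the five sample strings, only _1foo is treated differently by the two functions: the prose rule accepts it and the EBNF rejects it.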

One rule or the other will need to be decided for the next CellML 

I have, for now, taken the rule in the text as being normative and have 
written it up. Note that I have not included an EBNF representation - 
this will belong in explanatory notes which annotate the normative 


Basic Latin alphabetic character

    A Unicode character in the range U+0041 to U+005A or in the range U+0061 to 

European numeric character

    A Unicode character in the range U+0030 to U+0039.

Basic Latin alphanumeric character

    A Unicode character which is either a Basic Latin alphabetic character or a 
    European numeric character.

Basic Latin underscore

    The Unicode character U+005F.

    The following data representation formats are defined for use in this 

    CellML identifier:

        SHALL be a sequence of Unicode characters.

        SHALL NOT contain any characters except basic Latin alphanumeric 
        characters and basic Latin underscores.

        SHALL contain one or more basic Latin alphabetic characters.

        SHALL NOT begin with a European numeric character.
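
For anyone who wants to experiment with the draft wording, the clauses above could be checked roughly as follows in Python. This is only a sketch: the function names are hypothetical, and the upper bound of the lowercase range (U+007A, 'z') is an assumption on my part, chosen to match the 'a'...'z' in the CellML 1.1 EBNF.

```python
def is_basic_latin_alphabetic(ch: str) -> bool:
    cp = ord(ch)
    # U+0041..U+005A is 'A'..'Z'; the lowercase upper bound U+007A is assumed
    return 0x0041 <= cp <= 0x005A or 0x0061 <= cp <= 0x007A


def is_european_numeric(ch: str) -> bool:
    return 0x0030 <= ord(ch) <= 0x0039


def is_basic_latin_alphanumeric(ch: str) -> bool:
    return is_basic_latin_alphabetic(ch) or is_european_numeric(ch)


def is_basic_latin_underscore(ch: str) -> bool:
    return ord(ch) == 0x005F


def is_cellml_identifier(s: str) -> bool:
    """Check a Python str (a sequence of Unicode characters) against the draft clauses."""
    # SHALL NOT contain anything except alphanumerics and underscores
    only_allowed = all(
        is_basic_latin_alphanumeric(ch) or is_basic_latin_underscore(ch) for ch in s
    )
    # SHALL contain one or more alphabetic characters (this also rejects "")
    has_letter = any(is_basic_latin_alphabetic(ch) for ch in s)
    # SHALL NOT begin with a European numeric character
    starts_with_digit = bool(s) and is_european_numeric(s[0])
    return only_allowed and has_letter and not starts_with_digit
```

Under these assumptions is_cellml_identifier("_1foo") returns True, while "1foo", "___" and the empty string are all rejected.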


Please let me know if you have an opinion on whether we should instead 
base this on the validity rules specified in the EBNF form from CellML 1.1.

Best regards,

cellml-discussion mailing list
