Oops, I take that back. My expression doesn't reflect the requirement
that an identifier must contain at least one letter. The original rule
identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
seems to be correct.
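
The difference between the two expressions can be checked with a minimal sketch that transcribes each into a Python regular expression. The regex transcriptions and the names EBNF_RULE, ALTERNATIVE_RULE and matches are assumptions made for illustration; they are not part of the CellML specification.

```python
import re

# Assumed transcription of:
#   identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
EBNF_RULE = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")

# Assumed transcription of the withdrawn alternative:
#   identifier ::= ( letter | '_' ) ( letter | '_' | digit )*
ALTERNATIVE_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def matches(rule, text):
    """Return True if the whole string is derivable from the rule."""
    return rule.fullmatch(text) is not None


# Print one row per candidate: candidate, EBNF result, alternative result.
for candidate in ["foo", "_foo", "___", "_1foo", "1foo"]:
    print(candidate, matches(EBNF_RULE, candidate), matches(ALTERNATIVE_RULE, candidate))
```

Under these assumptions the alternative accepts "___", which contains no letter, while the original rule rejects it. Both reject "1foo", and only the alternative accepts "_1foo".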

On 2007 Nov 22, at 09:52, Poul Nielsen wrote:

> Dear Andrew
>
> I hope that I am interpreting the EBNF notation correctly (the
> expression looks like BNF rather than EBNF, although W3C seems to use
> the BNF notation). Doesn't
> identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
> indicate that any number of '_' symbols may form the beginning of an
> identifier?
>
> Perhaps a better way of expressing the W3C BNF rule would be
> identifier ::= ( letter | '_' ) ( letter | '_' | digit )*
>
> Best wishes
> Poul
>
> On 2007 Nov 22, at 09:23, Andrew Miller wrote:
>
>> Hi all,
>>
>> I have been working on writing up a purely normative, unambiguous
>> draft of the CellML specification to facilitate discussions of how to
>> improve CellML in the future. As part of this, I have been rewriting
>> most of the text of the specification to follow good practices for
>> normative specifications.
>>
>> One thing I have noticed during this process is that CellML's current
>> text defining the format for CellML identifiers contradicts itself:
>>
>> "
>> A valid CellML identifier must consist of only letters, digits and
>> underscores, must contain at least one letter, and must not begin with
>> a digit. This can be written using Extended Backus-Naur Form (EBNF)
>> notation as follows:
>>
>> letter     ::= 'a'...'z','A'...'Z'
>> digit      ::= '0'...'9'
>> identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
>> "
>>
>>
>> The EBNF specification does not permit an identifier like _1foo,
>> because a digit follows the leading underscore before any letter,
>> while the text of the specification does permit it, because it
>> contains only letters, digits, and underscores, contains at least one
>> letter, and does not begin with a digit.
>>
>> One rule or the other will need to be decided for the next CellML
>> specification.
>>
>> I have, for now, taken the rule in the text as being normative and
>> have written it up. Note that I have not included an EBNF
>> representation - this will belong in explanatory notes which annotate
>> the normative specification.
>>
>> "
>>
>> Basic Latin alphabetic character
>>     A Unicode character in the range U+0041 to U+005A or in the
>>     range U+0061 to U+007A.
>>
>> European numeric character
>>     A Unicode character in the range U+0030 to U+0039.
>>
>> Basic Latin alphanumeric character
>>     A Unicode character which is either a Basic Latin alphabetic
>>     character or a European numeric character.
>>
>> Basic Latin underscore
>>     The Unicode character U+005F.
>>
>>
>> The following data representation formats are defined for use in
>> this specification:
>>
>>    1. CellML identifier:
>>
>>          1. SHALL be a sequence of Unicode characters.
>>          2. SHALL NOT contain any characters except basic Latin
>>             alphanumeric characters and basic Latin underscores.
>>          3. SHALL contain one or more basic Latin alphabetic
>>             characters.
>>          4. SHALL NOT begin with a European numeric character.
>>
>> "
>>
>>
>> Please let me know if you have an opinion on whether we should
>> instead base this off the validity rules specified in the EBNF form
>> from CellML 1.1.
>>
>> Best regards,
>> Andrew
>>
>> _______________________________________________
>> cellml-discussion mailing list
>> cellml-discussion@cellml.org
>> http://www.cellml.org/mailman/listinfo/cellml-discussion
>
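
The four SHALL clauses in the quoted draft can also be read as a direct procedural check. The following is a minimal sketch under that reading; the function name is_cellml_identifier and the set names are hypothetical, and the character ranges are taken from the definitions quoted above.

```python
# Character classes from the quoted draft definitions.
LETTERS = {chr(c) for c in range(0x41, 0x5B)} | {chr(c) for c in range(0x61, 0x7B)}
DIGITS = {chr(c) for c in range(0x30, 0x3A)}
UNDERSCORE = "\u005F"


def is_cellml_identifier(text: str) -> bool:
    """Hypothetical checker: one test per SHALL clause of the draft."""
    # Clause 1: a Python str is already a sequence of Unicode characters.
    allowed = LETTERS | DIGITS | {UNDERSCORE}
    # Clause 2: no characters other than alphanumerics and underscores.
    if any(ch not in allowed for ch in text):
        return False
    # Clause 3: at least one alphabetic character (also rejects "").
    if not any(ch in LETTERS for ch in text):
        return False
    # Clause 4: must not begin with a digit.
    if text[0] in DIGITS:
        return False
    return True
```

Under this reading, "_1foo" is accepted because it satisfies all four clauses, whereas "___" fails clause 3 and "1foo" fails clause 4.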
