Poul Nielsen wrote:
> Dear Andrew
>
> I hope that I am interpreting the EBNF notation correctly (the
> expression looks like BNF rather than EBNF, though W3C seems to use
> the BNF notation). Doesn't
> identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
> indicate that any number of '_' symbols may form the beginning of an  
> identifier?
>   
Yes. _1foo is invalid according to the pattern because the _ matches 
('_')*, but the 1 that follows does not match letter.
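
As a rough illustration (my own sketch, not part of the specification), 
the CellML 1.1 EBNF rule can be transcribed into a Python regular 
expression. The names EBNF_1_1 and matches_ebnf_1_1 are hypothetical, and 
letter and digit are assumed to be the ASCII ranges from the quoted EBNF.

```python
import re

# Hypothetical transcription of:
#   identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
EBNF_1_1 = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")

def matches_ebnf_1_1(s: str) -> bool:
    """Return True if the whole string matches the CellML 1.1 EBNF rule."""
    return EBNF_1_1.fullmatch(s) is not None

print(matches_ebnf_1_1("__foo1"))  # leading underscores, then a letter
print(matches_ebnf_1_1("_1foo"))   # a digit follows the underscore
```

The first call should report a match and the second should not, since 
the character after the underscores has to be a letter.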

> Perhaps a better way of expressing the W3C BNF rule would be
> identifier ::= ( letter | '_' ) ( letter | '_' | digit )*
>   
This is not identical to what is written in the specification text 
either. The pattern you suggested would match e.g. _, which is not valid 
according to the text because it does not contain a letter.
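
To make the difference concrete, here is a small sketch (assumptions: 
ASCII letters and digits only; PROPOSED and has_letter are names I made 
up for illustration) that runs the suggested pattern against a few 
candidates and separately checks the "at least one letter" requirement 
from the text.

```python
import re

# Hypothetical transcription of the suggested rule:
#   identifier ::= ( letter | '_' ) ( letter | '_' | digit )*
PROPOSED = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def has_letter(s: str) -> bool:
    """Check the 'at least one letter' requirement from the specification text."""
    return any(c.isascii() and c.isalpha() for c in s)

for candidate in ["_", "_1", "_1foo"]:
    accepted = PROPOSED.fullmatch(candidate) is not None
    # Print: candidate, accepted by the pattern, contains a letter
    print(candidate, accepted, has_letter(candidate))
```

Any candidate that the pattern accepts but that contains no letter is a 
case where the suggested rule and the text disagree.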

The EBNF / BNF form corresponding to what the text in the CellML 
specification says would be:

identifier ::= ( letter ( letter | '_' | digit )* ) | ( '_' ( letter | 
'_' | digit )* letter ( letter | '_' | digit )* )
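
One way to gain some confidence in a grammar of this shape is to compare 
it with a direct reading of the prose rule over all short strings. The 
sketch below is only that, a sketch: valid_by_text, valid_by_grammar and 
GRAMMAR are hypothetical names, the regular expression is my own 
transcription (a leading letter, or a leading underscore with a letter 
somewhere later), and the test alphabet is a small arbitrary sample.

```python
import re
from itertools import product

LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"

def valid_by_text(s: str) -> bool:
    """Direct reading of the prose rule: only letters, digits and underscores,
    at least one letter, and no leading digit."""
    allowed = LETTERS + DIGITS + "_"
    return (
        len(s) > 0
        and all(c in allowed for c in s)
        and any(c in LETTERS for c in s)
        and s[0] not in DIGITS
    )

# Hypothetical regex with two alternatives: a leading letter, or a leading
# underscore followed somewhere later by a letter.
GRAMMAR = re.compile(
    r"[A-Za-z][A-Za-z0-9_]*|_[A-Za-z0-9_]*[A-Za-z][A-Za-z0-9_]*"
)

def valid_by_grammar(s: str) -> bool:
    return GRAMMAR.fullmatch(s) is not None

# Compare the two on every string of length 0 to 4 over a small alphabet
# (two letters, two digits, the underscore, and one disallowed character).
alphabet = "aZ09_-"
mismatches = [
    "".join(chars)
    for n in range(5)
    for chars in product(alphabet, repeat=n)
    if valid_by_text("".join(chars)) != valid_by_grammar("".join(chars))
]
print(mismatches)
```

An empty list of mismatches does not prove equivalence, but it is a cheap 
check that the two formulations agree on cases like _, _1 and _1foo.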


 
However, I don't think we should represent the same thing redundantly in 
the normative specification - we should focus on getting either the 
English or the BNF / EBNF representation accurate and unambiguous, and 
include the other one only in the annotated specification. This will 
avoid any doubt if we get one wrong, and will also make the normative 
specification more lightweight and easier to work with.

Best regards.
Andrew

> Best wishes
> Poul
>
> On 2007 Nov 22, at 09:23, Andrew Miller wrote:
>
>   
>> Hi all,
>>
>> I have been working on writing up a purely normative, unambiguous draft
>> of the CellML specification to facilitate discussions of how to improve
>> CellML in the future. As part of this, I have been rewriting most of the
>> text of the specification to follow good practices for normative
>> specifications.
>>
>> One thing I have noticed during this process is that CellML's current
>> text defining the format for CellML identifiers contradicts itself:
>>
>> "
>> A valid CellML identifier must consist of only letters, digits and
>> underscores, must contain at least one letter, and must not begin with
>> a digit. This can be written using Extended Backus-Naur Form (EBNF)
>> notation as follows:
>>
>> letter     ::= 'a'...'z','A'...'Z'
>> digit      ::= '0'...'9'
>> identifier ::= ('_')* ( letter ) ( letter | '_' | digit )*
>> "
>>
>>
>> The EBNF specification does not permit an identifier like _1foo because
>> a digit follows the leading underscore before any letter, while the text
>> of the specification does permit it, because it contains only letters,
>> digits, and underscores, contains at least one letter, and does not begin
>> with a digit.
>>
>> One rule or the other will need to be decided for the next CellML
>> specification.
>>
>> I have, for now, taken the rule in the text as being normative and have
>> written it up. Note that I have not included an EBNF representation -
>> this will belong in explanatory notes which annotate the normative
>> specification.
>>
>> "
>> Basic Latin alphabetic character
>>     A Unicode character in the range U+0041 to U+005A or in the range
>>     U+0061 to U+007A.
>>
>> European numeric character
>>     A Unicode character in the range U+0030 to U+0039.
>>
>> Basic Latin alphanumeric character
>>     A Unicode character which is either a Basic Latin alphabetic
>>     character or a European numeric character.
>>
>> Basic Latin underscore
>>     The Unicode character U+005F.
>>
>> The following data representation formats are defined for use in this
>> specification:
>>
>>    1. CellML identifier:
>>
>>          1. SHALL be a sequence of Unicode characters.
>>          2. SHALL NOT contain any characters except basic Latin
>>             alphanumeric characters and basic Latin underscores.
>>          3. SHALL contain one or more basic Latin alphabetic characters.
>>          4. SHALL NOT begin with a European numeric character.
>> "
>>
>>
>> Please let me know if you have an opinion on whether we should instead
>> base this off the validity rules specified in the EBNF form from
>> CellML 1.1.
>>
>> Best regards,
>> Andrew
>>
>> _______________________________________________
>> cellml-discussion mailing list
>> cellml-discussion@cellml.org
>> http://www.cellml.org/mailman/listinfo/cellml-discussion
>>     

