I am writing a lexer and parser for Metamath databases, and I do not know 
whether it is better to check "string compliance" (e.g., is the string 
"az%e" allowed to be a label?) at lexing or at parsing:

At parsing: the lexer skips whitespace and comments and returns tokens 
which are either Metamath keywords or Ident(string) for anything else.  
The parser then checks whether Ident(string) satisfies the rules: e.g., 
when it receives from the lexer the sequence '$[', Ident(file), '$]', it 
calls a function is_valid_filename(file) which checks whether 'file' 
contains characters forbidden in a filename, like '%'.  This is not very 
satisfactory, since the characters of the string 'file' are read twice 
(once at lexing and once at parsing).
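For concreteness, here is a minimal sketch of this first design in Rust 
(the token names, the filename alphabet, and is_valid_filename are 
illustrative assumptions of mine, not taken from metamath.c or from any 
existing verifier):

    // Sketch of the "check at parsing" design; names and the filename
    // alphabet are assumptions, not the spec's exact rules.
    #[allow(dead_code)]
    #[derive(Debug)]
    enum Token {
        BeginFileInclude, // '$['
        EndFileInclude,   // '$]'
        Ident(String),    // any other whitespace-delimited chunk
        // ... the remaining Metamath keywords ...
    }

    // Called by the parser after matching '$[' Ident(file) '$]': a second
    // pass over the characters of 'file', on top of the lexer's pass.
    fn is_valid_filename(file: &str) -> bool {
        !file.is_empty()
            && file.chars().all(|c| c.is_ascii_alphanumeric() || "-_.".contains(c))
    }

    fn main() {
        assert!(is_valid_filename("set.mm"));
        assert!(!is_valid_filename("az%e")); // '%' is rejected here
    }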

At lexing: one can tell whether a string of characters is:
* a Metamath keyword;
* something which can only be:
   * a math symbol;
   * a math symbol or label or filename;
   * a math symbol or label or filename or complete compressed proof block;
   * a math symbol or compressed proof block (complete or incomplete).
So the tokens returned by the lexer are of one of these five kinds (a 
sketch is given after this paragraph).  This ensures "unique reading" but 
is probably more cumbersome (even if the lexer is not directly coded from 
an automaton, the fact that the automaton has more states will be 
reflected in the code).
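Here is a sketch, under the same caveats, of what the five kinds could 
look like in Rust; the character rules in classify() are a simplified 
reading of the spec, and not every edge case is handled (for instance, an 
all-uppercase chunk ending in 'U'..'Y' is both a valid label and an 
incomplete compressed proof block):

    // Sketch of the five token kinds of the "check at lexing" design.
    #[derive(Debug)]
    enum Token {
        Keyword(String),           // $c, $v, $a, $p, $[, ...
        MathSymbol(String),        // printable and '$'-free, nothing more
        SymLabelFile(String),      // also a label/filename: [A-Za-z0-9._-]
        SymLabelFileProof(String), // as above, and a complete compressed block
        SymProof(String),          // compressed block (complete or incomplete)
    }

    fn classify(s: String) -> Token {
        // Comments are assumed to be skipped earlier, so no '$(' or '$)'.
        const KEYWORDS: [&str; 13] = ["$c", "$v", "$f", "$e", "$a", "$p", "$d",
                                      "${", "$}", "$.", "$=", "$[", "$]"];
        if KEYWORDS.contains(&s.as_str()) {
            return Token::Keyword(s);
        }
        let label = s.chars().all(|c| c.is_ascii_alphanumeric() || "-_.".contains(c));
        let proof = s.chars().all(|c| c.is_ascii_uppercase() || c == '?');
        // In a compressed proof, 'U'..='Y' are the leading digits of a
        // multi-character number, so a chunk ending there is incomplete.
        let complete = proof && !matches!(s.chars().last(), Some('U'..='Y'));
        match (label, proof) {
            (true, true) if complete => Token::SymLabelFileProof(s),
            (true, _) => Token::SymLabelFile(s),
            (false, true) => Token::SymProof(s),
            (false, false) => Token::MathSymbol(s),
        }
    }

    fn main() {
        for s in ["$p", "az%e", "ax-mp", "ABAB", "AB?"] {
            println!("{:?}", classify(s.to_string()));
        }
    }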

Do you have any advice?  How is it done in metamath.c and in Rust?  (I am 
not very proficient in these languages.)  In mmverify.py, very few checks 
are done, so the problem does not arise.

BenoĆ®t
