I am writing a lexer and a parser for Metamath databases, and I do not know whether it is better to check "string compliance" (e.g., questions like: is the string "az%e" allowed to be a label?) at lexing or at parsing.
At parsing: the lexer (skips whitespace and comments and) returns tokens which are the Metamath keywords, as well as Ident(string) for anything else. Then the parser checks whether Ident(string) satisfies the rules: e.g., when the lexer hands it the sequence '$[', Ident(file), '$]', it calls a function is_valid_filename(file) which checks whether 'file' contains characters forbidden in a filename, like '%'. This is not very satisfactory, since the characters of 'file' get "reread" (once at lexing and once at parsing). A rough sketch of this option is given below.

At lexing: one can tell whether a string of characters is:
* a Metamath keyword;
* something which can be (and can only be):
  * a math symbol;
  * a math symbol, a label, or a filename;
  * a math symbol, a label, a filename, or a complete compressed proof block;
  * a math symbol, or a compressed proof block (complete or incomplete).
So the tokens returned by the lexer are of one of these five kinds (see the second sketch below). This ensures "unique reading", but it is probably more cumbersome: even if the lexer is not coded directly from an automaton, the fact that the automaton has more states will show in the code.

Do you have any advice? How is it done in metamath.c and in the Rust verifier? (I'm not very proficient in those languages.) In mmverify.py, very few checks are done, so the problem does not arise.
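For concreteness, here is a rough sketch of the first option in Python (mmverify.py-style); the names Ident, is_valid_filename and parse_include are made up, and the set of characters allowed in a filename is only my assumption, chosen so that '%' is rejected as in the example above:

import string
from dataclasses import dataclass

# Tokens for the check-at-parsing approach: the lexer only separates
# keywords (plain strings like '$[', '$]', '$p', ...) from everything else.
@dataclass
class Ident:
    value: str

# Assumed character set for filenames (same as for labels); this is a
# guess, not taken from any existing verifier.
FILENAME_CHARS = set(string.ascii_letters + string.digits + '-_.')

def is_valid_filename(name: str) -> bool:
    # Second pass over the characters: this is the "rereading".
    return bool(name) and all(c in FILENAME_CHARS for c in name)

def parse_include(tokens):
    """Parse 'file $]', assuming '$[' has just been consumed;
    'tokens' is an iterator over the lexer's output."""
    tok = next(tokens)
    if not isinstance(tok, Ident) or not is_valid_filename(tok.value):
        raise SyntaxError(f"invalid filename after $[: {tok!r}")
    if next(tokens) != '$]':
        raise SyntaxError("expected $] after filename")
    return tok.value

The second pass over the characters of 'file' happens inside is_valid_filename, after the lexer has already scanned them once.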
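And a rough sketch of the second option, where the lexer classifies each whitespace-delimited chunk up front. Again the names are made up, and I am assuming the usual character sets (labels over letters, digits, '-', '_' and '.', compressed proof blocks over 'A'-'Z' plus '?' for incomplete proofs, filenames following the label rules):

import enum
import string

class Kind(enum.Enum):
    KEYWORD = enum.auto()
    MATH_SYMBOL = enum.auto()               # can only be a math symbol
    SYM_LABEL_FILENAME = enum.auto()        # label characters only
    SYM_LABEL_FILENAME_PROOF = enum.auto()  # only 'A'-'Z': also a complete proof block
    SYM_PROOF = enum.auto()                 # 'A'-'Z' and '?': (in)complete proof block

KEYWORDS = {'$c', '$v', '$f', '$e', '$a', '$p', '$d', '${', '$}',
            '$.', '$=', '$(', '$)', '$[', '$]'}
LABEL_CHARS = set(string.ascii_letters + string.digits + '-_.')

def classify(tok: str) -> Kind:
    """Classify a nonempty whitespace-delimited chunk into one of the
    five kinds; the order of the tests matters, since 'A'-'Z' is a
    subset of the label characters."""
    if tok in KEYWORDS:
        return Kind.KEYWORD
    if '$' in tok:
        raise SyntaxError(f"'$' outside a keyword: {tok}")
    if all('A' <= c <= 'Z' for c in tok):
        return Kind.SYM_LABEL_FILENAME_PROOF
    if all('A' <= c <= 'Z' or c == '?' for c in tok):
        return Kind.SYM_PROOF
    if all(c in LABEL_CHARS for c in tok):
        return Kind.SYM_LABEL_FILENAME
    return Kind.MATH_SYMBOL

With this, the parser only has to check that the kind it receives is compatible with the context (e.g., after '$[' it accepts the two kinds whose class includes "filename") without reinspecting the characters.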
Benoît