Re: [Metamath] Re: How should I approach comment parsing?

'Philip White' via Metamath Wed, 18 May 2022 18:31:50 -0700

Maybe I'm running into issues that were mentioned in the email thread
you linked to( 
https://groups.google.com/g/metamath/c/kdbrroGzJnc/m/7PCMZGgSAQAJ ),
but what's the right thing to do about this comment text that shows up
in set.mm:


  @( The cardinality of a set is a cardinal number. @)
  cardel @p |- ( card ` A ) e. Card @=
    ( ccrd cfv ccdn wcel wceq cardidm elcard mpbir ) ABCZDEJBCJFAGJHI @.
    @( [16-Jan-2005] @) @( [23-Sep-2004] @)

It's a theorem with backticks in it that got commented out, but the
backticks weren't doubled. My parser thinks that this backtick should
start math mode, and that seems right? I wouldn't mind sending a PR to
fix the places that break this rule, if that is desirable.

On Sat, 14 May 2022 00:55:40 -0400
Mario Carneiro <[email protected]> wrote:

> On Fri, May 13, 2022 at 8:15 PM 'Philip White' via Metamath <
> [email protected]> wrote:  
> 
> > Thanks for the depth of the responses!
> >
> > A way that I'm used to writing parsers is to first write a lexer
> > that operates on the part of the syntax that falls into the
> > "regular" class of grammars. The main metamath syntax fits this
> > neatly, including comments. However, once you try to understand the
> > contents of those comments, the only part of the syntax that
> > remains regular is single characters as tokens, I think. To
> > illustrate this, note that "latexdef" is an ordinary symbol token
> > in the main metamath syntax, but it is a keyword in $t comment
> > syntax. 
> 
> You should parse the comment twice with two entirely separate parsers.
> Using a single multi-mode parser is going to be very messy, especially
> because one parser has absolute veto power over the other.
> 
> 
> > I'm glad you wrote down the grammar for comments; I was looking for
> > that at one point. The grammar does use characters as tokens, so it
> > doesn't solve the issue (I know you weren't saying that it did).
> >  
> 
> I'm not sure what you mean by "solve the issue". That's what the
> grammar is; it plainly allows no spaces around at least semicolons so
> the grammar has to account for that.
> 
> My first pass at this has been to parse the main metamath syntax, and
> > then run a totally separate parser on the string contents of each
> > comment.
> >  
> 
> Good. Do that. It will be good for your sanity.
> 
> However, this won't hold up now that I'm starting to parse
> > the markup of ordinary comments, since math mode requires switching
> > back to the ordinary metamath tokens.
> >  
> 
> Oh... the math markup syntax is a whole separate mess. The issue
> there is that the parser is a bunch of layered regexes, so it has no
> nice grammar at all, at least if you care about the edge cases. The
> recent discussion
> https://groups.google.com/g/metamath/c/kdbrroGzJnc/m/7PCMZGgSAQAJ
> chronicles my attempt to write a fully accurate markup parser.
> 
> Ignoring the edge cases, I think the most reasonable way to understand
> markup parsing is just as having a string-quoting operator. The body
> text is parsed on a character level, but `foo` starts a quoted math
> section, and you use the metamath parser (split on space) to get the
> tokens out (after replacing `` with `). I don't think using a CFG
> grammar here would help.
> 
> 
> > A solution that seems to me more elegant (though I haven't tried it
> > yet), is to have a single parser, but multiple lexing functions that
> > operate. The parser calls the appropriate lexer. Each lexer reads
> > the regular part of its respective "sub-syntax", and the parser
> > decides when to switch between sub-languages.
> >  
> 
> The different parsers really have nothing in common. It's just
> completely separate languages with one hiding out in the legal
> comment syntax of the other.
> 
> At several points in this process, I've found myself wishing that the
> > minimalism of the main metamath syntax was preserved in the comment
> > syntax. For example, the $t comment section could have used
> > $latexdef and $htmldef as keywords, and instead of C-style
> > comments, allow actual nested comments. The property of the syntax
> > being a sequence of space-delimited symbols plus keywords that
> > start with $ is refreshingly simple, and it would have been nice to
> > work with it in comments. Anyway, I don't mean to criticize from an
> > armchair the folks who implemented this system; it would be rather
> > difficult to change at this point, I would think.
> >  
> 
> I wasn't involved in the original decision making here but if I had to
> guess, the reason is that we don't want the $ to confuse a metamath
> parser. Keep in mind that for the vast majority of metamath parsers,
> comments are just comments, no parsing is required. In particular
> nested comments would impose a burden on simplistic verifiers, so
> using a different style of comment seems like a good idea to me. (I
> wouldn't mind spacing things out a bit more though, to keep the same
> simple lexer.)
> 
> 
> > Another question about comments: what is a good heuristic for
> > deciding when a comment is attached to a statement? My first guess
> > has been to attach the immediately preceding comment to each
> > theorem or axiom, if there is one. This seems to yield good
> > results, but I noticed that set.mm sometimes has comments on the
> > same line, directly after the the statement (this is especially
> > common for constant and variable declarations). I doubt that
> > attaching an comments immediately after a statement would be good,
> > unless I make the parser be whitespace-sensitive so that I can
> > distinguish when it is on the same line as the declaration.
> >  
> 
> The official definition is that a markup comment immediately precedes
> a labeled statement ($f $a and $p, I don't think you can comment $e).
> $c and $v statements don't have tooling-recognized comments, so you
> should read those comments as analogous to single line non-doc
> comments. If you would like to parse these comments, one reasonable
> definition is that the comment should come immediately after the $c
> comment without an intervening newline. (Again, "unless I make the
> parser be whitespace-sensitive so that I can distinguish when it is
> on the same line as the declaration." sounds like you are trying to
> make one parser to rule them all and this will not end well. Markup
> parsing is substantially more heuristic than everything else, likely
> because it arose organically from unofficial formatting conventions.)
> 
> Mario
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/20220518190056.20061d14%40ithilien.

Re: [Metamath] Re: How should I approach comment parsing?

Reply via email to