A high cost in stack_step is not entirely unsurprising, but if it is taking too long subjectively then there are some points that might be fixable. Note that stack_step() implements the semantics of your parser, which are only run *after* a successful parse. I don't think using events in the recognizer phase contributes to this cost.
One important strategy is to make sure you're handling as much of your grammar as possible on the L0 level, not on the G1 level. This might not be always possible since this can affect the language matched by locally ambiguous language. However, a G1 rule such as "digit ::= '0' | '1' | ... | '9'" is almost certainly wrong. The L0 level can handle character-level matching more efficiently since it doesn't need to go through the `stack_step()` evaluation. Where possible, prefer built-in semantics such as `::first` over providing your own callbacks. Ignoring the values of unnecessary symbols (like "Rule ::= Interesting (Boring)") may be necessary to use built-in semantics, but is a great idea anyway since it avoids unnecessary evaluations. The structure of your grammar should describe the AST you're trying to produce as closely as possible. With Marpa it is not necessary to rewrite your grammar to satisfy the restrictions of the parser (aside from extracting sequence rules). So if you translated your Yapp grammar directly, there might be unnecessary rules. But this advice is already about micro-optimization and unlikely to result in a phenomenal speedup. All in all, I am surprised by your 2x slowdown since the semantics of Yapp and Marpa are comparable (doing less work is faster). Possibly, having moved from your hand-written lexer to Marpa's scanless interface may have caused some problem, e.g. suddenly having an order of magnitude more lexemes per document than before. But this kind of issue can't be discussed productively without seeing the grammar in question. On Monday, 20 November 2017 22:52:13 UTC+1, [email protected] wrote: > > Hey, all, > > > > I’ve been hoping for some time to replace an existing parser---implemented > using YAPP and a custom lexer---with a parser implemented using Marpa::R2. > > > > After a moderate amount of work, I finally got a working parser. > > > > The grammar is far easier to understand and work with than the old YAPP > parser; because we’re dealing with an indentation based format, I do have > to use a discard event to track indentation depths and emit indentation > tokens and the like. > > > > However, it’s disappointingly slow, taking about twice as long to parse > our full set of files. That was *really* unexpected. > > > > Immediately suspecting my code, I cranked my pathological case through > Devel::NYTProf; imagine my surprise when the top entry in the list of ‘top > 15 subroutines’ was an xsub: Marpa::R2::Thin::V::*stack_step* > <https://groups.google.com/forum/parser-1-line.html#Marpa__R2__Thin__V__stack_step>. > > And this wasn’t by a small amount: that routine took 556s out of 590s or > so, and the next most expensive routine is Marpa::R2::Thin::SLR::read at > 12.7s. These completely dominated the total runtime. > > > > While I will try to sanitize my parser and make it postable so people can > perhaps point out problems, I thought I might first just ask: is that sort > of high cost in stack_step generally representative of some sort of problem > in the grammar? Is there some well-known construct that leads to a blow-up > that is easily eliminated, etc.? Is there something I could log or examine > that might shed some light? > > > > Any guidance would be appreciated. > > > > Michael Dorman > -- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
