Re: Are there good strategies for diagnosing performance issues?

amon Tue, 21 Nov 2017 13:07:13 -0800

A high cost in stack_step is not entirely unsurprising, but if it is taking 
too long subjectively then there are some points that might be fixable. 
Note that stack_step() implements the semantics of your parser, which are 
only run *after* a successful parse. I don't think using events in the 
recognizer phase contributes to this cost.

One important strategy is to make sure you're handling as much of your 
grammar as possible on the L0 level, not on the G1 level. This might not be 
always possible since this can affect the language matched by locally 
ambiguous language. However, a G1 rule such as "digit ::= '0' | '1' | ... | 
'9'" is almost certainly wrong. The L0 level can handle character-level 
matching more efficiently since it doesn't need to go through the 
`stack_step()` evaluation.

Where possible, prefer built-in semantics such as `::first` over providing 
your own callbacks. Ignoring the values of unnecessary symbols (like "Rule 
::= Interesting (Boring)") may be necessary to use built-in semantics, but 
is a great idea anyway since it avoids unnecessary evaluations.

The structure of your grammar should describe the AST you're trying to 
produce as closely as possible. With Marpa it is not necessary to rewrite 
your grammar to satisfy the restrictions of the parser (aside from 
extracting sequence rules). So if you translated your Yapp grammar 
directly, there might be unnecessary rules. But this advice is already 
about micro-optimization and unlikely to result in a phenomenal speedup.

All in all, I am surprised by your 2x slowdown since the semantics of Yapp 
and Marpa are comparable (doing less work is faster). Possibly, having 
moved from your hand-written lexer to Marpa's scanless interface may have 
caused some problem, e.g. suddenly having an order of magnitude more 
lexemes per document than before. But this kind of issue can't be discussed 
productively without seeing the grammar in question.

On Monday, 20 November 2017 22:52:13 UTC+1, [email protected] wrote:
>
> Hey, all,
>
>  
>
> I’ve been hoping for some time to replace an existing parser---implemented 
> using YAPP and a custom lexer---with a parser implemented using Marpa::R2.
>
>  
>
> After a moderate amount of work, I finally got a working parser.
>
>  
>
> The grammar is far easier to understand and work with than the old YAPP 
> parser; because we’re dealing with an indentation based format, I do have 
> to use a discard event to track indentation depths and emit indentation 
> tokens and the like.
>
>  
>
> However, it’s disappointingly slow, taking about twice as long to parse 
> our full set of files.  That was *really* unexpected.
>
>  
>
> Immediately suspecting my code, I cranked my pathological case through 
> Devel::NYTProf; imagine my surprise when the top entry in the list of ‘top 
> 15 subroutines’ was an xsub: Marpa::R2::Thin::V::*stack_step* 
> <https://groups.google.com/forum/parser-1-line.html#Marpa__R2__Thin__V__stack_step>.
>   
> And this wasn’t by a small amount: that routine took 556s out of 590s or 
> so, and the next most expensive routine is Marpa::R2::Thin::SLR::read at 
> 12.7s.  These completely dominated the total runtime.
>
>  
>
> While I will try to sanitize my parser and make it postable so people can 
> perhaps point out problems, I thought I might first just ask: is that sort 
> of high cost in stack_step generally representative of some sort of problem 
> in the grammar?  Is there some well-known construct that leads to a blow-up 
> that is easily eliminated, etc.?  Is there something I could log or examine 
> that might shed some light?
>
>  
>
> Any guidance would be appreciated.
>
>  
>
> Michael Dorman
>

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Are there good strategies for diagnosing performance issues?

Reply via email to