My intention was to compare stages of a "grammar" as defined in the
derivatives paper with a sequence of Earley sets similar in syntax to the
ones in yours, yes. A more mathy way to present the former sequence would
perhaps be:
  &start = (A + B) U (A + &start + B)
Which then gets transformed through
  -A : B U (&start + B)
  -A : 0 U ((B U (&start + B)) + B)
        => (B U (&start + B)) + B
  -B : (ε U (0 + B)) + B  ∵  -B. &start = (0 + B) U (0 + &start + B) => 0
        => B
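To make those steps mechanical, here is a minimal Brzozowski-style derivative recognizer in Python; it is sufficient for this grammar because every recursive &start reference sits behind a terminal, so the naive recursion terminates (the general case needs Might-Darais's lazy/memoized machinery, which this sketch omits; all names are my own, not the paper's):

```python
# Minimal derivative recognizer for &start = (A + B) U (A + &start + B),
# with A = 'a', B = 'b'.  Guarded recursion only; no memoization.

EMPTY = ('empty',)   # 0: matches nothing
EPS = ('eps',)       # epsilon: matches only ""

def char(c): return ('char', c)
def alt(l, r): return ('alt', l, r)
def cat(l, r): return ('cat', l, r)
def ref(thunk): return ('ref', thunk)   # lazy reference, for recursion

def nullable(g):
    tag = g[0]
    if tag == 'eps': return True
    if tag == 'alt': return nullable(g[1]) or nullable(g[2])
    if tag == 'cat': return nullable(g[1]) and nullable(g[2])
    if tag == 'ref': return nullable(g[1]())  # terminates: ref is guarded
    return False  # empty, char

def d(g, c):
    """Derivative of g with respect to character c."""
    tag = g[0]
    if tag == 'char': return EPS if g[1] == c else EMPTY
    if tag == 'alt': return alt(d(g[1], c), d(g[2], c))
    if tag == 'cat':
        left = cat(d(g[1], c), g[2])
        return alt(left, d(g[2], c)) if nullable(g[1]) else left
    if tag == 'ref': return d(g[1](), c)
    return EMPTY  # empty, eps

def matches(g, s):
    for c in s:
        g = d(g, c)
    return nullable(g)

def start():   # &start = (A + B) U (A + &start + B)
    return alt(cat(char('a'), char('b')),
               cat(char('a'), cat(ref(start), char('b'))))
```

e.g. matches(start(), 'aabb') walks exactly the -A, -A, -B, -B sequence above, minus the compaction.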

(I gloss over some invariants not present in the original paper, having to
do with the derivation step mapping from the {sym, &ref, U, +} language to
one that additionally contains 0 and ε, and the compaction step removing
those again; my question is about high-level correspondence with marpa,
which itself does not admit nullable or void rules)
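That compaction step is just bottom-up algebraic rewriting; a sketch of the rules (the tuple encoding is my own ad hoc one, not anything from the paper):

```python
# Compaction: remove the 0 and eps nodes that differentiation introduces.

EMPTY, EPS = ('empty',), ('eps',)

def compact(g):
    if g[0] == 'alt':
        l, r = compact(g[1]), compact(g[2])
        if l == EMPTY: return r              # 0 U g   => g
        if r == EMPTY: return l              # g U 0   => g
        return ('alt', l, r)
    if g[0] == 'cat':
        l, r = compact(g[1]), compact(g[2])
        if EMPTY in (l, r): return EMPTY     # 0 + g => 0,  g + 0 => 0
        if l == EPS: return r                # eps + g => g
        if r == EPS: return l                # g + eps => g
        return ('cat', l, r)
    return g  # leaves: 0, eps, sym, &ref

# e.g. (eps U (0 + B)) + B compacts to B, as in the -B step above:
B = ('sym', 'B')
assert compact(('cat', ('alt', EPS, ('cat', EMPTY, B)), B)) == B
```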

The fundamental correspondence, as it appears to me, is that a derivative
grammar formally keeps only what can match the remaining input, but keeps
all of it in the same data structure; compare
  0. start: A + B ; A + &start + B
  1. -A : B ; &start + B # start_1(g) := (g + B)
  2. -A : start_1(B ; &start + B) # "predict"
  3. -B : start_1(ε ; 0) => ε + B => B # "reduce"
to the marpa
  0. start: •A + B; •A + &start + B
  1. -A :  (A+)•B ; (A+)•&start + B
           ; start_1:•A + B; •A + &start + B
  2. -A : start_1: (A+)•B; (A+)•&start + B
      ; start_2: (etc. unused)
  3. -B : start_1: (A+B)• => start(_0) : (A+&start+)•B
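For the marpa-side column, a toy recognizer that regenerates those item sets; this is plain textbook Earley (predict/scan/complete), not libmarpa's internals, and the item layout is my own:

```python
# Textbook Earley recognizer for  start -> a b | a start b.
# An item is (lhs, rhs, dot, origin), e.g. [start -> a . start b]:0.

GRAMMAR = {'start': [('a', 'b'), ('a', 'start', 'b')]}

def add(s, item):
    if item in s:
        return False
    s.add(item)
    return True

def earley(tokens, start_sym='start'):
    sets = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start_sym]:
        sets[0].add((start_sym, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        changed = True
        while changed:                    # run predict/complete to fixpoint
            changed = False
            for lhs, rhs, dot, origin in list(sets[i]):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:        # predict
                    for prod in GRAMMAR[rhs[dot]]:
                        changed |= add(sets[i], (rhs[dot], prod, 0, i))
                elif dot == len(rhs):                             # complete
                    for plhs, prhs, pdot, porig in list(sets[origin]):
                        if pdot < len(prhs) and prhs[pdot] == lhs:
                            changed |= add(sets[i],
                                           (plhs, prhs, pdot + 1, porig))
        if i < len(tokens):                                       # scan
            for lhs, rhs, dot, origin in sets[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
    return sets

def accepts(tokens):
    return any(lhs == 'start' and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in earley(tokens)[-1])
```

Printing earley('aabb') set by set reproduces the four steps listed, with the predicted start_1 items showing up as the origin-1 entries in set 1.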

The main difference seems to be that the explicit tree sharing leaves the
rule names around to inspect, while Might-Darais has to stick them in
separate tagged εs.

NB. Regardless of the existence or fictitiousness of such an equivalence,
writing these emails has helped my understanding of Earley tremendously, to
the point where I would almost be comfortable trying to write an
implementation; or possibly taking a deeper look at libmarpa, whose
interface docs I had skimmed and bounced off of. Thank you for being around
to answer questions :3

On Tuesday, January 12, 2016, Jeffrey Kegler <[email protected]> wrote:

> This takes me deeper into derivatives than I've ever gone, and probably
> beyond where I can help you.  I skimmed Might-Darais at the time, and then
> read Russ Cox's appreciation <http://research.swtch.com/yaccalive>
> carefully.  Russ seemed to say everything that needed to be said [ except
> he didn't mention Marpa :-) ], and I moved on.  I've never studied the
> derivatives method in detail.
>
> Many data structures mix rule, location of rule & location within rule in
> a single data structure, and they can all seem to look the same.  The 2nd one you
> list seemed to be Earley tables. I'm not too clear on what the 1st is (I'm
> not good at reading LISP), but if I had to guess it is a state transition
> table, rather than a parse table. If so, they're quite different things.
>
> On Tue, Jan 12, 2016 at 2:26 PM, Anton Dyudin <[email protected]>
> wrote:
>
>> I was referring to the constant factor; O(n|G|) is claimed by the paper,
>> via "our grammars blew up exponentially so we had to compact them
>> continuously, now they seem not to though a pathological case can probably
>> be constructed". Which isn't the same as "complexity class has been
>> proven", sure.
>>
>> But my main goal here is to understand marpa, and in this case
>> specifically where the intuition that
>>   start: (alt (cat 'a' 'b') (cat 'a' <start> 'b'))
>>   = D_a => (alt 'b' (cat <start> 'b'))
>>   = D_a =>   (alt null (cat (alt 'b' (cat <start> 'b')) 'b'))
>>             => (cat (alt 'b' (cat <start> 'b')) 'b')
>>   = D_b => (alt 'b' (cat null 'b')) => 'b'
>> is equivalent to
>>   0.   {[start -> •a b]:0 [start -> •a start b]:0}
>>   1.a {[start -> a•b]:0 [start -> a•start b]:0 [start -> •a b]:1 [start
>> -> •a start b]:1}
>>   2.a {[start -> a•b]:1 [start -> a•start b]:1 [start -> •a b]:2 [start
>> -> •a start b]:2}
>>   3.b {[start -> a start•b]:0}
>> modulo tree/by-reference structure, is off base.
>>
>> On Tuesday, January 12, 2016, Jeffrey Kegler <
>> [email protected]> wrote:
>>
>>> I didn't study Might-Darais closely -- Russ Cox did, and I'm relying on
>>> his observations.
>>>
>>> You can (and I do) think of packratting and backtracking as the same
>>> approach, it's just deciding whether to take the efficiency hit in space or
>>> in time.
>>>
>>> My idea is that, if you want a strong parser, you should use a parse
>>> engine designed (like Earley's) to be strong from the ground up.  Almost
>>> everybody else starts with a more or less weak parser, and hacks on the
>>> power, knowing that's worst-case inefficient, but hoping it will work out
>>> in enough cases of interest.  Since Earley's is O(n) for every
>>> deterministic CFG, I don't see why folks insist on going for the weaker
>>> algorithms.  But my approach is definitely the minority one so far.
>>>
>>> Re "2 seconds to parse a 31-line Python file" -- I really don't pay much
>>> attention to specific speed claims in theoretical papers -- as opposed to
>>> big-O worst-case results.  With Marpa, you don't have to hope your grammar
>>> is O(n) -- you can know for sure.  Resorting to claims for specific
>>> grammars on specific processors to my mind shows the approach has run out
>>> of steam.
>>>
>>> Note that I *do* do benchmarks to compare implementations in my blog,
>>> but I don't report those in theoretical contexts.  Readers of theory papers
>>> should care how fast your algorithm is, not how fast your current processor
>>> is.
>>>
>>> On Tue, Jan 12, 2016 at 9:53 AM, Anton Dyudin <[email protected]>
>>> wrote:
>>>
>>>> Would I be correct to infer that memoization is a form of backtracking,
>>>> and packratting is what the "fixpoint" computation is doing? And is the
>>>> issue with those being expensive on real-world size grammars/input? 2
>>>> seconds to parse a 31-line Python file seems a bit steep, though not
>>>> exponential.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "marpa parser" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>
>

