Re: Earley complexity over grammar size

emh Thu, 01 Nov 2018 17:47:30 -0700

Thanks! That helps.

On Thursday, November 1, 2018 at 8:35:55 PM UTC-4, Jeffrey Kegler wrote:
>
> Forgive me if I'm overexplaining -- you obviously know a lot of these 
> matters, but any over-explanations will help other readers, so I hope 
> you'll be patient with them.
>
> Quibble 1: "Complexity" since the 60s has usually referred to results in 
> terms of Landau (big-O) notation.  These are only relevant for infinite 
> sets, which do not of course occur in practice.  Nonetheless Big-O has 
> proved so helpful in practice that its limits are sometimes forgotten.  The 
> grammar you cite is constant-sized, so in that sense there is no room for a 
> complexity advantage either way.
>
> Quibble 2: CYK is always cubic (O(n**3)), so that it can have a complexity 
> advantage only for inputs which Earley/Leo (=Marpa) is cubic, which are 
> rare in practical use.  But these do occur in NLP.  So to keep this a horse 
> race, let's assume we're talking about one of the grammars where Earley/Leo 
> goes cubic.
>
> Foreshadowing: Elsewhere I've listed the 5 algorithms that IMHO might 
> stand the test of time and survive in some form: regular expressions, 
> Earley/Leo, recursive descent, PEG and CYK.  So I believe there are some 
> grammars for which CYK is better than Marpa -- how many and how interesting 
> they are is another question.
>
> OK, now for handicapping the horse race:
>
> Marpa would have to put a prediction for every one of the rules you list 
> into one of its Earley sets, but for predictions this can be done by 
> flipping a bit in a vector -- in a given Earley set, each prediciton 
> depends only on the rule, which is an integer less than a constant.  Last I 
> recall, Marpa does not use such a bit vector, but that's only because it's 
> necessity is not proven -- Marpa seems to be fast enough.  After that 1st 
> Earley set only the rule whose initial terminal was found in the input will 
> generate entries in the Earley sets.
>
> CYK would behave similarly but the selection of the terminal would be out 
> of pairs with every other possible token in the grammar -- if the grammar 
> is truly large, we are going quadratic on a very large number.  And this 
> propagates up the tree.
>
> These are, I believe, valid and important point, but admittedly I have 
> acted as Marpa's lawyer, puffing up my client's case and poking holes in 
> the case for CYK.  For some grammars, I think a case could be made the 
> other way, and the verdict of a candid finder of fact might in fact go to 
> CYK.
>
> I hope this helps, jeffrey
>   
>
> On Thu, Nov 1, 2018 at 8:14 PM emh <[email protected] <javascript:>> 
> wrote:
>
>> In general, it seems that Earley, especially with Marpa enhancements, 
>> matches or improves on most any parsing algorithm. But its complexity seems 
>> to be a judged with respect to the length of the string, not the size of 
>> the grammar.
>>
>> I’m not great at complexity proofs, and I was wondering if Earley retains 
>> its performance advantages for very large numbers of rules, over CYK. If I 
>> understand correctly, it has to add Earley items for every rule, top down, 
>> and examine them. So for an NLP project I am doing, with many thousands of 
>> necessarily distinct rules, would its space/time complexity suffer greatly?
>>
>> For example, if every rule started with a distinct terminal, would Earley 
>> have to add and scan every one in S(0), while CYK would in constant time 
>> (and less space) select just the relevant rule(s)? As in:
>>
>> start: r1 | r2 | r3 | … | r_i
>>
>> r1: "Print" …
>> r2: "Set" …
>> r3: "Walk" …
>> ….
>> r_i: T_i …
>>
>> Then in S(0) we have to add and scan rules r1 … r_i, while in CYK it 
>> would immediately be able to select a very small subset of r1 … r_i (in 
>> CNF), without examining all other rules.
>>
>> Thanks for your help.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>


-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Earley complexity over grammar size

Reply via email to