In general, it seems that Earley, especially with the Marpa enhancements, matches or improves on just about any other parsing algorithm. But its complexity always seems to be stated with respect to the length of the input string, not the size of the grammar.
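(For reference, and please correct me if I have this wrong: the bounds I usually see quoted are O(n^3) for Earley in the general case, O(n^2) for unambiguous grammars, and O(n) for the LR-regular class with the Leo/Marpa optimization, where n is the input length. CYK on a CNF grammar, by contrast, is usually given as O(n^3 * |G|), with the grammar size |G| spelled out; for Earley the grammar-size factor seems to be left implicit in the constant.)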
I'm not great at complexity proofs, and I was wondering whether Earley retains its performance advantages over CYK when the number of rules is very large. If I understand correctly, Earley has to add items for every predicted rule, top down, and then examine them. So for an NLP project I am doing, with many thousands of necessarily distinct rules, would its space/time complexity suffer greatly? For example, if every rule started with a distinct terminal, would Earley have to add and scan an item for every one of them in S(0), while CYK could select just the relevant rule(s) in constant time (and less space)? As in:

start: r1 | r2 | r3 | … | r_i
r1: "Print" …
r2: "Set" …
r3: "Walk" …
….
r_i: T_i …

Then in S(0) we have to add and scan rules r1 … r_i, while CYK (on the CNF version) would immediately select a very small subset of r1 … r_i, without examining all the other rules.

Thanks for your help.
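P.S. To make the concern concrete, here is a little Python sketch of what I mean. It is my own toy code, not Marpa's internals: a plain textbook prediction closure for S(0) with no first-set filtering, and the grammar, rule names, and terminals are all made up for illustration.

from collections import defaultdict

# Toy grammar in the shape described above: the start symbol has one
# alternative per "command", and each alternative begins with a distinct
# terminal.  (Hypothetical rules, just to make the set sizes visible.)
TERMINALS = ["Print", "Set", "Walk", "Jump", "Read"]   # stand-ins for T_1 .. T_i
GRAMMAR = {"start": [[f"r{k}"] for k in range(len(TERMINALS))]}
for k, t in enumerate(TERMINALS):
    GRAMMAR[f"r{k}"] = [[t, "rest"]]                    # r_k : T_k rest
GRAMMAR["rest"] = [[]]                                  # keep the toy grammar tiny

def earley_set_0(grammar, start="start"):
    """Naive textbook prediction closure for S(0): no lookahead filtering."""
    items = [(start, tuple(alt), 0) for alt in grammar[start]]
    seen = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in grammar:      # next symbol is a nonterminal
            for alt in grammar[rhs[dot]]:
                item = (rhs[dot], tuple(alt), 0)
                if item not in seen:                    # predict it into S(0)
                    seen.add(item)
                    worklist.append(item)
    return seen

# The kind of constant-time dispatch I have in mind on the CYK/CNF side:
# index each rule by its leading terminal, so only the rule(s) matching the
# next token are ever touched.
first_index = defaultdict(list)
for lhs, alts in GRAMMAR.items():
    for alt in alts:
        if alt and alt[0] not in GRAMMAR:               # leading symbol is a terminal
            first_index[alt[0]].append((lhs, alt))

s0 = earley_set_0(GRAMMAR)
print(len(s0), "items predicted into S(0)")             # grows with the number of rules
print(first_index["Set"], "<- only the rule(s) for the next token")

With this naive predictor, S(0) holds an item for every alternative of start plus every r_k, so it grows linearly with the rule count, whereas the first-terminal index touches only the one rule that matches the next token. My question is whether Marpa avoids this in practice or whether the grammar-size factor really does show up.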
