Thanks! That helps. On Thursday, November 1, 2018 at 8:35:55 PM UTC-4, Jeffrey Kegler wrote: > > Forgive me if I'm overexplaining -- you obviously know a lot of these > matters, but any over-explanations will help other readers, so I hope > you'll be patient with them. > > Quibble 1: "Complexity" since the 60s has usually referred to results in > terms of Landau (big-O) notation. These are only relevant for infinite > sets, which do not of course occur in practice. Nonetheless Big-O has > proved so helpful in practice that its limits are sometimes forgotten. The > grammar you cite is constant-sized, so in that sense there is no room for a > complexity advantage either way. > > Quibble 2: CYK is always cubic (O(n**3)), so that it can have a complexity > advantage only for inputs which Earley/Leo (=Marpa) is cubic, which are > rare in practical use. But these do occur in NLP. So to keep this a horse > race, let's assume we're talking about one of the grammars where Earley/Leo > goes cubic. > > Foreshadowing: Elsewhere I've listed the 5 algorithms that IMHO might > stand the test of time and survive in some form: regular expressions, > Earley/Leo, recursive descent, PEG and CYK. So I believe there are some > grammars for which CYK is better than Marpa -- how many and how interesting > they are is another question. > > OK, now for handicapping the horse race: > > Marpa would have to put a prediction for every one of the rules you list > into one of its Earley sets, but for predictions this can be done by > flipping a bit in a vector -- in a given Earley set, each prediciton > depends only on the rule, which is an integer less than a constant. Last I > recall, Marpa does not use such a bit vector, but that's only because it's > necessity is not proven -- Marpa seems to be fast enough. After that 1st > Earley set only the rule whose initial terminal was found in the input will > generate entries in the Earley sets. > > CYK would behave similarly but the selection of the terminal would be out > of pairs with every other possible token in the grammar -- if the grammar > is truly large, we are going quadratic on a very large number. And this > propagates up the tree. > > These are, I believe, valid and important point, but admittedly I have > acted as Marpa's lawyer, puffing up my client's case and poking holes in > the case for CYK. For some grammars, I think a case could be made the > other way, and the verdict of a candid finder of fact might in fact go to > CYK. > > I hope this helps, jeffrey > > > On Thu, Nov 1, 2018 at 8:14 PM emh <[email protected] <javascript:>> > wrote: > >> In general, it seems that Earley, especially with Marpa enhancements, >> matches or improves on most any parsing algorithm. But its complexity seems >> to be a judged with respect to the length of the string, not the size of >> the grammar. >> >> I’m not great at complexity proofs, and I was wondering if Earley retains >> its performance advantages for very large numbers of rules, over CYK. If I >> understand correctly, it has to add Earley items for every rule, top down, >> and examine them. So for an NLP project I am doing, with many thousands of >> necessarily distinct rules, would its space/time complexity suffer greatly? >> >> For example, if every rule started with a distinct terminal, would Earley >> have to add and scan every one in S(0), while CYK would in constant time >> (and less space) select just the relevant rule(s)? As in: >> >> start: r1 | r2 | r3 | … | r_i >> >> r1: "Print" … >> r2: "Set" … >> r3: "Walk" … >> …. >> r_i: T_i … >> >> Then in S(0) we have to add and scan rules r1 … r_i, while in CYK it >> would immediately be able to select a very small subset of r1 … r_i (in >> CNF), without examining all other rules. >> >> Thanks for your help. >> >> -- >> You received this message because you are subscribed to the Google Groups >> "marpa parser" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> For more options, visit https://groups.google.com/d/optout. >> >
-- You received this message because you are subscribed to the Google Groups "marpa parser" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
