On Wed, 2010-03-17 at 15:02 -0400, Francis Giraldeau wrote:
> On Friday, February 19, 2010 at 15:19 -0800, David Lutterkort wrote:
> >
> >     let directives = (directive1|directive2|...|directiveN)*
> >
> > Because there are so many directives, the typechecker runs out of
> > memory, even if you give it a lot of memory. This is particularly
> > annoying since it's perfectly fine to _use_ the lens, i.e. even
> > though you can't typecheck the lens, you can still use it for
> > parsing without a noticeable performance problem (if you're willing
> > to assume you have no type errors ;).
>
> If I understand correctly, typechecking a union involves computing the
> intersection of two automata, which has complexity n^4, and each lens
> has to be tested against every other lens, that is n^2, so the total
> runtime complexity is n^6, is that right?
Possibly ;) The limitation is more in the constants than in the overall
runtime behavior, since we actually run out of memory. The issue isn't
so much with intersections (what Anders calls vertical ambiguity), but
with concatenation (horizontal ambiguity in Anders' nomenclature): when
you write down a lens l*, the typechecker checks that the repetition of
l is unambiguous, which boils down to checking that the concatenation
l . l* is unambiguous. Checking horizontal ambiguity is expensive since
it involves three intersections of automata that are roughly twice the
size of the initial automata. The OOM happens in fa_ambig_example at
line 2835, 'amb = fa_intersect(b1, b2)'. There's a toy example of an
ambiguous iteration in the P.S. below.

There are other things that slow down the typechecker unnecessarily:
for example, there's no memoization, i.e. if you write 'l*' in several
places, each occurrence gets checked separately. You can work around
that by hand by assigning those constructs to a variable and using the
variable instead (also sketched in the P.S.).

> > I've been trying to address that by playing various tricks with
> > regular expressions inside libfa (the OOM happens in
> > fa_ambig_example) - the next thing to try, when I get some time, is
> > to convert regular expressions to finite automata using a
> > derivative-based approach[1]. I've got some patches to do the
> > canonization of regular expressions that their construction
> > requires, I just haven't had a chance to implement their DFA
> > construction.
>
> As I understand it, we could get smaller automata by building them
> with a derivative-based approach. Do you think that this method will
> generate significantly smaller automata?

The proof of the pudding is in the eating ;) The comparisons in Turon's
paper look encouraging, and Nate has had good experience with the
derivative approach in Boomerang; ultimately, it's something we'll only
find out by trying it.

> Would it be possible to assign priority to lenses? The order in which
> lenses are written could be their priority order.

Wouldn't that implicitly turn 'l1|l2' into 'l1|(l2-l1)'? The real
problem, though, is in checking horizontal ambiguity, where priority
won't help.

The quickest way forward is probably an httpd lens that is a little too
lax, i.e. one that allows nesting of any section in any other section
in httpd.conf, and then refining that slowly; very similar to the lens
attached to bug #100 (a crude sketch of that, too, is in the P.S.).

David
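P.S. To make the horizontal ambiguity point concrete, here is about the
smallest lens that trips the l . l* check (a made-up toy, not from any
real module):

    (* l matches a run of one or more a's, so in l* the input "aa" can
       be split as "a" . "a" or taken as a whole; the l . l* check
       fails and the typechecker rejects the iteration as ambiguous *)
    let l   = del /a+/ "a"
    let bad = l*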
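The by-hand memoization workaround looks like this - directive1,
directive2 and comment stand in for whatever lenses you already have:

    (* writing the starred union out twice gets it typechecked twice *)
    let body  = (directive1 | directive2)* . comment
                . (directive1 | directive2)*

    (* bind it to a variable and reuse that; it is only checked once *)
    let directives = (directive1 | directive2)*
    let body2      = directives . comment . directives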
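And one crude way to get a 'too lax' httpd lens (made up here for
illustration, and much rougher than the lens on bug #100): don't model
nesting in the tree at all, just treat section openers and closers as
flat entries, so any section is accepted inside any other:

    (* a plain 'Key value' directive line *)
    let directive = [ key /[A-Za-z]+/ . del /[ \t]+/ " "
                      . store /[^ \t\n]+/ ]
    (* '<Section args>' opener, stored as a flat entry *)
    let sec_open  = [ label "section" . del "<" "<"
                      . store /[A-Za-z][^>\n]*/ . del ">" ">" ]
    (* '</Section>' closer, also flat; no matching is enforced *)
    let sec_close = [ label "section-end" . del /<\/[A-Za-z]+>/
                      "</Directory>" ]
    let lns = ((directive | sec_open | sec_close) . del "\n" "\n")*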
