On Sat, 2008-07-26 at 12:52 -0400, Nate Foster wrote:
> > I am pretty confident that switching from taking the union of a lot of
> > lenses to a union of strings/regexps will take care of the slowness. The
> > difference between 'l(r1) | l(r2) | .. | l(rn)' vs 'l(r1|r2|..|rn)' is
> > enormous in terms of the internal processing - when parsing the file,
> > the first version requires n regexp matches, whereas the second just
> > requires one, plus the regexps for the first version are _much_ bigger
> > than for the second.
>
> A trick we do in Boomerang, which may be useful if you really do need
> a lens union and can't push it down into a union of regexps, is to
> parse
>
>   (l1 | l2 | l3 | l4)
>
> as
>
>   ( ( l1 | l2 ) | ( l3 | l4 ) )
>
> instead of
>
>   ( l1 | ( l2 | ( l3 | l4 ) ) )
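
For illustration, the two groupings quoted above can be sketched in Python. This is not Boomerang or Augeas code: `right_nested_union`, `balanced_union`, and `depth` are hypothetical helpers, and a union node is modelled as a plain 2-tuple. The only point shown is the shape of the tree: for n sublenses the balanced grouping has depth of about log2(n), while the right-nested one has depth n - 1.

```python
def right_nested_union(lenses):
    """Group as ( l1 | ( l2 | ( l3 | l4 ) ) ): one extra level per lens."""
    if len(lenses) == 1:
        return lenses[0]
    return (lenses[0], right_nested_union(lenses[1:]))


def balanced_union(lenses):
    """Group as ( ( l1 | l2 ) | ( l3 | l4 ) ): split the list in half."""
    if len(lenses) == 1:
        return lenses[0]
    mid = len(lenses) // 2
    return (balanced_union(lenses[:mid]), balanced_union(lenses[mid:]))


def depth(tree):
    """Number of nested union nodes on the longest path to a leaf."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[0]), depth(tree[1]))


names = ["l1", "l2", "l3", "l4"]
print(right_nested_union(names))
print(balanced_union(names))
```

With eight sublenses, `depth` gives 7 for the right-nested tree and 3 for the balanced one.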

I actually represent the union lens now as an array of sublenses, and I
could find out which branch to use with one regexp match (the glibc
matcher tells me which group matched); I just haven't implemented that.

Either way though, you wind up with much bigger regexps for the union of
lenses than for the lens of a regexp union; I suspect that is the real
reason why things slow down: the regexp matcher allocates enormous data
structures.

David
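
A rough sketch of the single-match idea, in Python rather than C. Python's `re` module stands in for the glibc matcher here (its alternation is leftmost-first rather than POSIX leftmost-longest), and `SUBLENSES`, `COMBINED`, and `dispatch` are made-up names, not anything in Augeas. Each branch regexp is wrapped in its own capturing group, so one match against the combined regexp reports which sublens applies.

```python
import re

# Hypothetical sublenses: (regexp, handler) pairs standing in for lenses.
# The branch regexps must not contain capturing groups of their own,
# otherwise the group numbers would no longer line up with the array.
SUBLENSES = [
    (r"[a-z]+=[0-9]+", lambda s: ("assignment", s)),
    (r"#.*", lambda s: ("comment", s)),
    (r"\[[a-z]+\]", lambda s: ("section", s)),
]

# One capturing group per branch: (r1)|(r2)|(r3)
COMBINED = re.compile("|".join("(%s)" % rx for rx, _ in SUBLENSES))


def dispatch(text):
    """Pick the sublens with a single match instead of trying each in turn."""
    m = COMBINED.fullmatch(text)
    if m is None:
        raise ValueError("no branch of the union matches %r" % text)
    # Only the group of the winning branch participates in the match,
    # so lastindex (1-based) identifies it.
    branch = m.lastindex - 1
    return SUBLENSES[branch][1](text)


print(dispatch("port=22"))
print(dispatch("# a comment"))
print(dispatch("[main]"))
```

This only removes the n separate match attempts; the combined regexp is still the union of all branch regexps, so it does nothing about their size.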
