On Thu, Sep 24, 2015 at 2:27 AM, Dominic Cleal <[email protected]> wrote:
> On 22/09/15 20:18, Laine Stump wrote:
> >
> > 1) has anyone thought about/looked into optimizing/changing the data
> > structure used to store nodes in augeas to scale better with larger
> > datasets (execution time seems to increase at > linear)?
>
> Yes, I've seen something similar before - it was reported to us in the
> context of a Puppet provider working on a huge file with many Nagios
> service definitions. When lots of nodes with the same name, but
> different index (e.g. service[1], service[2]) exist then Augeas is
> extremely slow to traverse paths with a high index value.
>

That's not a huge surprise, as the data structure for a tree is
incredibly simple: children are kept in a singly-linked list, so dealing
with nodes that have lots of children is bound to be slow.

> I spent a while profiling it and found a couple of very inefficient
> memory operations - here's my branch:
>
> https://github.com/hercules-team/augeas/compare/master...domcleal:ns-filter-perf3
>

Interesting .. that looks like it would get rid of some of the
inefficiencies in dealing with nodes with a large number of children.

Is the main issue to address expressions of the form 'service[%d]', i.e.
addressing nodes by their position? I wonder if it wouldn't be worth
just special-casing that in ns_filter - we know that at most one node
can match such a predicate, and we could save ourselves a lot of effort
by treating that specially.

Other things, like "service/foo[. = 'bar']", will be much harder to
speed up since we still need to go over a large number of service
nodes ...

David
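
To make the positional special case concrete, here is a minimal sketch in Python. It is not Augeas code: the `Node` class and the `make_siblings` and `nth_by_label` helpers are hypothetical names invented for illustration, and the sketch only assumes what is stated above, namely that children sit in a singly-linked list and that a predicate like service[3] can match at most one node. Under those assumptions the walk can return as soon as the requested occurrence is found rather than examining the remaining siblings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; siblings form a singly-linked list."""
    label: str
    value: Optional[str] = None
    next: Optional["Node"] = None  # next sibling, or None at the end


def make_siblings(labels):
    """Build a sibling list from an iterable of labels and return its head."""
    head = None
    for label in reversed(list(labels)):
        head = Node(label, next=head)
    return head


def nth_by_label(head, label, position):
    """Return the position-th (1-based) sibling named `label`, or None.

    A positional predicate can match at most one node, so the walk
    stops at the first hit instead of visiting the rest of the list.
    """
    count = 0
    node = head
    while node is not None:
        if node.label == label:
            count += 1
            if count == position:
                return node
        node = node.next
    return None


# Usage: look up the equivalent of service[2] among mixed siblings.
siblings = make_siblings(["host", "service", "service", "command", "service"])
second_service = nth_by_label(siblings, "service", 2)
```

A single lookup is still linear in the position of the node, because the list has to be walked from the head. What the early return avoids is any work on siblings past the match. A value predicate such as "service/foo[. = 'bar']" cannot use this shortcut, since any number of nodes may match.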
_______________________________________________
augeas-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/augeas-devel
