On Thu, Sep 24, 2015 at 2:27 AM, Dominic Cleal <[email protected]> wrote:

> On 22/09/15 20:18, Laine Stump wrote:
> >
> > 1) has anyone thought about/looked into optimizing/changing the data
> > structure used to store nodes in augeas to scale better with larger
> > datasets (execution time seems to increase at > linear)?
>
> Yes, I've seen something similar before - it was reported to us in the
> context of a Puppet provider working on a huge file with many Nagios
> service definitions.  When lots of nodes with the same name, but
> different index (e.g. service[1], service[2]) exist then Augeas is
> extremely slow to traverse paths with a high index value.
>

That's not a huge surprise, as the data structure for a tree is incredibly
simple: children are kept in a singly-linked list, so dealing with nodes
that have lots of children is bound to be slow.
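
To illustrate why a singly-linked list of children makes high index values
expensive, here is a minimal Python sketch. It is not Augeas code: the Node
class and the helpers make_siblings, find_by_position and total_steps are
hypothetical names made up for this example. It only counts how many list
links are followed to reach label[position].

```python
class Node:
    """Hypothetical tree node; siblings are chained through `next`."""

    def __init__(self, label, value=None):
        self.label = label
        self.value = value
        self.next = None


def make_siblings(label, count):
    """Build a singly-linked list of `count` siblings sharing one label."""
    head = tail = None
    for i in range(count):
        node = Node(label, value=str(i + 1))
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def find_by_position(head, label, position):
    """Return (node, steps) for label[position] (1-based), or (None, steps).

    `steps` is the number of list entries visited before the answer is known.
    """
    seen = 0
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.label == label:
            seen += 1
            if seen == position:
                return node, steps
        node = node.next
    return None, steps


def total_steps(head, label, count):
    """Steps needed to look up label[1] .. label[count] one after another."""
    return sum(find_by_position(head, label, i)[1] for i in range(1, count + 1))
```

With 1000 'service' siblings, reaching service[1000] visits all 1000 entries,
and looking up every index in turn visits 1000 * 1001 / 2 entries in total,
which is the worse-than-linear growth described above.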


> I spent a while profiling it and found a couple of very inefficient
> memory operations - here's my branch:
>
> https://github.com/hercules-team/augeas/compare/master...domcleal:ns-filter-perf3
>

Interesting. That looks like it would get rid of some of the
inefficiencies in dealing with nodes with a large number of children. Is
the main issue to address expressions of the form 'service[%d]', i.e.
addressing nodes by their position? I wonder if it wouldn't be worthwhile
to just special-case that in ns_filter: we know that at most one node can
match such a predicate, and we could save ourselves a lot of effort by
treating that specially. Other things, like "service/foo[. = 'bar']", will
be much harder to speed up, since we still need to go over a large
number of service nodes ...
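
A rough Python sketch of the difference between the two cases, under stated
assumptions: this is not the ns_filter implementation, nodes are modelled as
plain dicts, and select_by_position and select_by_predicate are hypothetical
names. The positional case can stop at the one node that can match; the
general case has to test every candidate.

```python
def select_by_position(siblings, name, wanted):
    """Fast path for name[wanted]: stop at the single possible match.

    Returns (matches, visited), where `visited` counts nodes examined.
    """
    position = 0
    visited = 0
    for node in siblings:
        visited += 1
        if node["label"] == name:
            position += 1
            if position == wanted:
                return [node], visited  # nothing later can also match
    return [], visited


def select_by_predicate(siblings, name, predicate):
    """General path: any number of nodes may match, so scan them all."""
    matches = []
    visited = 0
    for node in siblings:
        visited += 1
        if node["label"] == name and predicate(node):
            matches.append(node)
    return matches, visited


# Made-up sample data: 1000 'service' nodes, every 100th one has foo = 'bar'.
services = [
    {"label": "service", "foo": "bar" if i % 100 == 0 else "baz"}
    for i in range(1, 1001)
]
```

For example, select_by_position(services, "service", 10) examines 10 nodes
and returns, while select_by_predicate(services, "service", lambda n:
n["foo"] == "bar") has to examine all 1000. Note that the fast path still
walks the list up to the requested position; it only avoids evaluating the
predicate on every node and scanning the rest.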

David
_______________________________________________
augeas-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/augeas-devel
