Great analysis.
GList is the most absolutely rudimentary list structure you could
imagine. g_list_append walks the list to the end, then adds the new
element. Totally O(n^2). A bit of googling turned up some very defensive
arguments against changing GList to be a more opaque type with efficient
head and tail pointers and length information.
We have several calls to g_list_length in the codebase, too. These walk
all elements to find the length. Yuck.
I think we should use g_list* as the internals to our own opaque list
type. Given that most messages have only 15-20 headers, a hash table is
probably not going to be more efficient due to its overhead -- it would
have better worst-case performance, though.
Aaron
On Mon, 2006-03-06 at 22:00 -0600, Matthew Sayler wrote:
> > I compiled dbmail and glib with profiling and gprof shows almost all
> > time being spent in IA__g_list_last. (Note the CPU time reported by
> > time was 277.17s -- the rest is probably in glibc and mysql libs..)
> >
> > time seconds seconds calls s/call s/call name
> > 99.35 272.99 272.99 344704 0.00 0.00 IA__g_list_last
> > 0.20 273.54 0.55 18702183 0.00 0.00 ucmp
> > 0.10 273.82 0.28 344681 0.00 0.00 g_tree_node_insert
> > 0.06 273.99 0.17 860545 0.00 0.00 g_tree_node_lookup
> > 0.05 274.12 0.13 344771 0.00 0.00 _g_list_alloc
> > 0.03 274.20 0.08 14 0.01 0.01 db_getmailbox
> > (full profile log attached)
>
> OK -- O(n^2) behavior is bad. The traverse_tree_{keys, values, merge}
> functions all ended up appending message pointers to the end of a linked
> list. When you do this for 80,000 messages, you end up following about
> 3.2 billion pointers.. Yuck.
>
> Attached patch created copies of these functions (traverse_tree_keys_rev
> etc) which prepends list items and then uses a g_list_reverse to sort
> out the mess. It seems to work for me but I've given it minimal
> testing.
>
> There are better ways of doing this, but for me this reduces the above
> mailbox query from 272cpusec to 5.1cpusec, for a tidy 50x speedup.
>
> Matt
>
> _______________________________________________
> Dbmail-dev mailing list
> [email protected]
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
>