Re: EXPENSIVE response code

Mark Crispin Fri, 11 Jul 2003 21:39:55 -0700

On Fri, 11 Jul 2003, Cyrus Daboo wrote:
> Unfortunately I have of late been running more and more into the problem of
> users wanting to THREAD very large mailboxes and complaining when it takes
> a long time. They are not keen on restricting the input message set to e.g.
> just unseen or messages sent within a recent period of time.


Restricting the input message set isn't very useful with threading, since
it changes the thread output even if you end up filtering the output.  You
need to consider all the messages in order to get the relationships right.

> I would
> strongly urge server implementers to look beyond merely caching references
> headers, and to come up with smart caching of the actual thread tree
> structure itself.

I'm surprised that you're having a performance problem if you cache the
references.  At least in my server, the cost is in loading the sortcache
(which is a once-only hit) rather than in actually doing the threading
which seems to be more or less instantaneous.

Do you cache the references as a single string, or as a linked list of
message-ids?  I do the latter.  I also cache the extracted subject, date,
and message-id.

You may also want to check into your hashing algorithm, and make sure that
your hash table is large enough.  If you use a general hashing algorithm,
be sure that it allows you to make the hash buckets any size that you
want.  Each bucket needs the sortcache pointer, a parent pointer, a
next-sibling pointer, and a child pointer, for a total of four data words
per bucket.

Since none of the hash table packages worked the way I wanted, I wrote my
own.  Pretty basic, doing a polynomial of each byte of the key, with a
prime-sized hash table.  But it seems fast enough, and since it was my own
algorithm I was able to make sure that it had all the data I need.

Remember: "any algorithm is trivial if you have the right data
structures"!

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

Re: EXPENSIVE response code

Reply via email to