Re: [Dbmail-dev] Dangerous SQL games -- Is my optimization insane?

Paul J Stevens Tue, 24 Jul 2007 00:59:26 -0700


Hi Tim,


Tim Mattison wrote:

Hello all,
I've been running DBMail for years now and things have gone great. Recently, however, my server was downgraded due to hardware failure and has just never been the same (went from 750GB RAID5 to 250GB RAID1, 4GB RAM to 1GB RAM, two HT processors to one Core2Duo). Obviously that nailed performance so I've been struggling ever since.
A few days ago I got a new server that's a little more buff and performance was significantly better. I moved from some dev subversion snapshot of 2.2 to the latest stable snapshot (2.2.6-rc1) and decided to stop following the subversion releases for a while.

FYI, some minor issues and one less than minor issue (bug #624) were fixed since 2.2.6-rc1.

  And now, the meat of the post...
Performance, while better, is still nowhere near as good as I remember it being. I've only got two users on this DBMail installation so I can't understand why things are so slow. I noticed that the following query was running pretty much all the time (with different values in the SQL, of course):


<snip>

  I added functional indexes for headervalue and headername like this:
create index dbmail_headervalue_3 on dbmail_headervalue(lower(headervalue));

Heh? I see that one is missing from create_tables.pgsql !!! And that's not the only one...


The whole list of obviously missing indexes for the Postgres tables:

CREATE INDEX dbmail_headervalue_3 ON dbmail_headervalue(headervalue);
CREATE INDEX dbmail_subjectfield_2 ON dbmail_subjectfield(subjectfield);
CREATE INDEX dbmail_datefield_2 ON dbmail_datefield(datefield);
CREATE INDEX dbmail_referencesfield_2 ON dbmail_referencesfield(referencesfield);
CREATE INDEX dbmail_fromfield_2 ON dbmail_fromfield(fromaddr);
CREATE INDEX dbmail_fromfield_3 ON dbmail_fromfield(fromname);
CREATE INDEX dbmail_tofield_2 ON dbmail_tofield(toname);
CREATE INDEX dbmail_tofield_3 ON dbmail_tofield(toaddr);
CREATE INDEX dbmail_replytofield_2 ON dbmail_replytofield(replytoname);
CREATE INDEX dbmail_replytofield_3 ON dbmail_replytofield(replytoaddr);
CREATE INDEX dbmail_ccfield_2 ON dbmail_ccfield(ccname);
CREATE INDEX dbmail_ccfield_3 ON dbmail_ccfield(ccaddr);

For completeness' sake, also some missing MySQL indexes:

CREATE INDEX dbmail_fromfield_1 ON dbmail_fromfield(fromname);
CREATE INDEX dbmail_fromfield_2 ON dbmail_fromfield(fromaddr);
CREATE INDEX dbmail_tofield_1 ON dbmail_tofield(toname);
CREATE INDEX dbmail_tofield_2 ON dbmail_tofield(toaddr);
CREATE INDEX dbmail_replytofield_1 ON dbmail_replytofield(replytoname);
CREATE INDEX dbmail_replytofield_2 ON dbmail_replytofield(replytoaddr);
CREATE INDEX dbmail_ccfield_1 ON dbmail_ccfield(ccname);
CREATE INDEX dbmail_ccfield_2 ON dbmail_ccfield(ccaddr);

create index dbmail_headername_2 on dbmail_headername(lower(headername));


That one was in the original create_tables.pgsql

  And rewrote the query like this:
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON v.physmessage_id=p.id JOIN dbmail_headername n ON v.headername_id=n.id WHERE mailbox_idnr = 994 AND status IN (0,1) AND lower(headername) = lower('MESSAGE-ID') AND lower(headervalue) = lower('<[EMAIL PROTECTED]>') ORDER BY message_idnr;


Which is not valid I'm afraid.

The performance increase is incredible. It cuts the computed cost approximately in half but cuts the actual execution time by several orders of magnitude due to:
1) Exact comparisons always using the index (ILIKE and strings starting with % are known to not use indexes for several reasons)
2) Exact comparisons don't require a "Filter" step
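The two points above can be checked on a toy table. This is only a rough sketch: SQLite (via Python's standard sqlite3 module) stands in for PostgreSQL, SQLite's LIKE plays the role of ILIKE because it is case-insensitive for ASCII by default, and the table name hv and index name hv_lower are made up for the illustration. Planner output on a real DBMail schema will differ in detail.

```python
import sqlite3

# Assumption: SQLite is a stand-in for PostgreSQL; "hv" and "hv_lower"
# are illustrative names, not part of the DBMail schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hv (id INTEGER PRIMARY KEY, headervalue TEXT)")
conn.execute("CREATE INDEX hv_lower ON hv (lower(headervalue))")
conn.executemany(
    "INSERT INTO hv (headervalue) VALUES (?)",
    [(f"<msg{i}@example.com>",) for i in range(1000)],
)


def plan(sql, params=()):
    """Return the planner's human-readable step descriptions for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the detail text


# Exact comparison on the indexed expression: an index lookup ("SEARCH ...").
exact_plan = plan(
    "SELECT id FROM hv WHERE lower(headervalue) = ?",
    ("<msg42@example.com>",),
)

# Pattern with a leading wildcard: every row is visited ("SCAN ...").
substring_plan = plan(
    "SELECT id FROM hv WHERE headervalue LIKE ?",
    ("%msg42%",),
)

print(exact_plan)
print(substring_plan)
```

The exact query is reported as a SEARCH using hv_lower, while the leading-wildcard pattern is reported as a SCAN of the table.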

The problem is IMAP-SEARCH is not about exact comparisons, but about substring searches. In your case the search command was something like:


x UID SEARCH 1:* (HEADER MESSAGE-ID "<[EMAIL PROTECTED]>")

in which case we might deduce from the combination of the header-name (message-id) and the format of the header-value <...>, that the search-string implies an exact comparison, but most of the time things are not that obvious:


x UID SEARCH 1:* (HEADER FROM "peter")

So we might introduce exact matching under some circumstances, but we'll have to be very careful to weigh the performance benefits and the added complexity and maintenance cost.
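To make the idea concrete, here is a minimal sketch of such a check. It is not DBMail code: the function name can_use_exact_match, the set of qualifying headers, and the regular expression are all assumptions chosen for illustration, and a stored header value with extra whitespace or comments would still need separate handling.

```python
import re

# Hypothetical helper, not part of DBMail: decide whether an IMAP HEADER
# search could be answered with an exact comparison instead of a substring
# match.

# Assumption: only Message-ID is treated as a candidate for exact matching.
EXACT_MATCH_HEADERS = {"message-id"}

# Assumption: a search value counts as "complete" when it is a single
# <local@domain> token with no embedded whitespace or angle brackets.
COMPLETE_MSGID = re.compile(r"^<[^<>\s]+@[^<>\s]+>$")


def can_use_exact_match(header_name: str, search_value: str) -> bool:
    """True only for a qualifying header searched with a complete <...> token."""
    return (
        header_name.strip().lower() in EXACT_MATCH_HEADERS
        and COMPLETE_MSGID.match(search_value.strip()) is not None
    )


print(can_use_exact_match("MESSAGE-ID", "<abc123@mail.example.com>"))  # True
print(can_use_exact_match("FROM", "peter"))                            # False
print(can_use_exact_match("MESSAGE-ID", "abc123"))                     # False
```

Anything that fails the check would fall back to the existing substring query, so the fast path stays an optimization rather than a change in search semantics.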

Now, the question is... is this safe to do? The system appears to be much more responsive after this but I'm afraid that the query I changed (in dbmail_mailbox.c lines 1250 to 1263) is used somewhere else in a different way that can cause weird side effects. Any ideas? I am using the system over IMAP if that matters.


With the change you did, imap-search on substrings will be broken.
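A small illustration of the breakage, again only a sketch: it uses SQLite through Python's sqlite3 module rather than PostgreSQL, a made-up one-column table hv, and SQLite's LIKE (case-insensitive for ASCII by default) in place of ILIKE.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL, and "hv" is an illustrative
# table, not the real dbmail_headervalue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hv (headervalue TEXT)")
conn.execute("INSERT INTO hv VALUES ('Peter Pan <peter@example.com>')")

term = "peter"  # as in: x UID SEARCH 1:* (HEADER FROM "peter")

# Exact comparison, as in the rewritten query: the stored value is not
# equal to the search term, so nothing is returned.
exact_hits = conn.execute(
    "SELECT headervalue FROM hv WHERE lower(headervalue) = lower(?)",
    (term,),
).fetchall()

# Substring comparison, which is what IMAP SEARCH asks for: the row matches.
substring_hits = conn.execute(
    "SELECT headervalue FROM hv WHERE headervalue LIKE '%' || ? || '%'",
    (term,),
).fetchall()

print(exact_hits)      # []
print(substring_hits)  # [('Peter Pan <peter@example.com>',)]
```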

Also, when trying to delete a large number of messages (~1K - ~50K) the system just refuses to delete them. They keep coming back over and over again. I'm using the Mac OSX mail client and moving messages works great but deletions just haunt me. I don't really delete mail, I just move the spam to the spam folder and save everything so I haven't had time to check to see whether single deletions work. Is anyone else having the same issue? If it's already in the bug tracker I apologize, it's been a long time since I went there and honestly the performance issues come first.

Please keep running those analyzers to find performance bogs in the SQL code used. Aaron and I can only do so much. Help is much appreciated.



--
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl