Hello all,
I've been running DBMail for years now and things have gone
great. Recently, however, my server was downgraded due to hardware
failure and has just never been the same (went from 750GB RAID5 to
250GB RAID1, 4GB RAM to 1GB RAM, two HT processors to one Core2Duo).
Obviously that nailed performance so I've been struggling ever since.
A few days ago I got a new server that's a little more buff and
performance was significantly better. I moved from some dev
subversion snapshot of 2.2 to the latest stable snapshot (2.2.6-rc1)
and decided to stop following the subversion releases for a while.
And now, the meat of the post...
Performance, while better, is still nowhere near as good as I
remember it being. I've only got two users on this DBMail
installation so I can't understand why things are so slow. I noticed
that the following query was running pretty much all the time (with
different values in the SQL, of course):
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
v.physmessage_id=p.id JOIN dbmail_headername n ON
v.headername_id=n.id WHERE mailbox_idnr = 994 AND status IN (0,1) AND
headername ILIKE 'MESSAGE-ID' AND headervalue ILIKE '%
<[EMAIL PROTECTED]>%' ORDER BY message_idnr;
This gives me the execution plan of:
Sort (cost=45712.28..45712.29 rows=1 width=8) (actual
time=2398.829..2398.829 rows=0 loops=1)
Sort Key: m.message_idnr
-> Nested Loop (cost=0.00..45712.27 rows=1 width=8) (actual
time=2398.773..2398.773 rows=0 loops=1)
-> Nested Loop (cost=0.00..45709.24 rows=1 width=16)
(actual time=2398.770..2398.770 rows=0 loops=1)
-> Nested Loop (cost=0.00..45704.19 rows=1
width=24) (actual time=586.803..2398.698 rows=1 loops=1)
-> Seq Scan on dbmail_headervalue v
(cost=0.00..45698.28 rows=1 width=16) (actual time=586.729..2398.620
rows=1 loops=1)
Filter: (headervalue ~~* '%
<[EMAIL PROTECTED]>%'::text)
-> Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage p (cost=0.00..5.91 rows=1 width=8) (actual
time=0.053..0.055 rows=1 loops=1)
Index Cond: ("outer".physmessage_id = p.id)
-> Index Scan using dbmail_messages_2 on
dbmail_messages m (cost=0.00..5.04 rows=1 width=16) (actual
time=0.068..0.068 rows=0 loops=1)
Index Cond: ("outer".physmessage_id =
m.physmessage_id)
Filter: ((mailbox_idnr = 994) AND ((status = 0)
OR (status = 1)))
-> Index Scan using dbmail_headername_pkey on
dbmail_headername n (cost=0.00..3.02 rows=1 width=8) (never executed)
Index Cond: ("outer".headername_id = n.id)
Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
Total runtime: 2398.974 ms
That's quite a long runtime so I started looking into the database
and noticed the following things:
1) There is no index on dbmail_headervalue(headervalue)
2) dbmail_headervalue(headervalue) is, as far as I can tell, exactly
equal to the search string
3) dbmail_headername(headername) is, as far as I can tell, exactly
equal to the search string
I added functional indexes for headervalue and headername like this:
create index dbmail_headervalue_3 on dbmail_headervalue(lower
(headervalue));
create index dbmail_headername_2 on dbmail_headername(lower
(headername));
And rewrote the query like this:
SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p
ON m.physmessage_id=p.id JOIN dbmail_headervalue v ON
v.physmessage_id=p.id JOIN dbmail_headername n ON
v.headername_id=n.id WHERE mailbox_idnr = 994 AND status IN (0,1) AND
lower(headername) = lower('MESSAGE-ID') AND lower(headervalue) = lower
('<[EMAIL PROTECTED]>') ORDER BY message_idnr;
Which gives me an execution plan like this:
Sort (cost=25607.35..25607.35 rows=1 width=8) (actual
time=0.406..0.406 rows=0 loops=1)
Sort Key: m.message_idnr
-> Nested Loop (cost=22.86..25607.34 rows=1 width=8) (actual
time=0.400..0.400 rows=0 loops=1)
-> Nested Loop (cost=22.86..25601.42 rows=1 width=24)
(actual time=0.398..0.398 rows=0 loops=1)
-> Hash Join (cost=22.86..25414.56 rows=37 width=8)
(actual time=0.331..0.334 rows=1 loops=1)
Hash Cond: ("outer".headername_id = "inner".id)
-> Index Scan using dbmail_headervalue_3 on
dbmail_headervalue v (cost=0.00..25358.81 rows=6506 width=16)
(actual time=0.051..0.053 rows=1 loops=1)
Index Cond: (lower(headervalue) =
'<[EMAIL PROTECTED]>'::text)
-> Hash (cost=22.84..22.84 rows=7 width=8)
(actual time=0.058..0.058 rows=0 loops=1)
-> Index Scan using dbmail_headername_2
on dbmail_headername n (cost=0.00..22.84 rows=7 width=8) (actual
time=0.042..0.053 rows=4 loops=1)
Index Cond: (lower
((headername)::text) = 'message-id'::text)
-> Index Scan using dbmail_messages_2 on
dbmail_messages m (cost=0.00..5.04 rows=1 width=16) (actual
time=0.060..0.060 rows=0 loops=1)
Index Cond: ("outer".physmessage_id =
m.physmessage_id)
Filter: ((mailbox_idnr = 994) AND ((status = 0)
OR (status = 1)))
-> Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage p (cost=0.00..5.91 rows=1 width=8) (never executed)
Index Cond: ("outer".physmessage_id = p.id)
Total runtime: 0.511 ms
The performance increase is incredible. It cuts the computed cost
approximately in half but cuts the actual execution time by several
orders of magnitude due to:
1) Exact comparisons always using the index (ILIKE and strings
starting with % are known to not use indexes for several reasons)
2) Exact comparisons don't require a "Filter" step
Now, the question is... is this safe to do? The system appears to
be much more responsive after this but I'm afraid that the query I
changed (in dbmail_mailbox.c lines 1250 to 1263) is used somewhere
else in a different way that can cause weird side effects. Any
ideas? I am using the system over IMAP if that matters.
Also, when trying to delete a large number of messages (~1K -
~50K) the system just refuses to delete them. They keep coming back
over and over again. I'm using the Mac OSX mail client and moving
messages works great but deletions just haunt me. I don't really
delete mail, I just move the spam to the spam folder and save
everything so I haven't had time to check to see whether single
deletions work. Is anyone else having the same issue? If it's
already in the bug tracker I apologize, it's been a long time since I
went there and honestly the performance issues come first.
Thanks for reading my epic novel. Hopefully someone can let me
know if I'm on the right track or just a loon.
Thanks,
Tim Mattison
[EMAIL PROTECTED]
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev