Re: [Dbmail-dev] STORE is SLOOOOOW on folder with many, many messages

Geo Carncross Wed, 14 Dec 2005 22:17:50 +0100 (CET)

I wrote this query back in <[EMAIL PROTECTED]>

I also suggested the alternative:


select count(*), count(case when seen_flag > 0 Then 1 else null end),
count(case when recent_flag > 0 then 1 else null end) FROM
dbmail_messages WHERE status < '2' and mailbox_idnr='1';

in <[EMAIL PROTECTED]>

but I cannot recall if I ever profiled that with MySQL.

In SQLite, it generates a plan where the database isn't touched (index
only).


On Mon, 2005-12-12 at 10:03 -0800, Kevin Brown wrote:
> In experimenting with DBMail 2.0.7 (even with the changes I submitted
> regarding the headername query, I can't get 2.1-trunk to run well
> enough to work properly with both Mutt and Thunderbird), I discovered
> (as perhaps some of you have) that storing changes to the attributes
> is very slow when the folder has lots of messages (tens of thousands).
> 
> 2.1.x may have fixed this, though the query in question is the same.
> But in 2.0.7, the following query is responsible for the bulk of the
> time spent in the update:
> 
>     SELECT 'a',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr='14'
>       AND (status='0' OR status='1') UNION SELECT 'b',COUNT(*) FROM
>       dbmail_messages WHERE mailbox_idnr='14' AND (status='0' OR
>       status='1') AND seen_flag=1 UNION SELECT 'c',COUNT(*) FROM
>       dbmail_messages WHERE mailbox_idnr='14' AND (status='0' OR
>       status='1') AND recent_flag=1
> 
> (substitute any given mailbox id for '14', of course).
> 
> 
> This query originates from this bit of code here in function
> db_getmailbox in db.c:
> 
>     snprintf(query, DEF_QUERYSIZE,
>       "SELECT 'a',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr='%llu' "
>       "AND (status='%d' OR status='%d') UNION "
>       "SELECT 'b',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr='%llu' "
>       "AND (status='%d' OR status='%d') AND seen_flag=1 UNION "
>       "SELECT 'c',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr='%llu' "
>       "AND (status='%d' OR status='%d') AND recent_flag=1", 
>       mb->uid, MESSAGE_STATUS_NEW, MESSAGE_STATUS_SEEN,
>       mb->uid, MESSAGE_STATUS_NEW, MESSAGE_STATUS_SEEN,
>       mb->uid, MESSAGE_STATUS_NEW, MESSAGE_STATUS_SEEN);
>             
> 
> 
> That query is horribly inefficient on PostgreSQL.  The following is
> much more efficient (by at least a factor of two if not more) and
> accomplishes the same thing, though it will require a bit more
> interpretation by our code to use:
> 
>     SELECT seen_flag*2 + recent_flag, count(*) FROM dbmail_messages
>       WHERE mailbox_idnr = '14' AND status IN ('0', '1') GROUP BY
>       seen_flag*2 + recent_flag;
> 
> We can generate it thusly:
> 
>     snprintf(query, DEF_QUERYSIZE,
>       "SELECT seen_flag*2 + recent_flag, count(*) FROM "
>       "dbmail_messages WHERE mailbox_idnr = '%d' AND status IN "
>       "('%d', '%d') GROUP BY seen_flag*2 + recent_flag",
>       mb->uid, MESSAGE_STATUS_NEW, MESSAGE_STATUS_SEEN);
> 
> The query will return up to 4 rows with the following values as the
> first column:
> 
>     0: neither seen_flag nor recent_flag is set.
>     1: recent_flag is set, seen_flag is not.
>     2: seen_flag is set, recent_flag is not.
>     3: both seen_flag and recent_flag are set
> 
> 
> The additional interpretation is in reading and interpreting the
> results.  The query won't return rows for flag combinations that don't
> have any corresponding entries, so we have to infer those.  I think
> the most efficient way to do that is to create a simple 4-element
> integer array, initialize its elements to zero, and use the first
> column results as the index to that array.  Place the results in the
> array.  The array will have zero for entries that weren't returned by
> the query.  The array definition will be something like:
> 
>     unsigned int results[4];
> 
> 
> The values we're interested in (exists, seen, recent) will then be:
> 
>     exists = results[0] + results[1] + results[2] + results[3];
>     seen = results[2] + results[3];
>     recent = results[1] + results[3]
> 
> 
> 
> Now, the above is more efficient, but the real win would be if we
> could avoid calling db_getmailbox (or perhaps a modified version of it
> that doesn't have to do any of this) altogether after doing a STORE.
> 
> 
> I don't know how much more (if any) efficient the above will be on
> MySQL or SQLite, but I wouldn't expect it to be less efficient.
> 
> 
> Thoughts?
> 
> 
> 

-- 
Internet Connection High Quality Web Hosting
http://www.internetconnection.net/

Re: [Dbmail-dev] STORE is SLOOOOOW on folder with many, many messages

Reply via email to