Do you have any way of narrowing this down to specific queries that are taking
the longest and/or are being executed the most? That would identify which
low-level database functions are being called, then we can just trace our way
up the call chain to see who's misbehaving or acting on a flawed design. Also,
if you could run similar tests against the latest 1.2, it would help to give a
frame of reference, particularly for my delivery chain design.

Aaron


Thomas Mueller <[EMAIL PROTECTED]> said:

> Hi,
> 
> I did some tests with cvs head from today and found out that it is quite
> slow when messages are inserted. I have a really aged machine for tests,
> but even for that machine it is too slow:
> 
> Pentium 166, 128 MB RAM
> Debian Sarge
> Vanilla kernel 2.6.1
> all services stopped that aren't required
> IMAP protected using stunnel
> Postgres 7.3.4 on a fast 7200 rpm 80 GB HD
> 
> Copying about 3000 mails using IMAP took more than an hour (!) in my
> LAN.
> 
> I've enabled postgres stats collector and did some tests: I have set up
> a new database with one user, and two mailboxes, INBOX and INBOX/Test
> and renamed dbmail-smtp, so my MTA couldn't insert any messages.
> 
> I used the following SQL queries to get the stats:
> 
> select xact_commit from pg_stat_database where datname='dbmail'
> To get the overall number of transactions.
> 
> select * from pg_stat_user_tables
> To get detailed seq scans/INSERT/UPDATE/DELETE counts for every row.
> 
> select pg_stat_reset()
> To reset all stats to zero.
> 
> I've reset the stats, started Evolution 1.4.5 and copied 2872 small
> mails (sometimes spam can be useful) to my dbmail test machine.
> Unfortunately Evolution read about 2% of the mails after I've copied
> them, that's why some seq_scan numbers might be somewhat to high.
> I've attached the table because of the line breaks too.
> 
> --------------------------------------------------
>  xact_commit 
> -------------
>        66906
> (1 row)
> 
>  relid | schemaname |       relname        | seq_scan | seq_tup_read |
> idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del 
>
-------+------------+----------------------+----------+--------------+----------+---------------+-----------+-----------+-----------
>  83072 | public     | aliases              |        0 |            0 |
> 0 |             0 |         0 |         0 |         0
>  83082 | public     | users                |        0 |            0 |
> 11490 |         11490 |         0 |      5745 |         0
>  83095 | public     | mailboxes            |    20295 |        98603 |
> 0 |             0 |         0 |         0 |         0
>  83116 | public     | subscription         |        0 |            0 |
> 0 |             0 |         0 |         0 |         0
>  83128 | public     | acl                  |        0 |            0 |
> 0 |             0 |         0 |         0 |         0
>  83151 | public     | physmessage          |    17490 |     25494744 |
> 0 |             0 |      2872 |      2958 |         0
>  83160 | public     | messages             |    23436 |     34300164 |
> 0 |             0 |      2872 |      2872 |         0
>  83189 | public     | messageblks          |       86 |       493984 |
> 0 |             0 |      5744 |         0 |         0
>  83206 | public     | auto_notifications   |        0 |            0 |
> 0 |             0 |         0 |         0 |         0
>  83219 | public     | auto_replies         |        0 |            0 |
> 0 |             0 |         0 |         0 |         0
> [..]
> (15 rows)
> 
>  pg_stat_reset 
> ---------------
>  t
> (1 row)
> --------------------------------------------------
> 
> There are several problems:
> - 23 transactions for every message copied (!)
> - the users table is updated twice for every message
> - lot of sequential scans
> 
> We had that discussion some months ago yet - we really, really need
> transactions. I didn't check the code, but is it possible to encapsulate
> the database access for one mail into one transaction without major
> rewrites? That would speed up a lot.
> 
> 
> Then I checked the performance of dbmail-maintenance (cfpd).
> 
> There is one seqScan on mailboxes for every message even if there is
> nothing to do - several thousand mailboxes and some million mails and
> that will take a really long time.
> 
> I've marked all messages as deleted and ran maintenance the first time
> (set delete status for deleted messages) - 2872 updates on messages but
> only 31 transactions, that looks very good.
> The second run of maintenance (delete messages with delete status)
> looked worse: 2872 delete on physmessage and messages, 5744 delete on
> messageblks but 14389 transactions.
> 
> 
> It should be easy to add an index for mailboxes to remove the sequential
> scans, but I don't know what query forces postgres to do the seqScan.
> 
> I checked the code and found this query for example:
> snprintf(query, DEF_QUERYSIZE,
>          "SELECT mailbox_idnr FROM mailboxes "
>          "WHERE LOWER(name) = LOWER('%s') "
>          "AND owner_idnr='%llu'", name, owner_idnr);
> We have this index:
> CREATE INDEX mailboxes_name_idx ON mailboxes(name);
> But we would need:
> CREATE INDEX mailboxes_lower_name_idx ON mailboxes(lower(name));
> 
> Is IMAP case sensitive? Are folders like INBOX/test and INBOX/Test
> possible? If it's insensitive we could lower() all queries on names and
> create an index on lower(name).
> 
> 
> -- 
> MfG Thomas Mueller - http://www.tmueller.com for pgp key (95702B3B)
> 
> 



-- 



Reply via email to