Thomas Mueller wrote:

Hi Geo,

sorry for the long delay, I've been very busy and had some problems with
the new clustering feature of the Debian package (some are still left).

I've installed a PostgreSQL 8.0.1 in a chroot, converted the 7.4.7
configs as far as possible, did a pg_dumpall of 7.4 and imported that in
8.0.

The first strange thing is: the data/ directory of 7.4 has nearly 8 gig,
the dump has 4 gig, the data/ directory of 8.0 has 2.8 gig but
everything seems to be there.

The dump being significantly smaller than the data directory doesn't surprise me as the dump file doesn't use any space for indexes and such. The fact that the 8.0 data dir is smaller that the dump file is a little strange, but no unheard of the text of the messages is most getting compressed when stored. The big thing here is that you appear to have massive file and / or index bloat in your old 7.4.7 database. This is mostly caused by not vacuuming frequently enough or not having the FSM settings set high enough. I would be that if you did a vacuum full on your the 7.4.7 database and then reindexed most of the tables, then the 7.4 data directory would be fairly close in size to the 8.0.


Right. The difference is really impressive - to be honest that's not
what I've expected.


Query with PostgreSQL 7.4.7 (the details are in the older mails):
explain analyze SELECT seen_flag, answered_flag, deleted_flag,
flagged_flag, draft_flag, recent_flag, TO_CHAR(internal_date,
'YYYY-MM-DD HH24:MI:SS' ), rfcsize, message_idnr FROM dbmail_messages
msg, dbmail_physmessage pm WHERE pm.id = msg.physmessage_id AND
message_idnr BETWEEN '1' AND '500000' AND mailbox_idnr = '82' AND status
< '2' ORDER BY message_idnr ASC;
-> Sort  (cost=21912.37..21921.46 rows=3637 width=36) (actual
time=1038.073..1040.845 rows=811 loops=1)

Took 7397.000, second 1094.611, third 1033.421, fourth 1044.284


Same query with PostgreSQL 8.0.1:
Sort  (cost=562.03..562.03 rows=1 width=36) (actual
time=1345.991..1348.735 rows=811 loops=1)
  Sort Key: msg.message_idnr
  ->  Nested Loop  (cost=0.00..562.02 rows=1 width=36) (actual
time=231.209..1328.342 rows=811 loops=1)
        ->  Index Scan using dbmail_messages_tmm_status_recent_idx on
dbmail_messages msg  (cost=0.00..555.99 rows               Index Cond:
((mailbox_idnr = 82::bigint) AND (status < 2::smallint))
              Filter: ((message_idnr >= 1::bigint) AND (message_idnr <=
500000::bigint))
        ->  Index Scan using physmessage_pkey on dbmail_physmessage pm
(cost=0.00..6.01 rows=1 width=24) (actual
              Index Cond: (pm.id = "outer".physmessage_id)
Total runtime: 1351.952 ms

second 73.160, third 72.099, fourth 72.959, fifth 70.791

So the cached results are about 15 times faster than with 7.4.7 (!!!).

The 7.4.7 query is faster than it was before too. In the last weeks I
did a kernel security upgrade so I had to reboot - strange.


The real difference here as pointed out by someone else is that 8.0 appears to be using an index whereas 7.4 was doing a seqential scan over the entire table. I'm not exactly sure why 7.5 was making that choice but it might be due to bad stats (estimating 3637 rows when it only got 811) or it might be due to the fact that indexes can now be used between different compatible datatyples automatically (int4 to int8 for example). I'm guessing that your 7.4 performance problems were mostly related to bloat and out of date statistics. I recommend you look into setting up autovacuum on your 8.0 system and run a vacuum analyze from cron nightly.

Matthew

Reply via email to