Thomas Mueller wrote:
Hi Geo,
sorry for the long delay, I've been very busy and had some problems with
the new clustering feature of the Debian package (some are still left).
I've installed a PostgreSQL 8.0.1 in a chroot, converted the 7.4.7
configs as far as possible, did a pg_dumpall of 7.4 and imported that in
8.0.
The first strange thing is: the data/ directory of 7.4 has nearly 8 gig,
the dump has 4 gig, the data/ directory of 8.0 has 2.8 gig but
everything seems to be there.
The dump being significantly smaller than the data directory doesn't
surprise me as the dump file doesn't use any space for indexes and
such. The fact that the 8.0 data dir is smaller that the dump file is a
little strange, but no unheard of the text of the messages is most
getting compressed when stored. The big thing here is that you appear
to have massive file and / or index bloat in your old 7.4.7 database.
This is mostly caused by not vacuuming frequently enough or not having
the FSM settings set high enough. I would be that if you did a vacuum
full on your the 7.4.7 database and then reindexed most of the tables,
then the 7.4 data directory would be fairly close in size to the 8.0.
Right. The difference is really impressive - to be honest that's not
what I've expected.
Query with PostgreSQL 7.4.7 (the details are in the older mails):
explain analyze SELECT seen_flag, answered_flag, deleted_flag,
flagged_flag, draft_flag, recent_flag, TO_CHAR(internal_date,
'YYYY-MM-DD HH24:MI:SS' ), rfcsize, message_idnr FROM dbmail_messages
msg, dbmail_physmessage pm WHERE pm.id = msg.physmessage_id AND
message_idnr BETWEEN '1' AND '500000' AND mailbox_idnr = '82' AND status
< '2' ORDER BY message_idnr ASC;
-> Sort (cost=21912.37..21921.46 rows=3637 width=36) (actual
time=1038.073..1040.845 rows=811 loops=1)
Took 7397.000, second 1094.611, third 1033.421, fourth 1044.284
Same query with PostgreSQL 8.0.1:
Sort (cost=562.03..562.03 rows=1 width=36) (actual
time=1345.991..1348.735 rows=811 loops=1)
Sort Key: msg.message_idnr
-> Nested Loop (cost=0.00..562.02 rows=1 width=36) (actual
time=231.209..1328.342 rows=811 loops=1)
-> Index Scan using dbmail_messages_tmm_status_recent_idx on
dbmail_messages msg (cost=0.00..555.99 rows Index Cond:
((mailbox_idnr = 82::bigint) AND (status < 2::smallint))
Filter: ((message_idnr >= 1::bigint) AND (message_idnr <=
500000::bigint))
-> Index Scan using physmessage_pkey on dbmail_physmessage pm
(cost=0.00..6.01 rows=1 width=24) (actual
Index Cond: (pm.id = "outer".physmessage_id)
Total runtime: 1351.952 ms
second 73.160, third 72.099, fourth 72.959, fifth 70.791
So the cached results are about 15 times faster than with 7.4.7 (!!!).
The 7.4.7 query is faster than it was before too. In the last weeks I
did a kernel security upgrade so I had to reboot - strange.
The real difference here as pointed out by someone else is that 8.0
appears to be using an index whereas 7.4 was doing a seqential scan over
the entire table. I'm not exactly sure why 7.5 was making that choice
but it might be due to bad stats (estimating 3637 rows when it only got
811) or it might be due to the fact that indexes can now be used between
different compatible datatyples automatically (int4 to int8 for
example). I'm guessing that your 7.4 performance problems were mostly
related to bloat and out of date statistics. I recommend you look into
setting up autovacuum on your 8.0 system and run a vacuum analyze from
cron nightly.
Matthew