Hi all,
I don't think the 2.0->2.2 upgrade issue will be relevant to anybody
out there any more but I can't but have my last bits of fun out of the
problem by telling you about the elegant solution I found for it. :-)
DISCLAIMER: I assume no responsibility so ever for any consequences
from your trying to use this recipe on your system. YMMV.
Given:
A 300Gb Postgresql 8.0 DB with dbmail 2.0 tables along with some
inhouse data. >99% of the DB size was the mail data.
Objective:
Upgrade DBMail to 2.2 and Pg to 8.4 with the shortest downtime possible.
Procedures tested but proven unfeasible:
1. Upgrade Pg, test it, then upgrade DBMail. I.e., use Slony to
replicate the DB over to Pg 8.4 and switch over to it; check that all
is OK; next night run the DBMail schema upgrade script. It didn't
work because DBMail 2.0 didn't seem to be able to work with Pg 8.4.
IMAP would mysteriously fail on some mailboxes. Most likely, DBMail
2.0 depended on some assumptions about SQL semantics that were no
longer true in Pg 8.4. (AFAIK, this, or similar, issue had to be
specifically fixed in DBMail 2.2.)
2. Upgrade DBMail immediately after Pg. That is, use Slony to get the
DB running in Pg 8.4 and run the DBMail schema upgrade script just
after that. Alas, this turned out to result in a *very* long
downtime. The most time-consuming step would be header cache
initialisation, and DBMail 2.2 proved pretty much useless without the
header cache, e.g., IMAP would return blank values for basic message
attributes in the message list until they were cached. On a purpose
built platform and using a tuned version of the schema upgrade script,
we were able to reduce TEXT->BYTEA conversion time to ~2 hours for
that 300Gb database. (The trick was to run the conversion as ALTER
TABLE with an implicit cast rather than UPDATE.) However, it would
take up to 2 days for the header cache to be populated by dbmail-util -
by.
Solution that worked:
Technical background:
Fine point 1: The scope of a Slony replication instance is the list of
tables specified in its configuration. Slony never works beyond that
list. Thus it's OK to create new tables in the slave DB and they won't
interfere with the replication -- unless you try really hard to
deliberately break it.
Fine point 2: Unless / until dbmail-util -pd runs, physical messages,
once inserted in the dbmail store, are never modified or deleted.
IMAP and POP3 operations affect logical message flags and/or mailbox
linkage only. Thus it's possible to a) run TEXT->BYTEA conversion
into a new table (herein referred to as a shadow copy) to have the
bulk of the messages converted to the new storage format, and b) later
make just incremental updates to the shadow copy. Note well that
periodic runs of the old dbmail-util must be blocked during the upgrade.
Fine point 2a: Internal message IDs in the dbmail store are
monotonously incrementing so it's possible to efficiently do
incremental updates to shadow copies of the physical message tables.
Fine point 3: dbmail-util -by (populate caches) needs the physical
message tables only, physmessage & messageblks. It'll never try to
access a logical message table nor a mailbox table nor a user table.
Ditto for the is_header check implied by -by.
So the whole point is to do bulk actions (BYTEA, caches) in background
on a frozen shadow copy so that downtime is needed to just quickly
sync it with the original tables.
Solution outline:
- First of all, start continuous Slony replication of the original DB
to a slave copy on Pg 8.4. Wait until it's in sync. This is done in
background with no downtime.
- On the slave database, run initial TEXT->BYTEA conversion on the
messageblks into a new table, e.g., using CREATE TABLE
shadow_messageblks AS SELECT... Neither downtime nor performance
penalty is incurred here.
- Create a shadow copy of physmessage matching the shadow copy of
messageblks. I.e., it needs to contain all physmessages referenced by
the latter but no newer physmessages. Again, it's as easy as CREATE
TABLE shadow_physmessage AS SELECT ... WHERE id <= (SELECT
MAX(physmessage_id) FROM shadow_messageblks).
- Note well that the shadow messageblks and physmessage tables
initially are disconnected from the sequences.
- Now create the header cache tables and indexes using a name prefix
of shadow_ rather than dbmail_. So you'll be able to make use of a
custom table name prefix in dbmail 2.2.
- Set up a custom dbmail.conf for dbmail-util v2.2 to run with. The
DB server will be the new one (slave, Pg 8.4) and the table name
prefix will be shadow_ not dbmail_. The latter customisation is very
important because the dbmail_ tables still are from dbmail 2.0.
- Start dbmail-util -by v2.2 on the shadow_ tables and let it run
until completion. This can take a while but it'll run in background.
That will get you the bulk of header cache populated.
- Sync shadow_physmessage and shadow_messageblks with their originals,
e.g., using INSERT SELECT WHERE NOT EXISTS (or WHERE based on the IDs,
which can be faster.)
- Re-run dbmail-util -by v2.2 with the custom config file.
- BEGIN DOWNTIME
- Stop the dbmail daemons and possibly the MTA.
- Re-sync the shadow tables.
- Re-re-run dbmail-util -by.
- Stop and drop the Slony replication, thus promoting the slave DB to
a standalone DB.
- Rename shadow_ tables and indexes, perhaps having renamed away the
originals just in case. E.g., ALTER TABLE dbmail_physmessage NAME
old_dbmail20_physmessage; ALTER TABLE shadow_physmessage NAME
dbmail_physmessage. The old tables can be dropped when all is done
and tested.
- Connect the new physical layer tables to their respective old
sequences to preserve ID continuity: ALTER TABLE ... SET DEFAULT ...
- Finish the schema upgrade according to the stock script. Of course,
the steps already done will have to be left out from it, namely, the
TEXT->BYTEA conversion and the creation of the cache tables and their
indexes.
- Start the _new_ dbmail daemons and the MTA.
- END DOWNTIME
That's it, folks!
Note that this procedure doesn't cover the upgrade of dbmail binaries
themselves. Please exercise your Unix sysadmin skills to do it
properly.
Examples of the SQL code:
Initial copy:
BEGIN;
CREATE CAST (text AS bytea) WITHOUT FUNCTION;
DROP TABLE IF EXISTS shadow_messageblks;
CREATE TABLE shadow_messageblks AS SELECT
messageblk_idnr,
physmessage_id,
messageblk::BYTEA,
blocksize,
is_header
FROM dbmail_messageblks;
DROP CAST (text AS bytea);
-- Now that the initial copy is there, apply constraints to it.
-- This also provides us with an index on the primary id.
ALTER TABLE shadow_messageblks ALTER messageblk_idnr SET NOT NULL;
ALTER TABLE shadow_messageblks ALTER physmessage_id SET NOT NULL;
ALTER TABLE shadow_messageblks ALTER messageblk SET NOT NULL;
ALTER TABLE shadow_messageblks ALTER blocksize SET NOT NULL;
ALTER TABLE shadow_messageblks ALTER is_header SET NOT NULL;
ALTER TABLE shadow_messageblks ALTER physmessage_id SET DEFAULT
(0)::BIGINT;
ALTER TABLE shadow_messageblks ALTER blocksize SET DEFAULT (0)::BIGINT;
ALTER TABLE shadow_messageblks ALTER is_header SET DEFAULT
(0)::SMALLINT;
ALTER TABLE shadow_messageblks ADD PRIMARY KEY (messageblk_idnr);
CREATE INDEX shadow_messageblks_physmessage_is_header_idx ON
shadow_messageblks (physmessage_id, is_header);
COMMIT;
Sync shadow_messageblks with dbmail_messageblks:
-- A straightforward approach to synchronising the copy
-- with the original dbmail_messageblks table. It assumes
-- that existing messages are never modified, which is true
-- in DBMail unless dbmail-util ran.
BEGIN;
CREATE CAST (text AS bytea) WITHOUT FUNCTION;
INSERT INTO shadow_messageblks
(messageblk_idnr,physmessage_id,messageblk,blocksize,is_header)
SELECT
messageblk_idnr,
physmessage_id,
messageblk::BYTEA,
blocksize,
is_header
FROM dbmail_messageblks
WHERE NOT EXISTS (
SELECT messageblk_idnr
FROM shadow_messageblks
WHERE
shadow_messageblks.messageblk_idnr=dbmail_messageblks.messageblk_idnr
);
DROP CAST (text AS bytea);
COMMIT;
Note: SCHEMA and ROLE/USER magic can be used instead of just the shadow_
prefix to further protect the original tables in the replica DB. Note
that Pg can assign a user/role based on the client IP -- see
pg_hba.conf.
Thanks,
Yar
_______________________________________________
DBmail mailing list
[email protected]
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail