I believe your choice is better, in that, it requires many less changes to the underlying db_ methods, so I'll back mine out and proceed with yours. I may need one new db_ method to add a user mailbox to an existing message.

I think there's a way to do this entirely in the the database layer, at least for postgresql. Mysql would have to wait for 5.0 I think.

Essentially, fingerprint each message block with a crypto hash, and store that as a unique key in a table. If there's duplicate on insert, the rule would just swallow the row and return the key. A view could be put together for the current messageblks table, but for most of the access, its not really a useful construct. Or maybe it is depending on how well pg pushes the query clauses down into the view definition.

I've got some sql to do this, tested only with dummy manual inserts, but it seems to work. It's more geared towards the 1.0 series layout, as I'm not using the physmessage table. It's also at least one delete rule away from being complete. It does pass the 'it works by hand' test, but I haven't subbed this under dbmail yet.

There are a few keys here:
        there is a copy of the message flags row for every message*recipients.
there is a copy of the mapping between the message_idnr and the blocks for every message*recipient.
        there is one copy of any identical block.

So this would support different header blocks attached to cc/bcc'd messages where the body is the same. It would support multiple independent delivery. It would support fully identical delivery as well.

eric


create table messageblks_new (
        fprint   varchar(32), -- I'd rather use binary here....
        blocksize int,
        messageblk text
);

create unique index msgblknew_fp on messageblks_new (fprint);

drop table message_blk_ref;
create table message_blk_ref(
        fprint   varchar(32),
        messageblk_idnr serial,
        message_idnr    bigint
);

create unique index msgblk_ref_pk on message_blk_ref (message_idnr, messageblk_idnr);
create index msgblk_ref_fp on message_blk_ref (fprint);

drop view messageblks;  
create view messageblks as
        select messageblk_idnr, message_idnr,  messageblk, blocksize
                from message_blk_ref inner join messageblks_new using (fprint);


create or replace function insertAndReturnFprint(text) returns text as '
declare
        strMessageblk alias for $1;
        res record;
        strHash text;
begin
        strHash := md5(strMessageblk);
for res in select count(*) from messageblks_new where messageblks_new.fprint=strHash loop
                if res.count > 0 then
                        return strHash;
                end if;
        end loop;
        insert into messageblks_new (fprint, blocksize, messageblk) values
                (strHash, char_length(strMessageblk), strMessageblk);
        return strHash;
end;
' language plpgsql;

drop rule ins_msgblks on messageblks;
create rule ins_msgblks as on insert to messageblks do instead
         insert into message_blk_ref (fprint, message_idnr) values
                ((select insertAndReturnFprint(new.messageblk)),
                  new.message_idnr);



insert into messageblks (messageblk, message_idnr) values ('bar', 3);
insert into messageblks (messageblk, message_idnr) values ('bong', 3);
insert into messageblks (messageblk, message_idnr) values ('baz', 3);
insert into messageblks (messageblk, message_idnr) values ('baz', 4);

This last insert will not add a row to the message block storage, only to the reference table.

Reply via email to