Aaron Stone wrote:
> I'm not convinced that there's much point to having another bridge table
> for the headers. We're not talking about very much data, and I'd be
> concerned that we would be killing off some useful indexes for fast header
> retrievals.

Wrong on both counts: we are talking about a lot of data that can be accesses a
lot faster if we can normalize the table.

Currently the number of rows in headervalues is growing an order of magnitide
faster than the number of rows in the physmessage table.

Just a brief check of my own data:

mysql> select count(*),n.headername, headervalue from dbmail_headervalue v join
dbmail_headername n on n.id=v.headername_id where v.headervalue = '"Aaron Stone"
<[EMAIL PROTECTED]>' group by headername_id;
+----------+-------------+--------------------------------------+
| count(*) | headername  | headervalue                          |
+----------+-------------+--------------------------------------+
|        4 | To          | "Aaron Stone" <[EMAIL PROTECTED]> |
|      125 | From        | "Aaron Stone" <[EMAIL PROTECTED]> |
|        1 | Resent-From | "Aaron Stone" <[EMAIL PROTECTED]> |
+----------+-------------+--------------------------------------+
3 rows in set (0.03 sec)

Ok, so that's relevant, but not really dramatic. But there are worse situations:

mysql> select count(*),n.headername, headervalue from dbmail_headervalue v join
dbmail_headername n on n.id=v.headername_id where
v.headervalue='[EMAIL PROTECTED]' group by headername_id;
+----------+------------------+---------------------------+
| count(*) | headername       | headervalue               |
+----------+------------------+---------------------------+
|     2090 | Return-Path      | [EMAIL PROTECTED] |
|        1 | To               | [EMAIL PROTECTED] |
|        1 | From             | [EMAIL PROTECTED] |
|    13134 | Sender           | [EMAIL PROTECTED] |
|    13118 | Errors-To        | [EMAIL PROTECTED] |
|        5 | X-Invalid-Header | [EMAIL PROTECTED] |
|        4 |                  | [EMAIL PROTECTED] |
|        1 | nder             | [EMAIL PROTECTED] |
|        3 | rs-To            | [EMAIL PROTECTED] |
|        3 | s-To             | [EMAIL PROTECTED] |
|        2 | o                | [EMAIL PROTECTED] |
|        1 | -To              | [EMAIL PROTECTED] |
|        3 | rrors-To         | [EMAIL PROTECTED] |
|        1 | er               | [EMAIL PROTECTED] |
|        1 | r                | [EMAIL PROTECTED] |
|        1 | rors-To          | [EMAIL PROTECTED] |
+----------+------------------+---------------------------+


So that *is* rather painfull. So many rows with identical headervalues are bound
to slow things considerably. Table scans do take time, even on indexed tables.

I thought about it last night, and propose something like:

create table dbmail_headervalue (
        id char(64) not null primary key, -- sha1 digest of
        headervalue varchar(255) not null default '',
        key headervalue (headervalue)
);

create table dbmail_headername (
        id bigint not null auto_increment,
        headername varchar(100) not null default '',
        key headername (headername)
);

create table dbmail_headerlists (
        physmessage_id bigint not null,
        header_order int not null default 0,
        headername_id bigint not null references
                dbmail_headername (headername_id)
                on update cascade on delete cascade,
        headervalue_id char(64) not null references
                dbmail_headervalue (headervalue_id)
                on update cascade on delete cascade,
        key physmessage_id (physmessage_id),
        key physmessage_id (physmessage_id),
        key headername_id (headername_id),
        key headervalue_id (headervalue_id)
);
        

I think this will actually improve not only storage requirements for the
headervalue table, but also access times on it.



-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail

Reply via email to