Wow, ok. You got me. And that's a lot of broken headers in there, too! I wonder if we should write a tool to find those and try to analyze what went wrong with the message in question. Might be that we come up with entries that were made with broken versions of DBMail or GMime during 2006, in which case we should selectively re-cache those entries.
Aaron On Tue, 2007-06-19 at 09:20 +0200, Paul J Stevens wrote: > Aaron Stone wrote: > > I'm not convinced that there's much point to having another bridge table > > for the headers. We're not talking about very much data, and I'd be > > concerned that we would be killing off some useful indexes for fast header > > retrievals. > > Wrong on both counts: we are talking about a lot of data that can be accesses > a > lot faster if we can normalize the table. > > Currently the number of rows in headervalues is growing an order of magnitide > faster than the number of rows in the physmessage table. > > Just a brief check of my own data: > > mysql> select count(*),n.headername, headervalue from dbmail_headervalue v > join > dbmail_headername n on n.id=v.headername_id where v.headervalue = '"Aaron > Stone" > <[EMAIL PROTECTED]>' group by headername_id; > +----------+-------------+--------------------------------------+ > | count(*) | headername | headervalue | > +----------+-------------+--------------------------------------+ > | 4 | To | "Aaron Stone" <[EMAIL PROTECTED]> | > | 125 | From | "Aaron Stone" <[EMAIL PROTECTED]> | > | 1 | Resent-From | "Aaron Stone" <[EMAIL PROTECTED]> | > +----------+-------------+--------------------------------------+ > 3 rows in set (0.03 sec) > > Ok, so that's relevant, but not really dramatic. But there are worse > situations: > > mysql> select count(*),n.headername, headervalue from dbmail_headervalue v > join > dbmail_headername n on n.id=v.headername_id where > v.headervalue='[EMAIL PROTECTED]' group by headername_id; > +----------+------------------+---------------------------+ > | count(*) | headername | headervalue | > +----------+------------------+---------------------------+ > | 2090 | Return-Path | [EMAIL PROTECTED] | > | 1 | To | [EMAIL PROTECTED] | > | 1 | From | [EMAIL PROTECTED] | > | 13134 | Sender | [EMAIL PROTECTED] | > | 13118 | Errors-To | [EMAIL PROTECTED] | > | 5 | X-Invalid-Header | [EMAIL PROTECTED] | > | 4 | | [EMAIL PROTECTED] | > | 1 | nder | [EMAIL PROTECTED] | > | 3 | rs-To | [EMAIL PROTECTED] | > | 3 | s-To | [EMAIL PROTECTED] | > | 2 | o | [EMAIL PROTECTED] | > | 1 | -To | [EMAIL PROTECTED] | > | 3 | rrors-To | [EMAIL PROTECTED] | > | 1 | er | [EMAIL PROTECTED] | > | 1 | r | [EMAIL PROTECTED] | > | 1 | rors-To | [EMAIL PROTECTED] | > +----------+------------------+---------------------------+ > > > So that *is* rather painfull. So many rows with identical headervalues are > bound > to slow things considerably. Table scans do take time, even on indexed tables. > > I thought about it last night, and propose something like: > > create table dbmail_headervalue ( > id char(64) not null primary key, -- sha1 digest of > headervalue varchar(255) not null default '', > key headervalue (headervalue) > ); > > create table dbmail_headername ( > id bigint not null auto_increment, > headername varchar(100) not null default '', > key headername (headername) > ); > > create table dbmail_headerlists ( > physmessage_id bigint not null, > header_order int not null default 0, > headername_id bigint not null references > dbmail_headername (headername_id) > on update cascade on delete cascade, > headervalue_id char(64) not null references > dbmail_headervalue (headervalue_id) > on update cascade on delete cascade, > key physmessage_id (physmessage_id), > key physmessage_id (physmessage_id), > key headername_id (headername_id), > key headervalue_id (headervalue_id) > ); > > > I think this will actually improve not only storage requirements for the > headervalue table, but also access times on it. > > > > -- > ________________________________________________________________ > Paul Stevens paul at nfg.nl > NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31 > The Netherlands________________________________http://www.nfg.nl > _______________________________________________ > DBmail mailing list > [email protected] > https://mailman.fastxs.nl/mailman/listinfo/dbmail _______________________________________________ DBmail mailing list [email protected] https://mailman.fastxs.nl/mailman/listinfo/dbmail
