On Dienstag, 19. Juni 2007 Paul J Stevens wrote:
> Currently the number of rows in headervalues is growing an order of
> magnitide faster than the number of rows in the physmessage table.
> Just a brief check of my own data:

I did a check also with this query:
select count(*),n.headername, headervalue from dbmail_headervalue v 
 join dbmail_headername n on n.id=v.headername_id 
 group by headername_id,n.headername,v.headervalue 
 order by count(*) desc;

This query took 802 seconds(!):
Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-1] 2007-06-19 13:38:23 CEST 
DB=testdbmail HOST=[local] SESSTRT=2007-06-19 12:57:22 CEST LOG:  Dauer: 
802768.318 ms  Anweisung: select
Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-2]  count(*),n.headername, 
headervalue from dbmail_headervalue v join dbmail_headername n on 
n.id=v.headername_id group by
Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-3]  
headername_id,n.headername,v.headervalue order by count(*) desc;

(I've cut out some lines):

88599 | X-Sieve                                                         | CMU 
Sieve 2.2
 70106 | X-Virus-Scanned                                                 | 
amavisd-new at goelsen.net
 68771 | X-Virus-Scanned                                                 | 
amavisd-new at zmi.at
 62133 | X-Virus-Scanned                                                 | by 
amavisd-new at datamatix.at
 49643 | Precedence                                                      | bulk
 24697 | X-Virus-Checked                                                 | 
Checked by ClamAV on apache.org
 24537 | X-Spam-Check-By                                                 | 
apache.org
 23075 | X-MSMail-Priority                                               | 
Normal
 19532 | X-Priority                                                      | 3 
(Normal)
 19261 | Mailing-List                                                    | 
contact [EMAIL PROTECTED]; run by ezmlm
 19259 | List-Post                                                       | 
<mailto:[EMAIL PROTECTED]>
 19258 | Delivered-To                                                    | 
mailing list [EMAIL PROTECTED]
 19258 | list-help                                                       | 
<mailto:[EMAIL PROTECTED]>
 19258 | List-Id                                                         | 
<users.spamassassin.apache.org>
 19258 | list-unsubscribe                                                | 
<mailto:[EMAIL PROTECTED]>
 19110 | Delivered-To                                                    | 
[EMAIL PROTECTED]
 19110 | Sender                                                          | 
[EMAIL PROTECTED]
  4874 | List-Archive                                                    | 
<http://sourceforge.net/mailarchive/forum.php?forum=egroupware-german>
  4873 | List-Unsubscribe                                                | 
<https://lists.sourceforge.net/lists/listinfo/egroupware-german>,    
<mailto:[EMAIL PROTECTED]
  4873 | List-Subscribe                                                  | 
<https://lists.sourceforge.net/lists/listinfo/egroupware-german>,    
<mailto:[EMAIL PROTECTED]
  2473 | List-Subscribe                                                  | 
<http://www.links2linux.de/cgi-bin/mailman/listinfo/packman>,        
<mailto:[EMAIL PROTECTED]
  2466 | List-Unsubscribe                                                | 
<http://www.links2linux.de/cgi-bin/mailman/listinfo/packman>,        
<mailto:[EMAIL PROTECTED]
 2456 | List-Unsubscribe                                                | 
<http://links2linux.de/cgi-bin/mailman/listinfo/packman-adm>,        
<mailto:[EMAIL PROTECTED]
  2456 | List-Subscribe                                                  | 
<http://links2linux.de/cgi-bin/mailman/listinfo/packman-adm>,        
<mailto:[EMAIL PROTECTED]

You can see that especially for mailing lists,
compressing headers can be *very* space saving (those
lines are not even finished on the right side..).
Plus, I've got 160.000 headervalues which are empty strings.

select count(*) from dbmail_messages;
 121212
select count(*) from dbmail_physmessage;
 156133
select count(*) from dbmail_headername;
  2386
select count(*) from dbmail_headervalue;
 4551894

So 4,5 million values for 121k messages with only 2400 
header names. Amazeing.
BTW, Is it OK to have more physmessages than messages?

> create table dbmail_headervalue (
>       id char(64) not null primary key, -- sha1 digest of
>       headervalue varchar(255) not null default '',

Can't the headervalue be much longer than 255? Currently we have
headervalue text DEFAULT ''::text NOT NULL

Received: from mailsrv.zmi.at (mailsrv1.zmi.at [212.69.162.198])
        (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
        (Client CN "mailsrv.zmi.at", Issuer "power4u.zmi.at" (not verified))
        by power2u.goelsen.net (Postfix) with ESMTP id 0E7F677A43
        for <[EMAIL PROTECTED]>; Tue, 19 Jun 2007 09:49:21 +0200 (CEST)

I did a quick look, they are stored in one entry, so probably this field should 
be
varchar(30000) or something. Anybody got the correct RFC for reference?

And a
select length(headervalue) from dbmail_headervalue order by length(headervalue) 
desc;
shows there are a lot of longer entries:

 length
--------
  14890
   9660
   7985
   7985
   7768
   7768
   7625
   7596
   7596
   6791
   6595
[snip]

mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: EA39 8918 EDFF 0A68 ACFB  11B7 BA2D 060F 1C6F E6B0
// Keyserver: www.keyserver.net                   Key-ID: 1C6FE6B0

Attachment: signature.asc
Description: This is a digitally signed message part.

_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail

Reply via email to