Hello, now that I'm subscribed to the -dev list also I want to greet everybody, and resend my message to Niki and Paul:
On Dienstag 03 Februar 2009 Niki Guldbrand wrote: > /me waves his hand Hi Niki! Sorry I'm not on -dev, so didn't know you. > git clone git://git.dbmail.eu/paul/dbmail I did that now, and have some questions: dbmail_23=# \d dbmail_headervalue Tabelle »public.dbmail_headervalue« Spalte | Typ | Attribute -------------+----------------+----------------------------------------------------------------- id | bigint | not null default nextval('dbmail_headervalue_id_seq'::regclass) hash | character(256) | not null headervalue | text | not null default ''::text Indexe: »dbmail_headervalue_pkey« PRIMARY KEY, btree (id) »dbmail_headervalue_1« btree ("substring"(headervalue, 0, 255)) »dbmail_headervalue_2« btree (hash) Is "hash" really a hash of the value? I guess that's for performance only, so I immediately investigated and think I have another proposal, that should be faster and save disk space. But, on the downside, it's not completely normalized anymore. Still, I think it could be worth it, depending on how often it is used in the code: First, I tested my "distribution" of headervalue lengths: OVERALL: select count(1) from dbmail_headervalue; 2832434 select count(1) from dbmail_headervalue where length(headervalue)>200 AND length(headervalue)<256; (skipping other SELECTs later) LEN 200..255: 11831 LEN 100..200: 74519 LEN 51..100: 438533 LEN 20..50: 1929089 LEN 0..20: 319328 LEN >255: 20202 So most headervalues are <100 characters long I would say, and only 0,71% (!) of all headervalues are >255 chars. So the table could be changed to: id | bigint | not null default nextval('dbmail_headervalue_id_seq'::regclass) headervalue | character varying (255) | default null headervalue_full | text | default null Indexe: »dbmail_headervalue_pkey« PRIMARY KEY, btree (id) »dbmail_headervalue_1« btree (headervalue) And then fill the real headervalue into "headervalue" if it's <=255 chars, and only to headervalue_full if it's >255 chars long. What changes? *) no more hash calculation (CPU save) for each headervalue_entry on insert *) save 256 hash bytes per headervalue entry (for me with 2.8 mio. entries thats about 700MB on disk). *) saves one complete index (over "hash"), roughly 700MB again, maybe more *) current SELECT change: a) no more ...where substring(headervalue,0,255) ... needed b) SELECT ... WHERE headervalue like '%search%' becomes SELECT ... WHERE headervalue like '%search%' or headervalue_full like '%search%' *) current UPDATE change: if (strlen(hval)) > 255 { UPDATE ... SET headervalue_long=hval,headervalue=NULL } else { UPDATE ... SET headervalue=hval,headervalue_long=NULL } Like that, always only one of headervalue or headervalue_long is set, saving disk space and using the index whenever possible, and for those 0,7% of all entries a full scan is needed (but that's needed now anyway if you search for %search%, as you can't use an index then). Sorry that I have this idea only now, but it could really save lots of disk space and CPU time, and improve speed even further. I don't know the amount of work that would mean for you, but I can imagine it's relatively straightforward. mfg zmi -- // Michael Monnerie, Ing.BSc ----- http://it-management.at // Tel: 0660 / 415 65 31 .network.your.ideas. // PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import" // Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4 // Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ Dbmail-dev mailing list Dbmail-dev@dbmail.org http://twister.fastxs.net/mailman/listinfo/dbmail-dev