On Thursday 26 February 2009 Paul J Stevens wrote:
> Michael Monnerie wrote:
> > On Tuesday 24 February 2009 Michael Monnerie wrote:
> >> As we can drop dbmail_headervalue_3 index anyway, drop that 255
> >> char field also, and store only the full headervalue. Use that
> >> nice compressing technique Niki implemented already, but without
> >> hash. That might be more overhead than searching the whole table.
> >> If it needs be used, use a hash as short as possible to save
> >> storage. A cheap md5 hash should be enough, maybe less is
> >> possible.
>
> I propose we drop the index, but keep the hash as a varchar.

If the hash covers the full-length line, it could be useful for finding
duplicate values at INSERT time. But only for that, or is there any
other use?

The question is: is it worth the effort? If the hash is reasonably
short, I guess yes. It should be limited to 16 bytes (to save disk
space) and allow duplicates, because a hash collision doesn't matter
here when you compare the full text afterwards. You can easily

  SELECT ... WHERE hashfield='computed_hash' AND headervalue='new_line'

and the db can use the index over hashfield to find only the 1-2 rows
whose hash matches, and finally compare contents using the full line.

BTW: Do you allow hash collisions in the single instance store of the
messageparts? I guess yes.

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev
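
The duplicate check described in the message above (look up by a short, non-unique hash, then confirm on the full header value) could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python with an in-memory SQLite database rather than DBMail's actual code or schema, the table name `headervalue`, the column names and the helper functions are hypothetical, and truncating the MD5 hex digest to 16 characters is just one possible reading of "limited to 16 bytes".

```python
import hashlib
import sqlite3


def short_hash(value: str) -> str:
    # Assumption: keep only the first 16 hex characters of the MD5 digest.
    # Collisions are acceptable because the full text is compared as well.
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:16]


def store_headervalue(conn: sqlite3.Connection, value: str) -> int:
    """Return the id of an existing identical row, or insert a new one."""
    h = short_hash(value)
    # The index on `hash` narrows the search to a handful of rows;
    # the comparison on `headervalue` rules out hash collisions.
    row = conn.execute(
        "SELECT id FROM headervalue WHERE hash = ? AND headervalue = ?",
        (h, value),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO headervalue (hash, headervalue) VALUES (?, ?)",
        (h, value),
    )
    return cur.lastrowid


# Hypothetical schema: the hash index is deliberately NOT unique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE headervalue ("
    " id INTEGER PRIMARY KEY,"
    " hash TEXT NOT NULL,"
    " headervalue TEXT NOT NULL)"
)
conn.execute("CREATE INDEX headervalue_hash_idx ON headervalue (hash)")
```

Calling `store_headervalue(conn, line)` twice with the same line should hand back the same id, while two different lines that happened to share a hash would still end up in separate rows because of the full-text comparison.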
