Re: [Dbmail-dev] header storage schema changes. (W)Here we (may) go.

Matthew T. O'Connor Sat, 1 Jan 2005 20:30:43 +0100 (CET)

Paul J Stevens wrote:

Matthew T. O'Connor wrote:
Personally, I think we should use a very clean and generic design for headers / header searching.
First: Leave the current table structures intact; I don't think we require any modifications to them.
Second: Add 2 new tables: header_list and header_values
Breaking out the header names to a table of their own may be useful. But doing it just for storage's sake seems a bit overkill given the added complexity in constructing queries and maintaining data integrity. If it boosts performance for the target use-cases (search, sort, thread) I'm all for it, if.

Agreed, the only reason I propose the header_list table is for performance reasons. This allows the header_values table (which will be much bigger) to be searched based on an int comparison rather than text search. I think this is a serious performance boost, but if it's proven that it's not, then we don't need it.

header_list: ( Contains an exhaustive list of all headers from all messages in the database. )
   header_id   int     primary key
   header      text    not null
header_values: ( Contains the values from all the headers in all the messages in the database )
   header_value_id   serial   primary key
   message_id        int      (references unique message ID from the physmessage table)
   header_id         int      (references unique ID from the header_list table)
   header_value      text     (the actual value from this header in this message)
   header_order      int      (optional column, used to be able to recreate the header order from the original message)
Header_order will never happen. Recreating headers from the header tables will never happen. Complete headers are stored in the messageblk. Also, header-order from the original message is already *not* being maintained. Gmime does its own reformatting and reshuffling of the headers.

Ok, I wasn't sure, but that helps make things simpler if we don't need it.
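
A minimal sketch of the two tables without header_order, written in Python against SQLite purely so it can be run standalone. This is not DBMail code: the UNIQUE constraint on header, the lower-casing of header names, the two indexes and the helper store_header are all assumptions made for illustration, and the real MySQL/PostgreSQL DDL would differ in its types.

```python
import sqlite3

# SQLite is only a stand-in for the real MySQL/PostgreSQL backends.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header_list (
    header_id INTEGER PRIMARY KEY,
    header    TEXT NOT NULL UNIQUE      -- UNIQUE is an assumption
);
CREATE TABLE header_values (
    header_value_id INTEGER PRIMARY KEY,
    message_id      INTEGER NOT NULL,   -- would reference physmessage
    header_id       INTEGER NOT NULL REFERENCES header_list(header_id),
    header_value    TEXT
);
-- Assumed indexes, so lookups by header or by message stay int comparisons.
CREATE INDEX header_values_header_idx  ON header_values (header_id);
CREATE INDEX header_values_message_idx ON header_values (message_id);
""")


def store_header(conn, message_id, name, value):
    """Store one header value, adding the name to header_list on first sight.

    Hypothetical helper; lower-casing the name is an assumption.
    """
    name = name.lower()
    conn.execute("INSERT OR IGNORE INTO header_list (header) VALUES (?)", (name,))
    (header_id,) = conn.execute(
        "SELECT header_id FROM header_list WHERE header = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO header_values (message_id, header_id, header_value) "
        "VALUES (?, ?, ?)",
        (message_id, header_id, value),
    )
    return header_id


subject_id = store_header(conn, 1, "Subject", "schema changes")
store_header(conn, 2, "Subject", "Re: schema changes")
```

Both messages end up sharing a single header_list row, so the large table only ever stores the integer.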

This structure will make it very easy to query all the headers from a given message or find all the messages with a given header, or a given header value. It also leaves our current structure intact which will make it easier to phase in.
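
The three lookups mentioned here could be sketched roughly as follows. This again uses Python and SQLite only as a runnable stand-in; the function names, the sample rows and the example.org addresses are made up for illustration and are not part of any existing code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header_list (header_id INTEGER PRIMARY KEY, header TEXT NOT NULL);
CREATE TABLE header_values (
    header_value_id INTEGER PRIMARY KEY,
    message_id INTEGER, header_id INTEGER, header_value TEXT
);
-- Made-up sample data.
INSERT INTO header_list VALUES (1, 'from'), (2, 'subject'), (3, 'x-spam-flag');
INSERT INTO header_values (message_id, header_id, header_value) VALUES
    (10, 1, 'paul@example.org'),    (10, 2, 'header storage'),
    (11, 1, 'matthew@example.org'), (11, 2, 'Re: header storage'),
    (11, 3, 'NO');
""")

JOIN = ("FROM header_values v "
        "JOIN header_list l ON l.header_id = v.header_id ")


def headers_of_message(conn, message_id):
    """All (header, value) pairs stored for one message."""
    return conn.execute(
        "SELECT l.header, v.header_value " + JOIN +
        "WHERE v.message_id = ? ORDER BY v.header_value_id",
        (message_id,),
    ).fetchall()


def messages_with_header(conn, name):
    """IDs of all messages that carry the named header at all."""
    rows = conn.execute(
        "SELECT DISTINCT v.message_id " + JOIN +
        "WHERE l.header = ? ORDER BY v.message_id",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


def messages_with_value(conn, name, pattern):
    """IDs of all messages whose named header matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT DISTINCT v.message_id " + JOIN +
        "WHERE l.header = ? AND v.header_value LIKE ? ORDER BY v.message_id",
        (name, pattern),
    ).fetchall()
    return [row[0] for row in rows]
```

In each case the text comparison on the header name only touches the small header_list table; header_values is filtered by header_id.
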
Agreed. A single separate headers table, or two tables like you propose, will probably be the starting point. Once we have consistent storage of headers, it will be relatively easy to move certain headers to tables of their own, or merge them into the physmessage table. Of course, postgres users could probably even use triggers for stuff like that.

Glad you agree this is a good starting point. I hope that we don't ever need to special case any headers, but only time will tell. Perhaps the special casing is that we hard code certain headers in the header_list table so that we always know that sendername has a header_id of 1, and therefore the code can automatically search the headers table without having to do that lookup.
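
A rough sketch of what hard coding could look like, in Python with SQLite as a stand-in. Only "sendername has a header_id of 1" comes from the paragraph above; the constant WELL_KNOWN_HEADERS, the subject entry, the function name and the sample rows are assumptions for illustration.

```python
import sqlite3

# Assumed reserved IDs. Only sendername = 1 is from the message above;
# the subject entry is a made-up second example.
WELL_KNOWN_HEADERS = {"sendername": 1, "subject": 2}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header_list (
    header_id INTEGER PRIMARY KEY,
    header    TEXT NOT NULL UNIQUE
);
CREATE TABLE header_values (
    header_value_id INTEGER PRIMARY KEY,
    message_id      INTEGER NOT NULL,
    header_id       INTEGER NOT NULL REFERENCES header_list(header_id),
    header_value    TEXT
);
""")

# Seed the reserved rows once, when the schema is created.
conn.executemany(
    "INSERT INTO header_list (header_id, header) VALUES (?, ?)",
    [(header_id, name) for name, header_id in WELL_KNOWN_HEADERS.items()],
)
# Made-up sample data.
conn.executemany(
    "INSERT INTO header_values (message_id, header_id, header_value) "
    "VALUES (?, ?, ?)",
    [(10, 1, "Paul"), (10, 2, "hello"), (11, 1, "Matthew"), (12, 1, "Paul")],
)


def messages_by_sender(conn, sendername):
    """Filter on the known integer ID, with no header_list lookup or join."""
    rows = conn.execute(
        "SELECT message_id FROM header_values "
        "WHERE header_id = ? AND header_value = ? ORDER BY message_id",
        (WELL_KNOWN_HEADERS["sendername"], sendername),
    ).fetchall()
    return [row[0] for row in rows]
```

The trade-off is that the reserved IDs become part of the schema contract: every installation has to seed header_list identically.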

Either way, if we can get away without special casing any headers it should make the code related to header searching very easy.

What do you think? I don't think we need to special case any headers, not even sendername or subject.
Well, Yukatan's datamodel looks like a very serious attempt at optimizing datastorage for email. My working assumption is that there are some very valid reasons for doing it the way they're doing things. Also, as a long term goal, a unified model for sql based email storage is something I think about.

I just took a look at the Yukatan datamodel. It looks nice, and it makes a lot more sense than ours, but they are only supporting PostgreSQL so they don't have to deal with all the MySQL limitations. They use foreign keys to manage deletes for them and maintain integrity. They create custom domains for most of their datatypes. They also keep the entire message in one field in the messages table, which we don't do because of the size limitations of MySQL (at least older MySQL...). The other major thing they have on us is that they manage each MIME entity inside an email separately. This has nice advantages for searching headers of attachments etc. That would be a large change from our datamodel, but might be nice to think about someday.
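
To illustrate what "foreign keys manage deletes" would mean for a header_values table, here is a small sketch in Python with SQLite standing in for PostgreSQL. It is not taken from the Yukatan schema: the one-column physmessage table is a stripped-down stand-in, and the helper names and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE physmessage (id INTEGER PRIMARY KEY);  -- stand-in, real table has more
CREATE TABLE header_values (
    header_value_id INTEGER PRIMARY KEY,
    message_id      INTEGER NOT NULL
                    REFERENCES physmessage(id) ON DELETE CASCADE,
    header_id       INTEGER NOT NULL,
    header_value    TEXT
);
-- Made-up sample data.
INSERT INTO physmessage (id) VALUES (1), (2);
INSERT INTO header_values (message_id, header_id, header_value) VALUES
    (1, 1, 'first'), (1, 2, 'second'), (2, 1, 'third');
""")


def delete_message(conn, message_id):
    """Delete only the parent row; the database removes the header rows."""
    conn.execute("DELETE FROM physmessage WHERE id = ?", (message_id,))
    conn.commit()


def header_count(conn, message_id):
    """Number of header_values rows still stored for a message."""
    return conn.execute(
        "SELECT COUNT(*) FROM header_values WHERE message_id = ?",
        (message_id,),
    ).fetchone()[0]


delete_message(conn, 1)
```

On a backend that does not enforce foreign keys, the same cleanup would have to be done by the application with an explicit second DELETE.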

Their model makes a lot of sense in many ways, but I still don't like some of it. They special case a handful of headers for each MIME entity not only by having a separate copy in the entity table but also by having a separate table for many of these headers. Perhaps someday we can add this for performance reasons, but I don't think we need to. Yukatan also has a headers table much like the one I described above. They don't have the header_list table broken out the way I do, but I think that is why they need to special case a lot of the headers.

So in summary, yes the Yukatan model is nice, and has a lot of advantages over ours, but I still think we are best served by starting with the two-table design I described earlier. This should be sufficiently fast and flexible that we can go a long way before we have to special case anything.


Matthew
