Matthew T. O'Connor wrote:
Personally, I think we should use a very clean and generic design for headers / header searching.

First: Leave the current table structures intact, I don't think we require any modifications to them.

Second:  Add 2 new tables: header_list and header_values

Breaking out the header names to a table of their own may be useful. But doing it just for storage's sake seems a bit overkill given the added complexity in constructing queries and maintaining data integrity. If it's boosts performance for the target use-cases (search,sort,thread) I all for it, if.



header_list: ( Contains an exhaustive list of all headers from all messages in the database. )
   header_id   int     primary key
   header        text   not null

header_values: ( Contains the values from all the headers in all the messages in database )
   header_value_id   serial primary key
message_id int (references unique message ID from the physmessage table)
   header_id      int    (references unique ID from the header_list table)
   header_value   text  (the actual value from this header in this message)
hearder_order int ( optional column, used to be able to recreate the header order from the original message)


Header_order will never happen. Recreating headers from the header tables will never happen. Complete headers are stored in the messageblk. Also, header-order from the original message is already *not* being maintained. Gmime does it's own reformatting and reshuffling of the headers.


This structure will make it very easy to query all the headers from a given message or find all the messages with a given header, or a given header value. It also leaves our current structure intact which will make it easier to phase in.

Agreed. Starting with a single separate headers table, or with two tables like you propose will probably be the starting point. Once we have consistent storage of headers, it will be relatively easy to move certain headers to tables of their own, or merge them into the physmessage table. Of course, postgres users could probably even use triggers for stuff like that.


What do you think? I don't think we need to special case any headers not even sendername or subject.

Well, yukatan's datamodel looks like a very serious attempt at optimizing datastorage for email. My working assumption is that there are some very valid reasons for doing it the way they're doing things. Also, as a long term goal, a unified model for sql based email storage is something I think about.



--
  ________________________________________________________________
  Paul Stevens                                  mailto:[EMAIL PROTECTED]
  NET FACILITIES GROUP                     PGP: finger [EMAIL PROTECTED]
  The Netherlands________________________________http://www.nfg.nl

Reply via email to