> > (1) Only select the header messageblk when only the header or a part

> > of it is to be fetched on IMAP.
> 
> That's what the is_header field is going to be used for.
> 

Q: why is the headers split from the message? If it's an attempt to
minimize disk io when only fetching imap headers, then that will only do
that ever so slightly, since it will still have to do lots of seeks, and
for postgresql it will choose not to use an index if the planner
realizes that a sequence scan might be faster. Again, for PostgreSQL you
may be able to improve on this by clustering the table on an index over
is_header...

Q: Why not add to the messageblks table fields containing parsed
headers:

CREATE TABLE dbmail_messageblks (
        messageblk_idnr bigint not null default
nextval('dbmail_messageblk_idnr_seq'::text) PRIMARY KEY,
        physmessage_id bigint REFERENCES dbmail_physmessage(id) ON
DELETE CASCADE,
        from varchar,
        to varchar,
        cc varchar,
        subject varchar,
        received timestamptz,
        messageblk text not null,
        blocksize bigint not null default (0)::bigint
);

Or doesn't imap / pop3 work that way,.... Do they need to receive the
actual headers, or is a list enough? In any case, it would simplify
creating custom (webmail) interfaces...

> > (2) Create an index for the 'messageblk' field in
dbmail_messageblks. 
> > At least MySQL allows this (can anyone tell me if Postgres does?). 
> > Then, on IMAP header field searches, do not load/parse/check;
instead, 
> > create a regular expression  and do a SELECT with it, selecting only

> > header blocks. (MySQL specific again, Postgres comments welcome).
> 
> well, the messageblks can become largish, and mysql also has 
> some limit here. Too bad innodb doesn't support full-text searches.
> 
> I wonder what kind of index we get when we add one to the 
> messageblk field... I should try that some time. But not on 
> my main development machine :-)

Any index on a (large) text field will result in severe filesystem
bloat, mostly because they'll be more or less useless. You should
instead look at using something like tsearch2, or cook up your own
method so it's compatible with mysql.

That could be something like creating a table filled with messageid's
and single words from each message, with an index on messageid,word, and
doing a select * from messageblks m inner join word_table w on
m.messageblk_idnr = w.messageblk_idnr where word IN
(quoted_comma_separated_list_of_words_to_search_for);

Or even better, two tables, one with messageid,wordid, and one with
wordid,word to avoid duplicate words.

Kind regards,

john

Reply via email to