Re: [Dbmail-dev] Cached Headers (was NNTP)

Ed K. Thu, 18 Mar 2004 15:56:10 +0100 (CET)

This is for digestion:

I like the seperate tables, and to show my point, here are some sql examples:


mysql> select version();
+---------------------+
| version()           |
+---------------------+
| 4.0.12-standard-log |
+---------------------+
1 row in set (0.00 sec)

mysql> select * from title;
+----+----------+
| id | title    |
+----+----------+
|  1 | From:    |
|  2 | To:      |
|  3 |          |
|  4 | Subject: |
+----+----------+
4 rows in set (0.00 sec)

mysql> select * from header;
+----+----------+--------------------------------------------+
| id | title_id | header                                     |
+----+----------+--------------------------------------------+
|  1 |        1 | [EMAIL PROTECTED]                                  |
|  2 |        2 | [EMAIL PROTECTED]                                  |
|  4 |        3 | unknown header1: this is an unknown header |
|  5 |        4 | test message                               |
|  6 |        3 | unknown header3: this is an unknown header |
+----+----------+--------------------------------------------+
5 rows in set (0.00 sec)

mysql> select IF (title.id=3,header.header,concat(title.title ," ", 
header.header)) as header from header join title on title.id=header.title_id 
order by
header.id;
+--------------------------------------------+
| header                                     |
+--------------------------------------------+
| From: [EMAIL PROTECTED]                            |
| To: [EMAIL PROTECTED]                              |
| unknown header1: this is an unknown header |
| Subject: test message                      |
| unknown header3: this is an unknown header |
+--------------------------------------------+
5 rows in set (0.00 sec)


order is garenteed, format is garenteed except for: spacing between header and 
subject, case of header, and any trailing spaces

who says joins are not fast? faster then searching strings. faster if the 
indexes are setup right.

ed

On Thu, 18 Mar 2004, Ilja Booij wrote:

> Finally I come in the discussion :)
>
> I'm all for optimizing headers, BUT:
> I'd like to optimize for IMAP. Optimizing for POP3 is no issue, since
> POP just downloads the whole message.
>
> I don't really like to optimize for native clients. DBMail's IMAP should
> be fast enough to work with normal IMAP clients (webbased or running on
> clients). One way of optimizing is to implement IMAP features extensions
> that'll help getting faster, like SORT and THREAD.
>
> Currently, I'm not busy with this at all, but if anybody wants to do
> some statistical research to find out what headers different clients
> FETCH using IMAP, we can see which headers can cache.
>
> By the way, I guess that having one table with cached headers like Aaron
> proposed would be the best. I think we should only cache headers that
> are used frequently and/or that are always requested *together* by
> clients (it's no use if for a certain client program, we get all but one
> headers from the cache, and get the last header from parsing the message.
>
> Ilja
>
> Magnus Sundberg wrote:
> > Well,
> > I beleive a flag on the messageblock would be a good start, since it is
> > not that intrusive.
> > But I am very unsusre about how much performance we would gain by
> > searching directly in heavily optimized fast-header tables. Is the pain
> > worth the gain?
> > Small installations do not need the extra performance.
> > Large installations can probably get away with more expensive hardware
> > Huge installations may run out of steam on the best availible hardware,
> > here the fast-headers might be of some use.
> >
> > My beleif is that we should go for the easiest implementation, that is
> > the least intrusive and get a feeling for what we need.
> > I know that MySQL supports queries with regular expressions. These
> > queries will probably be quite fast.
> > I think we should se what happens when we implement server side sorting,
> > based on clever SQL statements using the messageblocks.
> >
> > The huge question is: How much do we gain in performance by using
> > special header tables compared to searching the messageblocks? The gain
> > might be quite small.
> >
> > well, that my $0.02
> >
> > Magnus
> >
> > Aaron Stone wrote:
> >
> >> I'm thinking in terms of searches primarily, where a query might be:
> >>
> >> select distinct(message_id) from fastheaders
> >>     where header = 'to' and contents like '%bob%'
> >>
> >> You're looking for displaying the message, where the webmail application
> >> probably doesn't want to read in the header blocks of the message and
> >> parse
> >> them just to show the Date, From, Subject, To, CC, Bcc, Fcc, Reply-To
> >> fields.
> >>
> >> Originally I suggested using a table that has the header fields as the
> >> columns, thinking that those were the only commonly requested ones,
> >> but it was
> >> either Ilja or Roel at the time who watched for a couple of IMAP
> >> queries and
> >> saw all kinds of headers being requested both for searching and for
> >> listing
> >> messages. So the configurable columns thing means that the admin needs
> >> to be
> >> watching to see what fields his clients are requesting, and to add
> >> those, and
> >> it means quite a bit more complexity in DBMail, which would need to
> >> either
> >> find the header in the header cache table or parse the full headers.
> >>
> >> For your webmail application, since you know exactly which header
> >> fields you
> >> want and you're writing your own queries, you can just do this:
> >>
> >> select header, contents from fastheaders
> >>     where message_id = 82 and (header = 'To' or header = 'From' or ...)
> >>
> >> My instinct is that this would still be reasonably fast, completely
> >> flexible
> >> and generic. Even if the query took slightly longer, the much smaller
> >> result
> >> set size and the negligible parsing needed would offset the query speed.
> >>
> >> Aaron
> >>
> >>
> >> ""Mark Mackay - Orcon"" <[EMAIL PROTECTED]> said:
> >>
> >>
> >>> Surely for webmail applications, etc you're after a join where you
> >>> get the
> >>> headers as columns for displaying a message list; to avoid having nested
> >>> queries.
> >>>
> >>> E.g. avoid "select * from messages"; then for each message "select *
> >>> from
> >>> headers where messageid=X"
> >>>
> >>> I know the goal is to have the headers table really flexible so that
> >>> additional headers can be added, etc -- but maybe this could be done
> >>> differently. A run-time config file where you mapped the column
> >>> names/numbers to header names, etc.
> >>> That way if a developer wanted to add a new column and start caching
> >>> a new
> >>> header they could alter the table and simply update the config files,
> >>> one-by-one. Or disable the headercache altogether.
> >>> I could be missing something here -- is there a query which can turn a
> >>> one-to-many join into a single line?
> >>>
> >>> /Mark
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: [EMAIL PROTECTED]
> >>>> [mailto:[EMAIL PROTECTED] On Behalf Of Aaron Stone
> >>>> Sent: Thursday, 18 March 2004 6:30 p.m.
> >>>> To: DBMAIL Developers Mailinglist
> >>>> Subject: Re: [Dbmail-dev] NNTP
> >>>>
> >>>> Seems terribly complicated, and it would make the messages very
> >>>> difficult to
> >>>> dump by hand. I'm also not sure why you want to have two tables; is
> >>>> that what
> >>>> you're proposing:
> >>>>
> >>>> header_labels (index on label)
> >>>> id     label
> >>>> --------------
> >>>> 1      to
> >>>> 2      from
> >>>> 3      subject
> >>>>
> >>>> message_headers (index on messageid, labelid, header and labelid,
> >>>> header)
> >>>> -----------------------------------------------------
> >>>> id     message_id    label_id      header
> >>>> 1      1             1             [EMAIL PROTECTED]
> >>>> 2      1             2             [EMAIL PROTECTED]
> >>>> 3      1             3             Hey Bob, it's Joe!
> >>>>
> >>>>
> >>>> I think that unless your database has miraculously good JOINs, this
> >>>> is a
> >>>> nightmare; note that you cannot reassemble by returning rows in
> >>>> ascending
> >>>> order of the 'id' column, another column would be needed to keep the
> >>>> order of
> >>>> the headers. This is probably the most compact was to store the
> >>>> headers,
> >>>> though, but I think it is at the cost of being severely slow and
> >>>> obtuse.
> >>>>
> >>>> My proposed header looks like this:
> >>>>
> >>>> fastheaders  (index on messageid, header, contents and header,
> >>>> contents)
> >>>> id    message_id     header        contents
> >>>> 1     1              to            [EMAIL PROTECTED]
> >>>> 2     1              from          [EMAIL PROTECTED]
> >>>>
> >>>>
> >>>> Suffers from using a lot more space, but would have faster search
> >>>> times.
> >>>> Administrators would be able to more easily query the table by hand.
> >>>> As it is
> >>>> intended to only be a cache, one could freely zap this table and
> >>>> rebuild it
> >>>> from the original headers. Naturally, I'm strongly advocating my
> >>>> idea ;-)
> >>>>
> >>>> Aaron
> >>>>
> >>>>
> >>>>
> >>>> ""Ed K."" <[EMAIL PROTECTED]> said:
> >>>> [snip]
> >>>>
> >>>>> What if a solution was proposed in which the order and the
> >>>>
> >>>>
> >>>> formatting of
> >>>>
> >>>>> the headers could be preserved. A documented method to
> >>>>
> >>>>
> >>>> deconstruct and
> >>>>
> >>>>> construct the headers, and working code to be included in
> >>>>
> >>>>
> >>>> db.c. Then a flag
> >>>>
> >>>>> in the messageblks row that would indicate if the header is
> >>>>
> >>>>
> >>>> either only,
> >>>>
> >>>>> also, or not in the fastheaders table. i suggest we call
> >>>>
> >>>>
> >>>> the table pair
> >>>>
> >>>>> message_headers and header_labels.
> >>>>>
> >>>>> ed
> >>>>>
> >>>>> Security on the internet is impossible without strong, open,
> >>>>> and unhindered encryption.
> >>>>>
> >>>>> _______________________________________________
> >>>>> Dbmail-dev mailing list
> >>>>> Dbmail-dev@dbmail.org
> >>>>> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Dbmail-dev mailing list
> >>>> Dbmail-dev@dbmail.org
> >>>> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
> >>>>
> >>>
> >>> _______________________________________________
> >>> Dbmail-dev mailing list
> >>> Dbmail-dev@dbmail.org
> >>> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
> >>>
> >>
> >>
> >>
> >>
> >
> >
> >
> > _______________________________________________
> > Dbmail-dev mailing list
> > Dbmail-dev@dbmail.org
> > http://twister.fastxs.net/mailman/listinfo/dbmail-dev
> _______________________________________________
> Dbmail-dev mailing list
> Dbmail-dev@dbmail.org
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
>

Re: [Dbmail-dev] Cached Headers (was NNTP)

Reply via email to