Ilja Booij wrote:
Aaron Stone wrote:
Ilja Booij <[EMAIL PROTECTED]> said:
That would imply parsing every message on delivery..would that make
sense? A lot of messages don't have to be parsed, because they're
only retrieved using POP. Sounds like a waste of resources to me,
unless we have other reasons to parse the message.
Isn't that the plan for the header and mime cache table? It would
certainly make sense to be able to disable parse-on-delivery if you only
use POP, but most people probably use all IMAP or a mixture of the two.
Doesn't it make sense to parse the message only the first time it's
fetched? Messages read using IMAP will only have to be parsed once.
Messages read using POP will not have to be parsed.
No, that doesn't make any sense at all. Any design that can't handle multiple
messages in single queries (search, retrieve, sort, etc.) is flawed. Parsing is
relatively cheap and fast (assuming we use a decent parser). Talking to the
backend is relatively expensive in terms of IO latency and backend resources.
Also, we want to keep dbmail small and simple, not add new levels of complexity
that don't offer much added value.
The email storage should be consistent.
The email storage should be highly optimized.
Neither of these qualifications applies to this to-parse-or-not-to-parse thread.
If we go for parsed storage, all of the storage should be converted.
Most messages that are retrieved are retrieved relatively few times. That is: I
want to read my messages once, but I want to read my messages fast, not wait for
the backend to store its parsed equivalent. Especially if all I do is delete the
mail, or read it maybe once or twice sometime later.
It would make the fetching a bit more cumbersome..
Fetching is already too cumbersome as it is, no?
But it could work
like this:
Message already parsed:
1. Fetch parsed message
2. parsed message returned to client
Message not yet parsed:
1. Fetch parsed message
2. no message returned
3. fetch raw message
4. parse message
5. store parsed message
6. return parsed message
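
The two cases above can be sketched roughly as follows. This is only an
illustration: the dicts stand in for the raw and parsed message tables,
the names raw_store, parsed_store and fetch_message are made up and are
not dbmail functions, and Python's standard email parser stands in for
whatever parser would actually be used.

```python
from email import message_from_string

# Hypothetical stand-ins for the backend tables (not dbmail's schema).
raw_store = {1: "Subject: hello\n\nbody text\n"}
parsed_store = {}


def parse(raw):
    """Parse a raw message into a small dict of cached fields."""
    msg = message_from_string(raw)
    return {"subject": msg["Subject"], "body": msg.get_payload()}


def fetch_message(msg_id):
    """Return the parsed message, parsing and storing it on first fetch."""
    parsed = parsed_store.get(msg_id)   # 1. fetch parsed message
    if parsed is not None:
        return parsed                   # already parsed: return it
    raw = raw_store[msg_id]             # 3. fetch raw message
    parsed = parse(raw)                 # 4. parse message
    parsed_store[msg_id] = parsed       # 5. store parsed message
    return parsed                       # 6. return parsed message
```

The first call for a given id takes the long path; later calls for the
same id only touch parsed_store.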
So instead of improving dbmail performance by using cached information you'd be
slowing dbmail down significantly without any advantages. Just think of a
scenario where someone does a fetch 1:*. Instead of a single query you'd get
two queries for each message in the folder, not to mention the cascade in
network traffic this would trigger.
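
To put rough numbers on that, here is a minimal sketch. It assumes every
backend access costs one round trip and that nothing in the folder has
been parsed yet. CountingBackend and both fetch functions are invented
for illustration only, and the single-query case assumes the parsed data
was already written at delivery time.

```python
class CountingBackend:
    """Toy backend that counts round trips (hypothetical, not dbmail code)."""

    def __init__(self, raw_messages):
        self.raw = dict(raw_messages)
        self.parsed = {}
        self.reads = 0
        self.writes = 0

    def get_parsed(self, msg_id):
        self.reads += 1
        return self.parsed.get(msg_id)

    def get_raw(self, msg_id):
        self.reads += 1
        return self.raw[msg_id]

    def put_parsed(self, msg_id, parsed):
        self.writes += 1
        self.parsed[msg_id] = parsed

    def get_all_parsed(self):
        self.reads += 1  # one query for the whole folder
        return dict(self.parsed)


def parse(raw):
    return raw.upper()  # placeholder for real parsing


def fetch_all_lazy(backend):
    """Parse-on-first-fetch: look up, miss, fetch raw, parse, store."""
    out = {}
    for msg_id in sorted(backend.raw):
        parsed = backend.get_parsed(msg_id)
        if parsed is None:
            parsed = parse(backend.get_raw(msg_id))
            backend.put_parsed(msg_id, parsed)
        out[msg_id] = parsed
    return out


def fetch_all_cached(backend):
    """Parsed storage filled at delivery: one query returns everything."""
    return backend.get_all_parsed()


folder = {i: f"message {i}" for i in range(1, 101)}

lazy = CountingBackend(folder)
fetch_all_lazy(lazy)

cached = CountingBackend(folder)
cached.parsed = {i: parse(raw) for i, raw in folder.items()}  # done at delivery
fetch_all_cached(cached)
```

For this 100-message folder the lazy path ends up with lazy.reads at 200
and lazy.writes at 100 on the first fetch, against a single read for the
cached path.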
This idea only makes sense as part of a migration tool when we move to
mime-caching in the database. As part of the retrieval chain it sucks, if you'll
pardon my French.
Finally, performance of the delivery chain is much less visible to users, and
therefore less critical. If users have to wait a second longer for a new message
to be inserted, they won't complain. But make them wait a second longer for each
message they want to retrieve, and they will start calling the support desk.
Meaning *you*.
So please, please, if we move to parsed storage, just *go* for it. No
compromises.
--
________________________________________________________________
Paul Stevens [EMAIL PROTECTED]
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands_______________________________________www.nfg.nl