Here is the analysis of the query:
---------
explain analyze SELECT distinct ip FROM rawdata WHERE sid='2';
<puts_on_database_consulting_hat/>
Ah, the hat is XHTML 1.0 Strict, I see ;-)
Err... XML strict, but close enough. :)
This never came up in our discussions of whether to use one or two
tables for fastheaders (one table in key/value pairs, or two tables in
an id/key + keyid/value relationship)... but it seems to me that we'd
have a problem if 10% of the keys were From, another 10% To, then
Subject, Date, Received... and then the last 10% a bunch of random
weird headers, like from spam checkers or whatever else. (Obviously
these are example figures; there are many more common headers.)
Trying to search for all keys in a given mailbox with such-and-such
criteria might also trigger a sequential scan of death, no?
Absolutely!!! But there are a few tricks worth implementing and a few
things to keep in mind:
1) PostgreSQL's planner isn't stupid and knows how to work with large
data sets very well.
2) PostgreSQL supports partial indexes (ie: CREATE INDEX tbl_col_idx ON
tbl (col) WHERE bar = 123;) and its planner makes active use of this
information.
3) The full set of distinct email header names would likely fit on a
single disk page, which makes headers an ideal candidate for being
stuffed into their own table and referenced by ID. Pg would then do an
index scan on the header table, grab the id, and use the partial index
support to selectively grab certain headers insanely fast (see the
sketch after this list).
4) There's a table partitioning trick using inheritance that can be
used too, but I'll save that rabbit for later.
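To make (2) and (3) concrete, here's a rough sketch. The table and
column names are mine, not dbmail's, and it assumes header_id 1 got
assigned to the From header:

    -- The set of distinct header names is tiny and lives in its own table.
    CREATE TABLE header_names (
        header_id   SERIAL PRIMARY KEY,
        header_name TEXT NOT NULL UNIQUE
    );

    -- One row per header per message, referencing the name by id.
    CREATE TABLE header_values (
        message_id   INTEGER NOT NULL,
        header_id    INTEGER NOT NULL REFERENCES header_names,
        header_value TEXT NOT NULL
    );

    -- Partial index covering only From headers (header_id 1 here), so
    -- lookups on them never touch the rest of the key/value heap.
    CREATE INDEX header_values_from_idx ON header_values (message_id)
        WHERE header_id = 1;

A query like

    SELECT header_value FROM header_values
     WHERE message_id = 42 AND header_id = 1;

then hits the small partial index instead of sequentially scanning
every key/value pair in the mailbox.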
The rest of this is a bit insinuatory, but it underscores a personal
frustration with many open source applications: it drives me bonkers
(ok, I'm being polite... way worse than bonkers) that people willingly
fall into the MySQL trap.
Basically, there is no excuse for any part of dbmail to be slow,
regardless of how small or big the data set in dbmail is... the only
problem is that dbmail chooses to retain compatibility w/ some other
database (*cough*MySQL*cough*) and as a result essentially lobotomizes
any efficient or performance-friendly data management scheme. :( The
case above is a good example of how something could be made fast, but
hasn't been, because the developers are coming from a strict SQL world
that doesn't think in terms of the database doing work for the
developer (ie, stored procedures/triggers, etc.). Databases are fast
if you let them do what they're best at... and can be a dog to work
with if a developer's ability to use them effectively is stripped away
in the name of cross-database compatibility.
I'd also like to state that database compatibility can be achieved,
but it has to be done at a higher level than it is currently. Right
now dbmail is compensating for a certain RDBMS's lack of
functionality, and performance greatly suffers as a result. If the
abstraction were pushed higher, such that the data manipulation work
dbmail does now would only be done in the MySQL case, dbmail could be
made much more efficient. For example: sending data to a client
(dbmail) so the client can rifle through the data in order to make a
decision, then going back to the database to get the final answer to
send to the mail client... *puke* Fundamentally non-optimal, but a
necessary approach in a MySQL world.
Abstracting to the point that the logic would allow the use of
prepared statements, stored procedures, triggers, or even data caching
would be a good thing and would prevent *many* round trips from dbmail
to the database. In an ideal world for PostgreSQL users, dbmail would
simply translate IMAP or POP3 into SQL function calls and the database
would handle everything else (ie, SELECT get_msg(...), SELECT
insert_msg(${username}, ${message}), etc.).
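As a minimal sketch of what that could look like (the messages table,
its columns, and the sequence name are all made up for illustration,
not dbmail's actual schema):

    CREATE FUNCTION insert_msg(username TEXT, message TEXT)
    RETURNS INTEGER AS $$
    BEGIN
        -- Parsing, quota checks, header filing... all of it happens
        -- here, server-side, in a single round trip.
        INSERT INTO messages (owner, body) VALUES (username, message);
        RETURN currval('messages_id_seq');
    END;
    $$ LANGUAGE plpgsql;

dbmail's whole delivery path then collapses to:

    SELECT insert_msg('some_user', '...raw message...');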
Remember, the DBMS in RDBMS stands for database management system. Let
the database manage the data. Right now dbmail is managing the data
because MySQL can't. -sc
--
Sean Chittenden