[Dbmail-dev] On rewrites, steps, etc.

Mikhail Ramendik Sun, 24 Oct 2004 23:42:02 +0200 (CEST)

Paul J Stevens wrote:

> I use it too. And don't plan on abandoning the clients who pay me to
> provide email services for their students 
> and employees. Typically many small mailboxes. That's where dbmail is
> quite good atm.


Agreed. For many small mailboxes the current code may be great -
especially if *most* people download their mail (with POP3, or with IMAP
that is used as another POP3), and therefore searching is not used.

The problems are with searching performance, and with downloading on BIG
boxes.

> >>If you're going to do a rewrite, beware the second systems effect..
> 
> Had to look that one up :)
> 
> It's also probably why we won't go with twisted. It's a huge bloat.

Agreed. My letter was not a proposal but a research pointer, and my
research results are the same as yours.

I hoped, basically, that we could rip the IMAP/POP3 code out of there.
But no, they seem to use some strange object model which is probably too
much for us.

> For me, dbmail is about storage of email first 
> and last.

If you include searching in "storage", then so it is for me. The "do one
thing but do it well" principle. 

In fact, this principle is why I came to the idea of storing mail in SQL
(and nearly started to write it myself, but decided to google first, and
found dbmail). Programs that manage and search data efficiently already
exist - DBMSes; email is data; so an email storage engine should use a
DBMS. One program, one thing.

<OFFTOPIC> Well, yes, so I *am* an Eric Raymond fan, and aspiring
imitator. I even gave a try to Python because Eric wrote so - but then I
got to like the language itself. I'm only writing it to explain the
"raymondisms" that show up in my letters</OFFTOPIC>

>  I'm not smart enough to conceive and implement the ultimate
> mailstorage engine. But I can work on 
> providing current functionality in software modules aimed at extension
> and customization. An export tool for 
> dumping a dbmail-database (or selected subset thereof) would for
> instance be a nice warm-up exercise in 
> accessing the current storage layout.

To make it more useful, I'd suggest putting output into an mbox using
the python standard modules. It won't be any more complicated.
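
A minimal sketch of the mbox idea, using only the standard "mailbox" and
"email" modules. It assumes the raw messages have already been read out
of the database as bytes; how that is done against the real dbmail
tables is deliberately left out, and the function name export_to_mbox is
made up for illustration.

```python
import mailbox
from email import message_from_bytes


def export_to_mbox(raw_messages, mbox_path):
    """Append each raw message (bytes) to an mbox file and return the count.

    raw_messages is any iterable of complete RFC 2822 messages; reading
    them from the dbmail database is outside the scope of this sketch.
    """
    box = mailbox.mbox(mbox_path)  # the file is created if it does not exist
    box.lock()
    try:
        count = 0
        for raw in raw_messages:
            # mboxMessage lets the mailbox module write the "From " separator line
            box.add(mailbox.mboxMessage(message_from_bytes(raw)))
            count += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return count
```

Exporting a "selected subset" then only means choosing which messages
the iterable yields.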

> After that, replacing dbmail-smtp, dbmail-util and dbmail-user with
> python-based rewrites could lead to a nice 
> set of base-classes to tackle the more complex task of replacing the
> daemons should we choose to do so (or 
> find someone crazy enough). However, eliminating the code required by
> the stand-alone tools will have the 
> added benefit of a good spring cleaning in the source tree.

If we go this way, we are stuck with the existing storage until the
daemons get rewritten.

But this means we are stuck with slow searching, and somewhat slow
fetching. 

Within the current storage, searching can be improved by using fulltext
non-indexed regexp searching (although I'm not sure if it works with
pgsql). And fetching can be improved by using is_header, and moving
everything *except* actual messageblk retrieval into one query per fetch
(not per message).
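
To show what "one query per fetch" means, here is a small sketch with
sqlite3. The table "messages" and its columns are invented for the
example and are NOT the dbmail schema; the point is only that the
metadata for all requested messages comes back from a single SELECT
instead of one SELECT per message.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a hypothetical metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, size INTEGER, seen INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [(1, 1200, 1), (2, 48000, 0), (3, 310, 0)],
    )
    return conn


def fetch_metadata_batched(conn, message_ids):
    """Return {id: (size, seen)} for all ids with a single SELECT."""
    ids = list(message_ids)
    if not ids:
        return {}
    # One "?" per id; very long id lists would have to be split into
    # chunks because databases limit the number of bound parameters.
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, size, seen FROM messages WHERE id IN ({placeholders})",
        ids,
    )
    return {mid: (size, bool(seen)) for mid, size, seen in rows}
```

The actual message bodies would still be retrieved separately, as
described above.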

But this is quite limited. And adding separate header tables could break
the current code.

> Let's not take the path of 'enlightenment' then, heh. Full backward
> compatibility where possible, small steps.

Agreed. But we need to think what those small steps should be. (And
probably have them laid out in a document - I'm good at that, see the
"Trash can standard" on FreeDesktop.org for an example). Let's
brainstorm that?

Actually, the first step is certain - "optimize whatever is easy to
optimize in the current code". This should probably lead to a 2.1 beta
series ASAP, and later a 2.2 stable series, to have a production
milestone before the rewrite. 

(BTW, how did you like my dbmysql patch? In my eyes it's one of those
easy optimizations, but I certainly won't be sure of it until I see the
result of an independent review :)

Now the later steps are the real area for discussion.

Your first idea is to replace one executable at a time - dbmail-util,
dbmail-smtp and dbmail-user first. The problem is the need to stick to
the current storage. We can't even extend it with additional tables that
are written but not yet used, until we replace dbmail-lmtp as well.

I see two possible objections to this:

(1) In principle, every step should lead to some useful improvement when
at all possible. This will give it a userbase (and thus tester-base). In
fact, even I can't become an active user before I get a quicker fetch. I
can document and code, but not use, and therefore not *really* test.

(2) The limitations of the current storage might creep into our base
class (or base whatever) interface. We can work to avoid this, of
course.

So, here's another idea - replacing by functionality. 

We write the "put into database" part first. And we make all current
code use that instead (more on that later, see [*]). Now we can expand
the storage as long as it stays backwards compatible, so that the
existing reading/searching parts can still read the data. 

Then we write the "fetch/search from database" part, and have the code
interface with that. At this point, although the daemons are not yet
rewritten, the thing starts to work faster. 

Then we take a break, stabilize the interfaces, test performance in
various cases - very important. The result should, ideally, be another
stable version. Old daemons, new storage. 

After that we have a stable storage engine, with a well defined
interface, and some form of interoperation with C code as a bonus. Then
we can think what to do about the daemons. Perhaps they're good enough
to keep! Or perhaps we should do a Python rewrite. Or, bind into Dovecot
or Cyrus or whatever instead.
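
To make "a well defined interface" a bit more concrete, here is a rough
sketch of the two parts (put, then fetch/search) behind one abstract
class. Everything in it is an assumption made for illustration: the
names MailStore and SqliteMailStore, the single-table layout, and the
plain case-sensitive substring search. It is not a proposal for the real
schema.

```python
import sqlite3
from abc import ABC, abstractmethod


class MailStore(ABC):
    """Hypothetical storage-engine interface; callers never see SQL."""

    @abstractmethod
    def store(self, mailbox, raw_message):
        """Save raw_message (bytes) in mailbox and return its id."""

    @abstractmethod
    def fetch(self, message_id):
        """Return the raw message (bytes) for message_id."""

    @abstractmethod
    def search(self, mailbox, text):
        """Return ids of messages in mailbox containing text (bytes)."""


class SqliteMailStore(MailStore):
    """Toy implementation on sqlite3, only to exercise the interface."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS mail ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "mailbox TEXT NOT NULL, raw BLOB NOT NULL)"
        )

    def store(self, mailbox, raw_message):
        with self._conn:  # commits on success, rolls back on error
            cur = self._conn.execute(
                "INSERT INTO mail (mailbox, raw) VALUES (?, ?)",
                (mailbox, raw_message),
            )
        return cur.lastrowid

    def fetch(self, message_id):
        row = self._conn.execute(
            "SELECT raw FROM mail WHERE id = ?", (message_id,)
        ).fetchone()
        if row is None:
            raise KeyError(message_id)
        return bytes(row[0])

    def search(self, mailbox, text):
        # instr() on two BLOBs is a byte-wise substring match
        rows = self._conn.execute(
            "SELECT id FROM mail WHERE mailbox = ? AND instr(raw, ?) > 0 "
            "ORDER BY id",
            (mailbox, text),
        )
        return [row[0] for row in rows]
```

With an interface like this, the storage layout behind it can change
(separate header tables, for example) without the daemons noticing.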

[*] On the matter of interoperation. There is a standard C/Python
binding interface. But I'm not sure it will always be the best solution,
at least for mail receiving - it might lead to process spawning like
dbmail-smtp. This makes things slower, while we want them to be faster
with every step. 

So we need to ensure that no new process is spawned, at least when the
same process uses the store part several times, and ideally also with
different calling processes - this will help us keep the db connection,
and dbmail-smtp will finally be fast.
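
The "keep the db connection" point can be sketched like this: a
long-lived object opens the connection on first use and reuses it for
every later store call. The class name PersistentStore, the "inbox"
table and the connections_opened counter are invented for the example;
sqlite3 stands in for the real database.

```python
import sqlite3


class PersistentStore:
    """Opens the database connection once and reuses it afterwards."""

    def __init__(self, connect=lambda: sqlite3.connect(":memory:")):
        self._connect = connect        # factory for a new connection
        self._conn = None
        self.connections_opened = 0    # only here to make the reuse visible

    def _connection(self):
        if self._conn is None:
            self._conn = self._connect()
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS inbox "
                "(id INTEGER PRIMARY KEY, raw BLOB)"
            )
            self.connections_opened += 1
        return self._conn

    def store(self, raw_message):
        """Insert one message and return its row id."""
        conn = self._connection()
        with conn:  # one transaction per message
            cur = conn.execute(
                "INSERT INTO inbox (raw) VALUES (?)", (raw_message,)
            )
        return cur.lastrowid
```

A process that lives across many deliveries pays the connection cost
once; a process spawned per message pays it every time.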

One possible idea would be to implement an extended dbmail-lmtpd and
make both dbmail-smtp and dbmail-imap use it for storing messages.

It's a brainstorm, so if some of these ideas get dumped, no problem :)
As long as the main idea - to gradually transform dbmail into a DB mail
storage engine that is fast, well-written, well-documented, and easily
accessible to other programs - stands, any method may be good.

Yours, Mikhail Ramendik
