Paul J Stevens wrote:

> I use it too. And don't plan on abandoning the clients who pay me to
> provide email services for their students and employees. Typically many
> small mailboxes. That's where dbmail is quite good atm.

Agreed. For many small mailboxes the current code may be great - especially if *most* people download their mail (with POP3, or with IMAP used as just another POP3), so that searching is never used. The problems are with search performance, and with downloading from BIG mailboxes.

> >>If you're going to do a rewrite, beware the second systems effect..
>
> Had to look that one up :)
>
> It's also probably why we won't go with twisted. It's a huge bloat.

Agreed. My letter was not a proposal but a research pointer, and my research results are the same as yours. I had hoped, basically, that we could rip the IMAP/POP3 code out of Twisted. But no, it seems to use some strange object model which is probably too much for us.

> For me, dbmail is about storage of email first
> and last.

If you include searching in "storage", then so it is for me. It's the "do one thing and do it well" principle. In fact, this principle is how I came to the idea of storing mail in SQL (I nearly started to write it myself, but decided to google first, and found dbmail). There already are programs that manage and search data efficiently - database management systems; email is data; so an email storage engine should use a DBMS. One program, one thing.

<OFFTOPIC> Well, yes, I *am* an Eric Raymond fan, and an aspiring imitator. I even gave Python a try because Eric recommended it - but then I got to like the language itself. I'm only mentioning this to explain the "raymondisms" that show up in my letters. </OFFTOPIC>

> I'm not smart enough to conceive and implement the ultimate
> mailstorage engine. But I can work on providing current functionality
> in software modules aimed at extension and customization. An
> export-tool for dumping a dbmail-database (or selected subset thereof)
> would for instance be a nice warming up exercise in accessing the
> current storage layout.

To make it more useful, I'd suggest writing the output into an mbox file using the Python standard modules. It won't be any more complicated.

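Here is a minimal sketch of what I mean by such an export, just to show how little code the mbox side needs. Note the assumptions: the table `message_blocks` and its columns are a made-up, simplified stand-in (NOT the real dbmail layout), the function name `export_to_mbox` is mine, and sqlite3 is used only so that the example runs without a database server. Only the `mailbox` part is the point.

```python
import mailbox
import sqlite3
from itertools import groupby
from operator import itemgetter


def export_to_mbox(conn, mbox_path):
    """Write every stored message into an mbox file; return the count.

    Assumes a hypothetical table
        message_blocks(message_id, block_id, block BLOB)
    where a message is the concatenation of its blocks in block_id order.
    This is an illustration, not the actual dbmail schema.
    """
    rows = conn.execute(
        "SELECT message_id, block FROM message_blocks "
        "ORDER BY message_id, block_id"
    )
    box = mailbox.mbox(mbox_path)  # creates the file if it does not exist
    box.lock()
    count = 0
    try:
        # Rows arrive sorted, so consecutive rows with the same
        # message_id belong to one message.
        for _, group in groupby(rows, key=itemgetter(0)):
            raw = b"".join(block for _, block in group)
            box.add(raw)  # mailbox adds the "From " separator line itself
            count += 1
    finally:
        box.close()  # flushes pending changes and releases the lock
    return count
```

Passing raw bytes to `add()` avoids re-encoding the message; the standard module takes care of the "From " separator lines and of escaping body lines that start with "From ". A real tool would of course add a way to select a subset (one user, one mailbox).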
> After that, replacing dbmail-smtp, dbmail-util and dbmail-user with
> python-based rewrites could lead to a nice set of base-classes to
> tackle the more complex task of replacing the daemons should we choose
> to do so (or find someone crazy enough). However, eliminating the code
> required by the stand-alone tools will have the added benefit of a
> good spring cleaning in the source tree.

If we go this way, we are stuck with the existing storage until the daemons get rewritten. And that means we are stuck with slow searching, and somewhat slow fetching.

Within the current storage, searching can be improved by using fulltext non-indexed regexp searching (although I'm not sure if it works with pgsql). And fetching can be improved by using is_header, and by moving everything *except* the actual messageblk retrieval into one query per fetch (not per message). But this is quite limited. And adding separate header tables could break the current code.

> Let's not take the path of 'enlightenment' then, heh. Full backward
> compatibility where possible, small steps.

Agreed. But we need to think about what those small steps should be. (And probably have them laid out in a document - I'm good at that, see the "Trash can standard" on FreeDesktop.org for an example). Let's brainstorm that?

Actually, the first step is certain - "optimize whatever is easy to optimize in the current code". This should probably lead to a 2.1 beta series ASAP, and later a 2.2 stable series, to have a production milestone before the rewrite. (BTW, how did you like my dbmysql patch? In my eyes it's one of those easy optimizations, but I certainly won't be sure of it until I see the result of an independent review :)

The later steps are the real area for discussion.

Your first idea is to replace executable by executable - dbmail-util, dbmail-smtp and dbmail-user first. The problem is the need to stick to the current storage.
We can't even extend it with additional tables that are written but not yet used, until we replace dbmail-lmtp as well. I see two possible objections to this:

(1) In principle, every step should lead to some useful improvement whenever possible. This will give it a userbase (and thus a tester-base). In fact, even I can't become an active user before I get a quicker fetch. I can document and code, but not use, and therefore not *really* test.

(2) The limitations of the current storage might creep into our base class (or base whatever) interface. We can work to avoid this, of course.

So, here's another idea - replacing by functionality.

We write the "put into database" part first. And we make all current code use that instead (more on that later, see [*]). Now we can extend the storage as long as it stays backwards compatible, so that the existing reading/searching parts can still read the data.

Then we write the "fetch/search from database" part, and have the code interface with that. At this point, although the daemons are not yet rewritten, the thing starts to work faster.

Then we take a break, stabilize the interfaces, and test performance in various cases - very important. The result should, ideally, be another stable version. Old daemons, new storage.

After that we have a stable storage engine, with a well defined interface, and some form of interoperation with C code as a bonus. Then we can think about what to do with the daemons. Perhaps they're good enough to keep! Or perhaps we should do a Python rewrite. Or bind into Dovecot or Cyrus or whatever instead.

[*] On the matter of interoperation. There is a standard C/Python binding interface. But I'm not sure it will always be the best solution, at least for mail receiving - it might lead to process spawning, as with dbmail-smtp. This makes things slower, while we want them to get faster with every step.
So we need to ensure that no new process is spawned, at least when the same process uses the store part several times, and ideally also across different calling processes - this will let us keep the db connection open, and dbmail-smtp will finally be fast. One possible idea would be to implement an extended dbmail-lmtpd, and make both dbmail-smtp and dbmail-imap use it for storing messages.

It's a brainstorm, so if some of these ideas get dumped, no problem :) As long as the main idea stands - to gradually transform dbmail into a DB mail storage engine that is fast, well-written, well-documented, and easily accessible to other programs - any method may be good.

Yours,
Mikhail Ramendik