Josh Marshall wrote:
> I'd like to point out a few things:
>
> * The added complexity of storing and synchronising files on disk with
> records in tables, especially in a load-balanced, highly available
> situation, is much more work than any returns you'll ever get.
>
> * Whether the emails are stored in the database or the filesystem (which
> is really just a database) is not going to make that much difference.
> Databases can be a bit inefficient for space at times but this is
> usually to increase speed.
>
> * I'd like to see 3 or 4 mailservers performing imap searches over an
> NFS share to get to the mailbox files or messages. Then we can really
> compare speed of the database vs filesystem in a networked environment
>
> * I'd like to see a system administrator easily recover all the emails
> for a mailbox since the last time cleanup was performed. Hint:
> update dbmail_messages set deleted_flag=0,status=0 where mailbox_idnr in
> (select mailbox_idnr from dbmail_mailboxes where owner_idnr IN (SELECT
> user_idnr from dbmail_users where userid='[email protected]'));
>
> * I'd like to see fine-grained point-in-time recovery for the
> filesystem-based (or hybrid - scary) systems. Yes it would take a while
> for any system depending on mail size.
>
> * I'd like to see impact-free daily backups for filesystem-based
> systems. With dbmail, just have a slave replica you can pause
> replication on to get a perfect snapshot, with no impact on the live
> database during the backup duration.
>
> * Remember that with any mail system that has a huge amount of data,
> things are going to take time. Databases have more records to search
> through (although indexes can help speed this up). mbox are basically a
> crude database storing all the emails in one file so large mailboxes can
> take a very long time to work with. Maildir is good until the inbox gets
> so many small files that just the directory listing takes a long time.
> If you're going to have a large mail system, be aware that things will
> take time, or use multiple systems and a system like perdition to split
> up the mailboxes, or have an archive system for users to place old
> emails they want to keep in.
>
> * As for mail delivery speed statistics, take them all with a grain of
> salt. Our experience is the bottleneck for inbound mail is the antivirus
> and antispam stage, and with the huge amount of spam hitting our servers
> (90+% of all connections) it is actually faster to detect and reject
> spam than have the mail deliver into the mailboxes.
>
> Finally:
>
> * Mail systems that don't require high-availability, failover or
> networked environments for load balancing would probably be better to
> just use mbox or maildir, for simplicity. For mission-critical systems
> there are more items to consider before deciding which option to take.

For what it's worth, I set up dbmail for my employer, and the only reason
I chose it was that it was able to handle replication. Admittedly I abuse
the system a little to do a master<->master replication, and that over
the Atlantic Ocean. Current database size is 100G, 95G of which is the
dbmail_messageblks table.

So yes, I know that there are advantages to this, and that there are
major upsides to a database. And I certainly wasn't suggesting NFS
(nightmare). But it would be interesting if we could have a replication
agent that could push mimeparts to disc. And fwiw, I didn't want to put
30,000 files into one folder, and was not recommending the use of
maildir. I was thinking more of something like what Squid does, with 256x256
folders holding the mimeparts, indexed by their MD5 or SHA-256 hashes.
At the same time, I'm not sure that that kind of replication is
practical. Maybe what we need instead is a MySQL storage engine that puts
blobs into files. But that's rather off-topic for the dbmail list.
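
To make the 256x256 layout mentioned above concrete, here is a rough sketch of how a mimepart could be mapped to a path from its hash. This is only an illustration, not anything dbmail or Squid actually ships: the function names mimepart_path and store_mimepart, the choice of SHA-256, and the use of the first two hex byte pairs as directory levels are all my own assumptions.

```python
import hashlib
from pathlib import Path


def mimepart_path(root: Path, data: bytes) -> Path:
    """Map a mimepart to root/xx/yy/<digest> (hypothetical layout).

    The first two bytes of the SHA-256 digest, written as hex, select one
    of 256 x 256 directories, so no single folder has to hold every file.
    """
    digest = hashlib.sha256(data).hexdigest()
    return root / digest[:2] / digest[2:4] / digest


def store_mimepart(root: Path, data: bytes) -> Path:
    """Write a mimepart once; identical content maps to the same file."""
    path = mimepart_path(root, data)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first, then rename into place, so a
        # reader never sees a half-written file under the final name.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)
    return path
```

Because the path is derived from the content, two messages carrying the same attachment would share one file on disk, and a replication agent would only need to ship hashes it has not seen before. Whether that is practical across sites is exactly the open question.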
_______________________________________________ DBmail mailing list [email protected] http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail
