Hi list,

a little while ago I set up a cute little mailfilter cluster based
on Postfix, Amavisd-new & Co. Even though mail traffic is not that high
(peaks are slightly above 6 million delivery attempts a day), the
most important goals were to keep it as scalable as possible and to make
all logging available to our support desk, and possibly also to VIP
customers, through a dedicated web interface.

I'm really happy with the result: every single component exists at
least twice. It doesn't matter which part is switched off (MX, Filter,
DB) - it always keeps running. And these days it's not only the support
desk and VIP customers who have "log access": every single account is
able to see all of its rejected or quarantined mail in the Webmail frontend.

My personal wish is to improve it even further, allowing me to scale
"globally". Right now there is a lot of traffic between the Amavis
instances and their MySQL instances (master-master). In its current form,
transactions and replication would not allow us to scale very far.

Sure, we can add more MySQL servers. We are able to do so on the fly
and have already tested this successfully in production (150GB on each
host: stop connections to one of them, stop the database there, copy it
to a third host, set up replication with the right log file and log
position, and let both hosts catch up). But even with 20 MySQL servers,
the write load on each of them continues to grow, and the traffic
between them could be heavy.
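To make the write-load point concrete, here is a back-of-the-envelope
sketch in Python. It assumes, purely for illustration, 3 rows written per
delivery attempt (a hypothetical figure, not a measurement) and spreads
the daily volume evenly over 24 hours, so real peaks are higher. With
full replication every server applies every write, so adding servers
does not lower the per-server rate; hash partitioning would divide it.

```python
SECONDS_PER_DAY = 86_400


def writes_per_server(attempts_per_day: int, rows_per_attempt: int,
                      servers: int, replicated: bool) -> float:
    """Average rows written per second on each database server."""
    total = attempts_per_day * rows_per_attempt / SECONDS_PER_DAY
    # Full replication: every server applies every write.
    # Hash partitioning: each server applies only its own share.
    return total if replicated else total / servers


if __name__ == "__main__":
    # rows_per_attempt=3 is a hypothetical assumption, not a measured value.
    for n in (2, 20):
        print(n, "servers:",
              round(writes_per_server(6_000_000, 3, n, True), 1), "rows/s replicated,",
              round(writes_per_server(6_000_000, 3, n, False), 1), "rows/s partitioned")
```

Under these assumptions a fully replicated setup sees about 208 rows/s
on every server, whether there are 2 or 20 of them, while splitting the
same writes over 20 partitions would leave roughly 10 rows/s per server.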

As I've managed to solve the "web interface part" without using the
amavis database (I need it only for quarantine access by ID) by setting
up a central log aggregation system, there is only one thing forcing me
to keep the default amavis database structure: Penpals. I consider
Penpals a really useful feature, and I wouldn't want to lose it. However,
while my log system is designed to scale horizontally, the way Amavisd's
database works forces me to keep that centralized approach.

I really hope I've been able to explain the problem ;-) And please don't
confuse this with horizontal database partitioning - I've been using that
in production ever since it became available. But it doesn't solve the
main issue, it just lets me scale "a little bit more".

You have a lot of options to scale your SMTP inbound path (by domain,
by source IP, whatever) - however, you cannot predict whether a reply
to such mail will travel through the same site. Therefore ALL sites
currently need to be aware of ALL Penpals information - and today this
means that ALL sites must access the same database (even if it is
replicated and partitioned, it's still the very same DB, and that way
you don't scale out far).
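The cross-site problem can be sketched in a few lines of Python. The key
format below (a sender/recipient pair plus the Message-ID, queried again
via In-Reply-To) is a hypothetical scheme chosen for illustration, not
how amavisd-new actually stores Penpals data. What it shows: the lookup
keys depend only on the message itself, never on which site handles it,
so a site-local store misses replies, while any store whose location
every site can derive from the same keys would work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    message_id: str
    in_reply_to: Optional[str] = None


def _norm(addr: str) -> str:
    # Hypothetical normalization: compare addresses case-insensitively.
    return addr.strip().lower()


def outbound_keys(msg: Message) -> List[str]:
    """Keys recorded when a local user sends mail to an outside address."""
    return [
        f"pair:{_norm(msg.sender)}>{_norm(msg.recipient)}",
        f"msgid:{msg.message_id}",
    ]


def inbound_lookup_keys(msg: Message) -> List[str]:
    """Keys queried for incoming mail: the reversed pair plus In-Reply-To."""
    keys = [f"pair:{_norm(msg.recipient)}>{_norm(msg.sender)}"]
    if msg.in_reply_to:
        keys.append(f"msgid:{msg.in_reply_to}")
    return keys


if __name__ == "__main__":
    sent = Message("alice@example.com", "bob@remote.example", "<1@example.com>")
    # The outgoing mail was relayed (and recorded) by site A ...
    site_a: Dict[str, int] = {key: 1_000 for key in outbound_keys(sent)}
    # ... but the reply happens to arrive at site B, which has its own store.
    site_b: Dict[str, int] = {}
    reply = Message("bob@remote.example", "alice@example.com",
                    "<2@remote.example>", in_reply_to="<1@example.com>")
    print(any(k in site_b for k in inbound_lookup_keys(reply)))  # False
    print(any(k in site_a for k in inbound_lookup_keys(reply)))  # True
```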

Here are some possible steps / solutions I could imagine, just some
unsorted thoughts:

* Make Penpals modular, allowing different (or even custom) modules to
  be used for it
* Use a hash based on your mboxes for partitioning - this could also
  help with the way it currently works. However, then you are no longer
  able to use partitions for garbage collection. And combining mbox &
  time for the hash computation once again doesn't let Penpals scale.
* Use Memcached - hashes based on the Message-ID and mbox allow you to
  always query the correct server. Memcached itself is not designed to
  be redundant, it's nothing but a cache. That's something I could live
  with - losing part of my Penpals cache is not that critical. If you
  can't live with this: there are other, similar implementations that
  offer redundancy. Or you could use some locking voodoo to make
  memcached redundant as well.
* Allow configuring the Amavis "storage" so that it stores just the
  quarantine and leaves out the msg/adr/rcp part. Where to look for a
  specific quarantined mail is something I could find out from my log
  files - log-parsing-based systems are able to scale.
* I would not store the quarantine on the filesystem. Reason: garbage
  collection based on DB partitions is far cheaper. Allowing your
  file-based quarantine to be "partitioned" by putting files in different
  subfolders (for example on a weekly basis) could also be an option.
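The Memcached idea (together with a modular Penpals backend) could look
roughly like this Python sketch, using only the standard library. It is
a sketch under stated assumptions: `PenpalsStore` is a hypothetical
interface a modular Penpals backend might expose (nothing like it exists
in Amavisd today), and plain dicts stand in for real memcached nodes.
Keys are mapped to nodes with consistent hashing, a technique swapped in
here because with a plain `hash % N` adding a node would remap almost
every key, while with a ring only the keys that land on the new node
move, and every site that builds the ring from the same server list
picks the same node for the same key.

```python
import bisect
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


def _hash(value: str) -> int:
    # First 8 bytes of MD5: stable across processes and hosts, unlike
    # Python's built-in hash(), which is randomized per process.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping keys to server names."""

    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        # Each server gets many virtual points for a more even spread.
        points: List[Tuple[int, str]] = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    def server_for(self, key: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._servers[idx]


class PenpalsStore(ABC):
    """Hypothetical pluggable interface a Penpals module could implement."""

    @abstractmethod
    def record(self, key: str, timestamp: int) -> None: ...

    @abstractmethod
    def last_seen(self, key: str) -> Optional[int]: ...


class ShardedPenpalsStore(PenpalsStore):
    """Spreads keys over several cache nodes; dicts stand in for memcached."""

    def __init__(self, servers: List[str]) -> None:
        self.ring = HashRing(servers)
        self.nodes: Dict[str, Dict[str, int]] = {s: {} for s in servers}

    def record(self, key: str, timestamp: int) -> None:
        self.nodes[self.ring.server_for(key)][key] = timestamp

    def last_seen(self, key: str) -> Optional[int]:
        # If the node was lost (removed from self.nodes), this is just a miss.
        return self.nodes.get(self.ring.server_for(key), {}).get(key)
```

Losing a node here only means losing that slice of the Penpals cache,
which matches the trade-off described in the Memcached point above.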

That's all so far. I'm pretty sure I've forgotten something - but I'm
confident one of you will find it ;-) Your feedback is more
than welcome, and it doesn't need to be positive - feel free to tell me
why you consider the proposed approach braindead or whatever ;-p

Kind regards,
Thomas Gelf


_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3 
 AMaViS-HowTos:http://www.amavis.org/howto/ 
