On 01.03.2012 16:36, Vlad Sedov wrote:
> Ha! I realized I've asked a bunch of vague questions after I sent my
> first message... :]
>
LOL!
> My setup is fairly straight-forward. We have a virtual domain
> environment with a pretty wide range of users. Some are trigger-happy
> when they mark spam, and others complain when their precious coupons
> are getting tagged. Aka: normal users.
> The plan is to use dspam/clamav daemonized, in between postfix and
> dovecot.

Good choice.

> I was thinking about using maildrop to deliver tagged mail to the
> customers' SPAM folders, and let them use the spam/not spam buttons in
> webmail to correct false positives and missed spams. This is basically
> how I'm doing it now with qmail.

I would advise against this. Maildrop would add an additional dependency
for nothing. Rather than using maildrop, I would suggest you look at
Sieve. And if you like buttons, why not take it to the next level and
use something like the Dovecot antispam plugin? That plugin offers the
same functionality without the need for buttons: users retrain simply by
dragging messages into or out of the spam folder.

>
> I'm still a bit in the dark about the tokenizers. I've read several
> dspam HOW-TOs, and it seems everyone has their preference, but they
> don't explain why.

This is often a problem with how-tos. They help you get something
working, but they lack the deeper knowledge and the explanation of why
to do something. Anyway... if you want to understand tokenizers, then
have a look here:
http://sourceforge.net/apps/mediawiki/dspam/index.php?title=Tokenizers
This should help you understand what each tokenizer produces.

> For us, speed and optimal disk usage are more important.
>

Usually the HASH driver is the fastest, since it is basically pure
memory access. In your case I would use either MySQL or PostgreSQL. They
scale far better than the HASH driver, and a well-designed cluster with
an RDBMS can beat the HASH driver in every respect.

> As for storage drivers, my main concern is long-term performance.
> From what I understand though, I pretty much have to pick an SQL driver
> if I want to run dspam daemonized.

HASH can be used in daemonized mode too.

> I've experimented with MySQL and Postgres, and it seems MySQL is quite
> a bit faster (with InnoDB), but the database files were growing out of
> control pretty quickly. After about a week, the database was taking up
> 24GB. I was running the purge script nightly, but it didn't seem to
> make much difference.

24GB after a week? How much inbound mail do you get per day/week? Can
you post your dspam.conf and some other details about what you have set
up and how?

> Is it typical for DSPAM to use so much space?

That depends on how many users you have, how much mail flow you have,
how big the messages are, what your configuration looks like, etc...

> Does it taper off after a while?

Yes!

> I have a 120GB SSD array just for DSPAM's database, so I have to make
> sure that I'm not going to run out of space after a few weeks.
>
>
> Thanks again,
>
> Vlad
>

--
Kind Regards from Switzerland,

Stevan Bajić

>
>
> On 2/29/2012 10:46 PM, Stevan Bajić wrote:
>> On 01.03.2012 00:00, Vlad Sedov wrote:
>>> Hey folks.
>> Hello Vlad,
>>
>>
>>> I'm migrating my qmail/spamassassin mail server with about 3K mailboxes
>>> to DSPAM/postfix. Our small-scale tests showed that DSPAM, with
>>> sufficient training, was near flawless compared to SpamAssassin's
>>> lousy 85-90% success rate.
>>>
>>> So here are a few questions...
>>>
>>> How much disk space should the SQL database use, with proper daily
>>> purging?
>> This is difficult to answer and depends on what tokenizer you are using
>> and how much mail users are getting and how diverse that mail is.
>>
>>> What is the most efficient dspam storage driver?
>> Efficient in what way? Speed? Space used? etc...
>>
>>> If you use MySQL, what's the best db storage engine to use?
>> In your case I would suggest InnoDB.
>>
>>> What is the most effective tokenizer?
>> Efficient in what sense? Speed?
>> Amount of produced tokens? etc...
>>
>>> What about algorithms? It seems everyone has their favorite.
>> Each of them has its benefits. Choose the one that is best for you.
>>
>>
>>> Thanks in advance,
>> One final word: If there were such a thing as the 'best' or the 'most
>> efficient' setting/storage engine/algorithm/tokenizer/etc., then you can
>> trust us that we would remove all the other 'bad' or 'not efficient'
>> settings/storage engines/algorithms/tokenizers/etc. But there is no such
>> thing.
>>
>>> Vlad
>>>
>>>
>>> _______________________________________________
>>> Dspam-user mailing list
>>> Dspam-user@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dspam-user
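On the tokenizer and disk-usage questions in this thread: the following is a minimal sketch, not DSPAM's actual code, of why the tokenizer choice affects how many rows end up in the database. It assumes the simplified view that the `word` tokenizer stores one token per word and the `chain` tokenizer stores single words plus adjacent word pairs. The function names and the `+` separator are made up for illustration.

```python
import re


def word_tokens(text):
    """One token per word (simplified idea behind the 'word' tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chain_tokens(text):
    """Single words plus adjacent word pairs (simplified idea behind 'chain')."""
    words = word_tokens(text)
    # Pair every word with its immediate successor.
    pairs = [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return words + pairs


if __name__ == "__main__":
    sample = "Buy cheap coupons now"
    print(len(word_tokens(sample)), word_tokens(sample))
    print(len(chain_tokens(sample)), chain_tokens(sample))
```

Under these assumptions a message with n words yields n tokens with `word` and 2n - 1 with `chain`, so roughly twice as many distinct tokens can accumulate per user. The wiki page linked in the reply describes what each real tokenizer produces.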