On 04/12/14 01:00, Reindl Harald wrote:
>
> On 03.12.2014 at 11:39, Steve Freegard wrote:
>> If you're willing to spend the time to learn it and get your hands dirty
>> a bit, then DSPAM is well worth the effort IMO. It's way faster than
>> SpamAssassin (due to the bazillion network tests that SA does) and is a
>> considerably more advanced implementation of Bayes than SA
>
> the bazillion network tests (if dspam is really missing them) are a
> great benefit because you can't catch most fresh spam only by content
> nor rely on content analysis alone without whitelists for sane results
>

Yeah - but conversely you can't rely on network tests alone to catch
everything. Filtering spam effectively requires a layered approach and
being smart about the cost (e.g. CPU, RAM, bandwidth) of each layer vs.
its effectiveness.

> we achieve with SpamAssassin and wise postfix filters in front a 98%
> spam hit rate (meaning reject at smtp level) with below 10 false
> positives per month

So do I - except I'm not using Postfix or Postscreen; I'm using stuff
that I wrote myself or helped to write (see
http://www.github.com/baudehlo/Haraka).

Part of what I do is feed a sample of the stuff that I'm going to be
rejecting (due to other pre-DATA filtering) to DSPAM and have it
classify it; if its confidence is below a threshold, I run an
inoculation on the message. This is so that DSPAM always sees fresh
spam; not seeing it is normally a problem with Bayesian implementations
if you do a lot of rejects at the SMTP stage, as I do.

I also train DSPAM using TONE (Train-on-near-error): if DSPAM isn't
sure about a message (<= 75% confidence), I mark the message with
[HAM?] or [SPAM?] and ask the user to train on it accordingly. Once
trained, you see only a couple of these a month or so, and this keeps
it very accurate.

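The inoculation and TONE decisions described above could be sketched roughly as follows. This is an illustration only, not the actual Haraka or DSPAM code: the `Verdict` type, the function names and the `INOCULATE_BELOW` value are all hypothetical, and only the 75% figure comes from the message itself.

```python
from dataclasses import dataclass

# Assumed value: the message only says "confidence < threshold".
INOCULATE_BELOW = 0.90
# Taken from the message: unsure means <= 75% confidence.
TONE_AT_OR_BELOW = 0.75


@dataclass
class Verdict:
    """Hypothetical classifier result: 'spam' or 'ham' plus a confidence in [0, 1]."""
    label: str
    confidence: float


def needs_inoculation(verdict: Verdict) -> bool:
    """For a sampled message that pre-DATA filtering is about to reject:
    inoculate when the classifier's confidence is below the threshold."""
    return verdict.confidence < INOCULATE_BELOW


def tag_subject(subject: str, verdict: Verdict) -> str:
    """Prefix the subject with [SPAM?] or [HAM?] when the classifier is unsure,
    so that the user can be asked to train on the message."""
    if verdict.confidence <= TONE_AT_OR_BELOW:
        marker = "[SPAM?]" if verdict.label == "spam" else "[HAM?]"
        return f"{marker} {subject}"
    return subject
```

In this sketch a delivered message classified as ham at 60% confidence would arrive with a "[HAM?]" prefix, while one classified at 99% would be left untouched.
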
> guess what - these below are most responsible by a reject score of 8.0
> combined with whitelists and BAYES_00 configure with a -3.0 score
>
> score URIBL_AB_SURBL 4.5
> score URIBL_JP_SURBL 4.5
> score URIBL_MW_SURBL 5.0
> score URIBL_PH_SURBL 5.0
> score URIBL_WS_SURBL 3.5
> score URIBL_SC_SURBL 0.5
> score URIBL_SBL 1.0
> score URIBL_SBL_A 1.2
> score URIBL_DBL_SPAM 3.0
> score URIBL_DBL_BOTNETCC 3.0
> score URIBL_DBL_PHISH 3.5
> score URIBL_DBL_MALWARE 3.5
> score URIBL_DBL_ABUSE_SPAM 3.0
> score URIBL_DBL_ABUSE_BOTCC 3.0
> score URIBL_DBL_ABUSE_PHISH 5.0
> score URIBL_DBL_ABUSE_MALW 5.0
> score URIBL_BLACK 6.5
> score URIBL_GREY 0.5
> score URIBL_RED 0.5
> score URIBL_DBL_REDIR 0.1
> score URIBL_DBL_ABUSE_REDIR 0.3
>

Cool - a DBL or URIBL_BLACK hit here is a reject (except where the DBL
return value is > 100); I don't even bother with SA/DSPAM for those.

I still use SA here, but with its Bayes switched off and rules in place
to use the DSPAM results instead (if DSPAM doesn't score it enough to
reject it outright), plus a load of custom rules and network tests that
I wouldn't reject on but would score with.

Anyway - I'm not here to debate with you about how to filter stuff; I'm
merely saying that DSPAM is mature and the engine works well, provided
you're willing to spend the time to learn how it works and to get it
running effectively. It's definitely the best open-source Bayesian
engine available based on its features (e.g. networkable, SQL support
etc.).

Kind regards,
Steve.