On 04/12/14 01:00, Reindl Harald wrote:
>
> Am 03.12.2014 um 11:39 schrieb Steve Freegard:
>> If you're willing to spend the time to learn it and get your hands dirty
>> a bit, then DSPAM is well worth the effort IMO. It's way faster than
>> SpamAssassin (due to the bazillion network tests that SA does) and is a
>> considerably more advanced implementation of Bayes than SA
>
> the bazillion network tests (if dspam is really missing them) are a
> great benefit because you can't catch most fresh spam only by content
> nor rely on content analysis alone without whitelists for sane results
>

Yeah - but conversely you can't rely on network tests alone to catch 
everything.  Filtering spam effectively requires a layered approach and 
being smart about the costs (e.g. CPU, RAM, bandwidth) of each layer
vs. its effectiveness.
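
To make the layering idea concrete, here is a minimal sketch that runs
the cheapest checks first and stops at the first decisive verdict. The
Layer type, the cost numbers and the two example checks are all
hypothetical, and ordering purely by cost is a simplification; a real
setup would weigh cost against hit rate as well.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Layer:
    name: str
    cost: float  # relative cost per message (made-up units)
    check: Callable[[dict], Optional[str]]  # "reject", "accept" or None


def run_layers(message: dict, layers: List[Layer]) -> Tuple[str, List[str]]:
    """Run layers cheapest first; stop at the first decisive verdict."""
    ran = []
    for layer in sorted(layers, key=lambda l: l.cost):
        ran.append(layer.name)
        verdict = layer.check(message)
        if verdict is not None:
            return verdict, ran
    # No layer objected, so the message is accepted.
    return "accept", ran


# Illustrative layers only: a cheap list lookup and a costly content scan.
LAYERS = [
    Layer("content", 10.0,
          lambda m: "reject" if m.get("spam_score", 0) >= 8.0 else None),
    Layer("dnsbl", 1.0,
          lambda m: "reject" if m.get("listed") else None),
]
```

With these layers, a message that is already listed never reaches the
expensive content scan.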

> we achieve with SpamAssassin and wise postfix filters in front a 98%
> spam hit rate (meaning reject at smtp level) with below 10 false
> positives per month

So do I - except I'm not using Postfix or Postscreen; I'm using stuff 
that I wrote myself or helped to write (see 
http://www.github.com/baudehlo/Haraka).

Part of what I do is feed a sample of the mail that I'm going to reject
anyway (due to other pre-DATA filtering) to DSPAM and have it classify
it. If its confidence is below a threshold, I run an inoculation on the
message. This ensures that DSPAM always sees fresh spam, which is
normally a problem with Bayesian implementations if you do a lot of
rejects at the SMTP stage, as I do.
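
Roughly, the decision looks like the sketch below. The function names,
the sampling rate and the 0.8 default threshold are placeholders rather
than the values used in production, and the command line is assembled
from the flags in the dspam man page (--class, --source=inoculation),
so check it against your installed version.

```python
import random
from typing import List, Optional


def sample_rejects(messages: list, rate: float,
                   rng: Optional[random.Random] = None) -> list:
    """Pick a random fraction (0.0 to 1.0) of the to-be-rejected mail."""
    rng = rng or random.Random()
    return [m for m in messages if rng.random() < rate]


def should_inoculate(confidence: float, threshold: float = 0.8) -> bool:
    """Inoculate when DSPAM's confidence falls below the threshold.

    The threshold value is an assumption; the rule itself is simply
    "confidence < threshold".
    """
    return confidence < threshold


def inoculation_command(user: str) -> List[str]:
    """Build (but do not run) a dspam call that inoculates as spam."""
    return ["dspam", "--user", user, "--class=spam", "--source=inoculation"]
```

The message itself would be piped to that command on stdin.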

I also train DSPAM using TONE (train-on-near-error): if DSPAM isn't
sure about a message (<= 75% confidence), I mark the message with
[HAM?] or [SPAM?] and ask the user to train on it accordingly. Once
trained, you see only a couple of these a month or so, and this keeps
it very accurate.
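
The marking step can be sketched as follows. tag_subject is a
hypothetical helper, and it assumes the result string is "Spam" or
"Innocent" and the confidence is a fraction between 0 and 1, as in the
X-DSPAM-Result and X-DSPAM-Confidence headers.

```python
def tag_subject(subject: str, result: str, confidence: float,
                threshold: float = 0.75) -> str:
    """Prefix the subject when DSPAM is unsure (confidence <= threshold)."""
    if confidence > threshold:
        # DSPAM is confident enough; leave the subject untouched.
        return subject
    tag = "[SPAM?]" if result == "Spam" else "[HAM?]"
    return f"{tag} {subject}"
```

A tagged message is a prompt for the user to confirm or correct the
classification; anything above the threshold passes through unchanged.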

> guess what - these below are most responsible by a reject score of 8.0
> combined with whitelists and BAYES_00 configure with a -3.0 score
>
> score URIBL_AB_SURBL 4.5
> score URIBL_JP_SURBL 4.5
> score URIBL_MW_SURBL 5.0
> score URIBL_PH_SURBL 5.0
> score URIBL_WS_SURBL 3.5
> score URIBL_SC_SURBL 0.5
> score URIBL_SBL 1.0
> score URIBL_SBL_A 1.2
> score URIBL_DBL_SPAM 3.0
> score URIBL_DBL_BOTNETCC 3.0
> score URIBL_DBL_PHISH 3.5
> score URIBL_DBL_MALWARE 3.5
> score URIBL_DBL_ABUSE_SPAM 3.0
> score URIBL_DBL_ABUSE_BOTCC 3.0
> score URIBL_DBL_ABUSE_PHISH 5.0
> score URIBL_DBL_ABUSE_MALW 5.0
> score URIBL_BLACK 6.5
> score URIBL_GREY 0.5
> score URIBL_RED 0.5
> score URIBL_DBL_REDIR 0.1
> score URIBL_DBL_ABUSE_REDIR 0.3
>

Cool - a DBL or URIBL_BLACK hit here is a reject (except where the DBL 
return value > 100); I don't even bother with SA/DSPAM for those.
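
As a sketch of that exception: this assumes "return value" means the
last octet of the A record returned by the DBL and that listings come
back as 127.0.1.x, with the codes above 100 being the abused-legit
categories (the URIBL_DBL_ABUSE_* rules in your list). The function
name is made up for illustration.

```python
import ipaddress


def dbl_hit_is_reject(answer: str) -> bool:
    """Return True when a DBL answer should cause an outright reject."""
    octets = ipaddress.IPv4Address(answer).packed
    if octets[:3] != bytes([127, 0, 1]):
        # Not a DBL listing code at all.
        return False
    # Codes above 100 (abused-legit domains, errors) are not rejected here.
    return octets[3] <= 100
```

So 127.0.1.2 would be rejected outright, while 127.0.1.102 would be
left for the later scoring layers.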

I still use SA here, but with its Bayes switched off and rules in place
to use DSPAM results instead (if DSPAM doesn't score it enough to reject
it outright), along with a load of custom rules and network tests that I
wouldn't reject on but would score with.

Anyway - I'm not here to debate with you about how to filter stuff; I'm
merely saying that DSPAM is mature and the engine works well, provided
you're willing to spend the time to learn how it works and to get it
running effectively.  It's definitely the best open-source Bayesian
engine available based on its features (e.g. networkable, SQL support
etc.).

Kind regards,
Steve.


_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
