Re: [dspam-users] SOT: algorithm explanations

Steve Tue, 04 Dec 2007 13:13:46 -0800

-------- Original Message --------
> Date: Tue, 4 Dec 2007 14:20:17 -0600
> From: Jeffrey Taylor <[EMAIL PROTECTED]>
> To: [email protected]
> Subject: [dspam-users] SOT: algorithm explanations


Hello Jeffrey,


> I like DSPAM a lot and have been using it for several years with only one
> detected false positive (when I got a real eBay account). I like it so
> much that I am trying to use it in a new way, as a trained "interesting"
> ranker for an RSS reader. It is not doing a very good job at this, and
> there are several reasons for that. I'd be tempted to write my own from
> scratch, but so much is already done, token parsing and persistent
> storage being the main ones. Simply using another algorithm might be the
> answer. Is there a layman's description of them anywhere? Or even a
> suggestion for a different algorithm?
> 
> The problems, AFAICT, are:
> 
> * Bayesian CLASSIFICATION, i.e., a binary spam/ham decision. I need a
>   ranking, e.g. "this is 79% interesting": a reasonably smooth and well
>   populated range of interesting-to-uninteresting scores.
> 
> * Formatting included in scoring, e.g. HTML tags and fragments. 3.8.0 is
>   much better in this regard than the 3.6.x I was previously using. I have
>   a way around this: a 4-line patch to ignore a token if both
>   innocent_hits and spam_hits are zero, plus some utility scripts to zero
>   out both dspam_token_data.*_hits columns for user/admin-specified tokens.
> 
> * Bias against false positives. I think this can be solved by using the
>   processorBias attribute (remember, I am using DSPAM for spam filtering
>   too, so I can't set it globally in dspam.conf). I think this is a new
>   feature in 3.8.0, and a very welcome one.
> 
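The first three points in Jeffrey's list (a graded score instead of a yes/no verdict, ignoring tokens with no training hits, and a tunable bias) can be illustrated together. What follows is a minimal sketch, not DSPAM's code: it assumes a per-token table of (interesting_hits, boring_hits) counts analogous to DSPAM's innocent_hits/spam_hits, and it swaps in Gary Robinson's smoothed token probability and geometric-mean combining, which yield a continuous score in [0, 1]. The names `token_probability`, `interesting_score` and the `bias` parameter are hypothetical, and `bias` is only an illustration of weighting one class's evidence more heavily, not how DSPAM's processorBias is implemented.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical token table: token -> (interesting_hits, boring_hits).
TokenStats = Dict[str, Tuple[int, int]]


def token_probability(hits_i: int, hits_b: int, total_i: int, total_b: int,
                      bias: float = 1.0, s: float = 1.0, x: float = 0.5) -> float:
    """Robinson-style smoothed probability that a token signals 'interesting'.

    `bias` (hypothetical) scales the 'boring' evidence; values > 1 make the
    ranker more reluctant to call an item interesting.
    """
    ratio_i = hits_i / max(total_i, 1)
    ratio_b = bias * hits_b / max(total_b, 1)
    p = ratio_i / (ratio_i + ratio_b) if ratio_i + ratio_b > 0 else x
    n = hits_i + hits_b
    # Shrink toward the neutral prior x when the token has rarely been seen.
    f = (s * x + n * p) / (s + n)
    # Clamp away from 0 and 1 so the logarithms below stay finite.
    return min(max(f, 0.01), 0.99)


def interesting_score(tokens: Iterable[str], stats: TokenStats,
                      total_i: int, total_b: int, bias: float = 1.0) -> float:
    """Return a score in [0, 1]; higher means more interesting."""
    probs = []
    for tok in set(tokens):
        hits_i, hits_b = stats.get(tok, (0, 0))
        if hits_i == 0 and hits_b == 0:
            continue  # ignore untrained tokens (markup fragments, etc.)
        probs.append(token_probability(hits_i, hits_b, total_i, total_b, bias))
    if not probs:
        return 0.5  # no evidence either way
    n = len(probs)
    # Robinson's geometric-mean combining of the per-token probabilities.
    p_big = 1.0 - math.exp(sum(math.log(1.0 - f) for f in probs) / n)
    q_big = 1.0 - math.exp(sum(math.log(f) for f in probs) / n)
    return (1.0 + (p_big - q_big) / (p_big + q_big)) / 2.0
```

Because the combined value is a ratio of geometric means rather than a product of many probabilities, it tends to spread items across the whole range instead of pinning them near 0 or 1, which is what a ranking needs.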
I think you would be better served by CRM114 for this kind of task. Something
like its hyperspace algorithm comes to mind when reading your requirements.
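Roughly speaking, CRM114's hyperspace classifier keeps each trained document as a point in a very high-dimensional feature space and scores new text by how close it lies to the stored examples of each class. The sketch below is only a generic, similarity-weighted nearest-neighbour ranker in that spirit. It assumes documents are reduced to sets of lower-cased whitespace tokens, and the function names and the shared² / (1 + distance)² weighting are hypothetical choices, not CRM114's actual feature hashing or scoring formula.

```python
from typing import List, Set


def features(text: str) -> Set[str]:
    """Very simple feature extraction: lower-cased whitespace tokens."""
    return set(text.lower().split())


def class_pull(doc: Set[str], examples: List[Set[str]]) -> float:
    """Sum of similarity 'pulls' from every stored example of one class."""
    total = 0.0
    for ex in examples:
        shared = len(doc & ex)
        distance = len(doc ^ ex)  # features present in only one of the two
        # Hypothetical weighting: more shared features and fewer
        # unshared ones mean a stronger pull toward this example.
        total += shared ** 2 / (1.0 + distance) ** 2
    return total


def hyperspace_like_score(text: str, interesting: List[Set[str]],
                          boring: List[Set[str]]) -> float:
    """Return a score in [0, 1]; higher means closer to the interesting examples."""
    doc = features(text)
    pull_i = class_pull(doc, interesting)
    pull_b = class_pull(doc, boring)
    if pull_i + pull_b == 0:
        return 0.5  # no overlap with any training example
    return pull_i / (pull_i + pull_b)
```

Since every trained item is kept rather than merged into per-token counts, the score reflects resemblance to whole articles, which may suit an "interesting" ranker better than per-token spam statistics.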


> TIA,
>    Jeffrey

Steve
