Re: [Dspam-user] Spam Identification Deteriorates with Time

Yan Seiner Fri, 20 Mar 2009 13:16:17 -0700

On Fri, March 20, 2009 12:42 pm, Chris Ryland wrote:
> But this might also make DSPAM too dependent on what SA thinks, no?
>
> I.e., maybe you'd be better off letting SA filter the truly obvious
> spam, and then letting DSPAM independently decide for the remaining
> emails.


I started to do that but I ended up with a lot of false positives; this is
more cumbersome but works better for me.

I end up with about 99% accuracy, with maybe 0.01% false positives.

On balance I'd rather have more spam get through than get false positives.
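
To make that trade-off concrete, here is a rough Python sketch of what
those two rates mean in message counts. It assumes "99% accuracy" means
99% of spam is caught and "0.01% false positives" means 0.01% of good
mail is misfiled; the function name and the mail volumes are made up
for illustration.

```python
def expected_errors(spam_count, ham_count, spam_caught=0.99, ham_misfiled=0.0001):
    """Return (missed_spam, false_positives) as expected message counts.

    spam_caught  -- assumed share of spam correctly flagged (0.99 = 99%)
    ham_misfiled -- assumed share of good mail flagged as spam (0.0001 = 0.01%)
    """
    missed_spam = spam_count * (1 - spam_caught)
    false_positives = ham_count * ham_misfiled
    return missed_spam, false_positives


# Hypothetical volume: 9000 spam and 1000 good messages.
missed, fps = expected_errors(spam_count=9000, ham_count=1000)
print(f"spam reaching the inbox: {missed:.1f}, good mail misfiled: {fps:.1f}")
```

Under those assumptions the missed spam outnumbers the misfiled good
mail by a wide margin, which is the direction I prefer to err in.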

--Yan

>
> On Mar 20, 2009, at 2:58 PM, Yan Seiner wrote:
>
>>
>> On Fri, March 20, 2009 10:43 am, Chris Ryland wrote:
>>> Interesting--can you elaborate just a bit?
>>
>> OK, first mail passes through SA.  I have it configured to only add info
>> in the X- headers.
>>
>> Then the mail passes through dspam.  dspam uses the info in SA's X- headers
>> as tokens in its decision.  So your email has the following headers:
>>
>> X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
>>     selene.seiner.lan
>> X-Spam-Level:
>> X-Spam-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00,
>>     DNS_FROM_RFC_BOGUSMX autolearn=no version=3.2.5
>> X-DSPAM-Check: by www.seiner.com on Fri, 20 Mar 2009 11:08:38 -0700
>> X-DSPAM-Result: Innocent
>> X-DSPAM-Processed: Fri Mar 20 11:08:39 2009
>> X-DSPAM-Confidence: 0.9995
>> X-DSPAM-Probability: 0.0000
>> X-DSPAM-Signature: 49c3dba742621804284693
>> X-DSPAM-Factors: 27,
>>     Cc*lists.sourceforge.net, 0.00010,
>>     wrote+>>, 0.00010,
>>     On+Fri, 0.00010,
>>     Subject*user], 0.00010,
>>     as+>, 0.00011,
>>     >>+>>, 0.00013,
>>     wrote+>, 0.00015,
>>     >+On, 0.00017,
>>     Cc*user, 0.00021,
>>     the+>, 0.00022,
>>     References*mail.gmail.com>, 0.00023,
>>     References*mail.gmail.com>, 0.00023,
>>     same+>, 0.00024,
>>     Cc*user+lists.sourceforge.net, 0.00024,
>>     >+I, 0.00026,
>>     >+>, 0.00026,
>>     >+>, 0.00026,
>>     X-Mailer*Mail+(2.930.3), 0.00048,
>>     X-Mailer*(2.930.3), 0.00048,
>>     Mime-Version*v930.3), 0.00049,
>>     Mime-Version*framework+v930.3), 0.00049,
>>     References*www.datavault.us>, 0.00052,
>>     >+Yan, 0.00053,
>>     >>+Can, 0.00058,
>>     >+the, 0.00061,
>>     38+PM, 0.00067,
>>     From*Chris, 0.00092
>>
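
An aside on the factor names listed above: each one looks like either
Header*word or word+word (two adjacent words chained together). Below
is a minimal Python sketch of a tokenizer that produces names of that
shape. The function name, the delimiter set, and the sample X-Mailer
value are assumptions made for illustration; this is not dspam's actual
tokenizer.

```python
import re


def header_tokens(name, value):
    """Build 'Header*word' and 'Header*word+word' tokens from one header.

    Splitting on whitespace, colons and commas is an assumption chosen
    to reproduce the token shapes seen in X-DSPAM-Factors.
    """
    words = [w for w in re.split(r"[\s:,]+", value) if w]
    # Single-word tokens, prefixed with the header name.
    tokens = [f"{name}*{w}" for w in words]
    # Chained tokens: each pair of adjacent words joined with '+'.
    tokens += [f"{name}*{a}+{b}" for a, b in zip(words, words[1:])]
    return tokens


# Hypothetical header value, picked to match two factors in the listing.
for token in header_tokens("X-Mailer", "Apple Mail (2.930.3)"):
    print(token)
```

The same scheme applied to SA's X-Spam-* headers is what gives dspam
stable tokens such as rule names to learn from.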
>> Now let's look at a piece of junk:
>>
>> X-Spam-Flag: YES
>> X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
>>     selene.seiner.lan
>> X-Spam-Level: ***********
>> X-Spam-Status: Yes, score=11.2 required=5.0 tests=AWL,BAYES_99,
>>     HTML_IMAGE_RATIO_04,HTML_MESSAGE,MIME_HTML_ONLY,RCVD_IN_XBL,
>>     URIBL_JP_SURBL,URIBL_RHS_DOB autolearn=no version=3.2.5
>> X-Spam-Report:
>>     * 1.5 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
>>     * [URIs: batiaceo.org]
>>     * 3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
>>     * [score: 1.0000]
>>     * 0.2 HTML_IMAGE_RATIO_04 BODY: HTML has a low ratio of text to image area
>>     * 0.0 HTML_MESSAGE BODY: HTML included in message
>>     * 1.5 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
>>     * 3.0 RCVD_IN_XBL RBL: Received via a relay in Spamhaus XBL
>>     * [64.18.137.4 listed in zen.spamhaus.org]
>>     * 1.1 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
>>     * [URIs: batiaceo.org]
>>     * 0.4 AWL AWL: From: address is in the auto white-list
>> X-DSPAM-Check: by www.seiner.com on Fri, 20 Mar 2009 11:42:38 -0700
>> X-DSPAM-Result: Spam
>> X-DSPAM-Processed: Fri Mar 20 11:42:39 2009
>> X-DSPAM-Confidence: 0.9997
>> X-DSPAM-Probability: 1.0000
>> X-DSPAM-Signature: 49c3e39f74883847820380
>> X-DSPAM-Factors: 15,
>>     X-Spam-Report*[URIs, 0.99990,
>>     X-Spam-Report*URL, 0.99990,
>>     X-Spam-Report*URL+listed, 0.99990,
>>     X-Spam-Report*1.5+URIBL_JP_SURBL, 0.99990,
>>     X-Spam-Report*URIBL_JP_SURBL, 0.99990,
>>     X-Spam-Report*URI+of, 0.99990,
>>     X-Spam-Report*an+URI, 0.99990,
>>     jpg"/>, 0.99990,
>>     X-Spam-Report*the, 0.99990,
>>     X-Spam-Report*3.5, 0.99990,
>>     X-Spam-Report*the+JP, 0.99990,
>>     X-Spam-Report*URIBL_JP_SURBL+Contains, 0.99990,
>>     X-Spam-Report*3.5+BAYES_99, 0.99990,
>>     X-Spam-Report*MIME_HTML_ONLY, 0.99990,
>>     X-Spam-Report*RCVD_IN_XBL+RBL, 0.99990
>>
>> You can see that almost all the tokens dspam used came from the X-Spam
>> headers.
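
To put a number on "almost all", the X-DSPAM-Factors value can be split
into (token, probability) pairs and the X-Spam-* ones counted. A rough
Python sketch follows, assuming the layout shown in the listing (a
count, then one "token, probability" pair per line). The function names
are hypothetical and not part of dspam.

```python
def parse_dspam_factors(raw_value):
    """Split an X-DSPAM-Factors value into (count, [(token, prob), ...])."""
    lines = [ln.strip() for ln in raw_value.strip().splitlines() if ln.strip()]
    count = int(lines[0].rstrip(","))
    factors = []
    for line in lines[1:]:
        # The probability is whatever follows the last ", " on the line.
        token, prob = line.rstrip(",").rsplit(", ", 1)
        factors.append((token, float(prob)))
    return count, factors


def share_from_header(factors, header_prefix="X-Spam-"):
    """Fraction of factor tokens that came from headers with this prefix."""
    if not factors:
        return 0.0
    hits = [tok for tok, _ in factors if tok.startswith(header_prefix)]
    return len(hits) / len(factors)


# A three-line excerpt of the junk message's factor list.
sample = """3,
    X-Spam-Report*URL+listed, 0.99990,
    jpg"/>, 0.99990,
    X-Spam-Report*3.5+BAYES_99, 0.99990"""
count, factors = parse_dspam_factors(sample)
print(count, share_from_header(factors))
```

In the full factor list of the junk message, 14 of the 15 tokens start
with X-Spam-Report; only jpg"/> comes from the message body.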
>>
>> --Yan
>>
>>>
>>> On Mar 20, 2009, at 1:38 PM, Yan Seiner wrote:
>>>
>>>>
>>>> On Fri, March 20, 2009 9:55 am, Chris Ryland wrote:
>>>>> Very interesting, thanks.
>>>>>
>>>>> Can I ask what SpamAssassin adds to the mix?
>>>>
>>>> I use SA as input to dspam.  It allows dspam to be more accurate because
>>>> the header tokens SA adds are nearly always the same.
>>>>
>>>> --
>>>> Yan Seiner, PE
>>>>
>>>> Support my bid for the 4J School Board
>>>> http://www.seiner.com
>>>>
>>>>
>>>
>>> Cheers!
>>> --Chris Ryland / Em Software, Inc. / www.emsoftware.com
>>>
>>>
>>>
>>>
>>>
>>
>>
>
> Cheers!
> --Chris Ryland / Em Software, Inc. / www.emsoftware.com
>
>
> !DSPAM:49c3f1bb129321557312447!
>
>


-- 
Yan Seiner, PE

Support my bid for the 4J School Board
http://www.seiner.com

