On 2013-07-12 02:29, Andrew Crawford wrote:
> Hello all,
>
> dspam seems to be spending a lot of time/effort tokenizing dates in my
> headers, at the expense of mail body content. For example, when I run
> dspam_admin [user], I get thousands and thousands of lines like this:
>
>> 14968969065089367403 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
>> 963619933704143452 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
>> 10739131833374758758 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
>> 11081913058653856561 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
>
> These dates are coming from headers like this:
>
>> Received: by li212-205.members.linode.com [1] (Postfix, from userid 115) id
>> CFB581CC6E4; Thu, 11 Jul 2013 20:12:23 -0400 (EDT)
>
> In my opinion, the first part of the Received-by header could potentially
> provide useful factors for spam/ham (so I don't want to ignore the whole
> header). But the date part does not.
>
> When I look at X-Dspam-Factors, the vast, _vast_ majority of the factors are
> dates. For example here is one:
>
>> 27, 1+https, 0.02321, 1+#+#+#+com, 0.02321, X-Greylist*li212-205+#+11,
>> 0.02999, X-Greylist*Thu+11, 0.02999, X-Greylist*11+#+2013, 0.02999,
>> X-Greylist*11+Jul, 0.02999, X-Greylist*at+#+#+11, 0.02999,
>> X-Greylist*postgrey-1.34+#+#+#+11, 0.02999, Received*sealedabstract.com
>> [2]+#+11, 0.03988, Received*drew+#+#+11, 0.03988, X-Greylist*Thu+#+Jul,
>> 0.04160, Received*for+#+#+#+11, 0.04212, Received*Thu+11, 0.04880,
>> Received*11+#+2013, 0.04880, Received*11+Jul, 0.04880, Date*11+Jul, 0.04938,
>> Date*11+#+2013, 0.04938, Date*Thu+11, 0.04938, 1+#+https, 0.04961,
>> 1+#+https, 0.04961, https+#+com, 0.05430, https+#+com, 0.05430, 10+2013,
>> 0.05430, Received*Thu+#+Jul, 0.06021, Date*Thu+#+Jul, 0.07052, to+#+#+You,
>> 0.07707, to+#+#+#+can, 0.08982
>
> Even though I receive this (ham) e-mail report every single day, with the
> same subject, and the same message body, only a tiny fraction of the spam
> score comes from the subject or message body. A great deal of it comes from
> dates.
>
> Is there some way to provide e.g. a regex to dspam so it will ignore date
> tokens? Or are many users ignoring the Received-by header and do not have
> this problem? How do others avoid this problem?
Searching the mailing list archive might help:
http://sourceforge.net/mailarchive/message.php?msg_id=25370411
http://www.mail-archive.com/dspam-user@lists.sourceforge.net/msg03003.html
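
If neither thread settles it, one possible workaround is to strip the
timestamp from Received headers before the message is handed to dspam.
The sketch below is not a dspam feature or setting; it is a hypothetical
pre-filter, and strip_received_date is a name made up for illustration.
It relies on the Received header ending in "; date-time" and only removes
that tail when it actually parses as a date. The sample value is the
header quoted above, unfolded onto one line and without the "[1]" link
marker.

```python
from email.utils import parsedate_tz


def strip_received_date(value: str) -> str:
    """Return a Received header value without its trailing '; date-time' part.

    Hypothetical helper for a pre-filter that runs before dspam sees the mail.
    """
    # Split at the last ';' because the date-time is the final element.
    head, sep, tail = value.rpartition(";")
    # Only drop the tail if it really parses as a date; otherwise keep the value.
    if sep and parsedate_tz(tail) is not None:
        return head.rstrip()
    return value


received = (
    "by li212-205.members.linode.com (Postfix, from userid 115) id "
    "CFB581CC6E4; Thu, 11 Jul 2013 20:12:23 -0400 (EDT)"
)
cleaned = strip_received_date(received)
print(cleaned)
```

This keeps the host and MTA part of the header available as tokens. Date
tokens from the Date and X-Greylist headers would still be generated and
would need separate handling.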
> Best,
>
> Drew
>
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> [3]
>
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user
> [4]
--
Kind Regards from Switzerland,
Stevan Bajić
Links:
------
[1] http://li212-205.members.linode.com
[2] http://sealedabstract.com
[3] http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
[4] https://lists.sourceforge.net/lists/listinfo/dspam-user