Hello all,
dspam seems to be spending a lot of time/effort tokenizing dates in my headers,
at the expense of mail body content. For example, when I run dspam_admin
[user], I get thousands and thousands of lines like this:
> 14968969065089367403 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
> 963619933704143452 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
> 10739131833374758758 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
> 11081913058653856561 S: 00001 I: 00000 P: 0.4000 LH: Thu Jul 11 20:12:19 2013
These dates are coming from headers like this:
> Received: by li212-205.members.linode.com (Postfix, from userid 115) id
> CFB581CC6E4; Thu, 11 Jul 2013 20:12:23 -0400 (EDT)
In my opinion, the first part of the Received header could provide useful
factors for spam/ham (so I don't want to ignore the whole header), but the
date part does not.
When I look at X-Dspam-Factors, the vast majority of the factors are dates.
For example, here is one such header:
> 27, 1+https, 0.02321, 1+#+#+#+com, 0.02321, X-Greylist*li212-205+#+11,
> 0.02999, X-Greylist*Thu+11, 0.02999, X-Greylist*11+#+2013, 0.02999,
> X-Greylist*11+Jul, 0.02999, X-Greylist*at+#+#+11, 0.02999,
> X-Greylist*postgrey-1.34+#+#+#+11, 0.02999, Received*sealedabstract.com+#+11,
> 0.03988, Received*drew+#+#+11, 0.03988, X-Greylist*Thu+#+Jul, 0.04160,
> Received*for+#+#+#+11, 0.04212, Received*Thu+11, 0.04880, Received*11+#+2013,
> 0.04880, Received*11+Jul, 0.04880, Date*11+Jul, 0.04938, Date*11+#+2013,
> 0.04938, Date*Thu+11, 0.04938, 1+#+https, 0.04961, 1+#+https, 0.04961,
> https+#+com, 0.05430, https+#+com, 0.05430, 10+2013, 0.05430,
> Received*Thu+#+Jul, 0.06021, Date*Thu+#+Jul, 0.07052, to+#+#+You, 0.07707,
> to+#+#+#+can, 0.08982
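To put a number on "the majority of the factors are dates", a rough Python sketch along these lines could estimate the share of date-like tokens in a factors header. It is not part of dspam. The function names are hypothetical, and it assumes the header value has been unfolded onto one line with the "> " quoting removed, that the leading number is a count to skip, and that no token contains a comma. The heuristic only spots weekday names, month names, and four-digit years, so tokens such as X-Greylist*li212-205+#+11 (bare day of month) are not counted and the result is an underestimate.

```python
import re

WEEKDAYS = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}
MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}
YEAR_RE = re.compile(r"^(?:19|20)\d{2}$")  # e.g. 2013


def parse_factors(value):
    """Split 'N, token, prob, token, prob, ...' into (token, prob) pairs."""
    parts = [p.strip() for p in value.split(",")]
    rest = parts[1:]  # assumption: the first field is a count, not a token
    return [(tok, float(prob)) for tok, prob in zip(rest[0::2], rest[1::2])]


def is_date_like(token):
    """True if any '+'-separated piece is a weekday, month name, or year."""
    body = token.split("*", 1)[-1]  # drop a 'Header*' prefix if present
    return any(p in WEEKDAYS or p in MONTHS or YEAR_RE.match(p)
               for p in body.split("+"))


def date_fraction(value):
    """Fraction of factors whose token looks date-derived."""
    factors = parse_factors(value)
    if not factors:
        return 0.0
    hits = sum(1 for tok, _ in factors if is_date_like(tok))
    return hits / len(factors)


sample = "3, Date*11+Jul, 0.04938, 1+https, 0.02321, Received*Thu+11, 0.04880"
print(date_fraction(sample))
```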
Even though I receive this (ham) e-mail report every single day, with the same
subject, and the same message body, only a tiny fraction of the spam score
comes from the subject or message body. A great deal of it comes from dates.
Is there some way to provide, e.g., a regex to dspam so that it will ignore
date tokens? Or do many users ignore the Received header entirely and so never
see this problem? How do others avoid it?
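To make the idea concrete, here is a minimal Python sketch of the kind of regex meant above: it removes the trailing date-time from a Received header value while keeping the host and id part. This is only an illustration of a possible pre-filter that would run before the message reaches dspam. It is not a dspam option, and the names DATE_RE and strip_received_date are hypothetical.

```python
import re

# Matches a trailing date-time such as
# "; Thu, 11 Jul 2013 20:12:23 -0400 (EDT)" at the end of a header value.
DATE_RE = re.compile(
    r";\s*(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s*)?"
    r"\d{1,2}\s+"
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"\d{4}\s+\d{2}:\d{2}(?::\d{2})?\s+[+-]\d{4}"
    r"(?:\s*\([^)]*\))?\s*$"
)


def strip_received_date(header_value):
    """Return the header value with its trailing date-time removed."""
    return DATE_RE.sub("", header_value)


example = ("by li212-205.members.linode.com (Postfix, from userid 115) id "
           "CFB581CC6E4; Thu, 11 Jul 2013 20:12:23 -0400 (EDT)")
print(strip_received_date(example))
# by li212-205.members.linode.com (Postfix, from userid 115) id CFB581CC6E4
```

Values that do not end in a date of this shape are returned unchanged, so the same function could be applied to every Received header without special-casing.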
Best,
Drew
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user