Dave Watkins wrote:
>
> Hi Guys
>
> This might be a crazy idea, and I haven’t thought through all the 
> consequences yet but I thought I’d put it out there anyway.
>
> When building the Bayesian DB you ideally want a 1:1 ratio of 
> spam/ham, at least in my experience it’s fairly difficult to populate 
> the spam side of that to keep up with all the mail that’s getting 
> whitelisted and so my ratio is more like 0.6:1, the accuracy is still 
> very good so I’m not too concerned about it but I’d like to get it 
> closer to 1:1 if I can.
>
> With that in mind, would it make sense to use relay attempts to do 
> this, perhaps mail to non-existent users too? Obviously you don’t ever 
> actually want to accept the final message of the attempt or the 
> various automated testing systems will flag your server as a relay 
> host, but would you be able to get enough of the message to be useful 
> in the Bayesian DB? Maybe accept right up to the final piece of data 
> and then reject it? Also you may be able to use the PB data to 
> mitigate this so you only ever go down this road if the sending host 
> is already at a certain score in the PB, which should indicate that 
> the mail itself is in fact spam since normal mail servers shouldn’t 
> get into the PB anyway.
>
> Thoughts?
>
Personally... I use these two settings:

Non Spam Collection Frequency

Store every n'th non spam message. If you set the value to 10 then every 
10th message is logged. These frequency settings are for ASSP users with 
a mature installation who experience heavy mail or spam volumes. Enter a 
larger value if the non spam corpus is being refreshed too quickly. 
Default value = 1, log every message.

Spam Collection Frequency

Store every n'th spam message. The same as for non spam but helps 
prevent spam corpuses being skewed by flooding. It is recommended that 
this be set depending on spam volume. Default value = 1, log every message.
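
To make the "every n'th message" behaviour concrete, here is a rough Python sketch of that kind of sampling. It is not ASSP's actual code, and the class and method names are made up for illustration. It only assumes what the setting descriptions say: a value of 1 keeps everything, a value of n keeps one message in n.

```python
from dataclasses import dataclass


@dataclass
class NthSampler:
    """Hypothetical helper: keep every n-th item, where n=1 keeps all."""
    n: int = 1
    seen: int = 0

    def should_store(self) -> bool:
        # Count the message, then keep it only on every n-th arrival.
        self.seen += 1
        return self.seen % self.n == 0


# Illustration only: 100 non spam messages with a frequency of 10.
ham_sampler = NthSampler(n=10)
stored = sum(ham_sampler.should_store() for _ in range(100))
print(stored)
```

With a frequency of 10, one message in ten reaches the corpus, so a busy side of the corpus grows (and refreshes) ten times more slowly.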


I have the reverse of your problem. Lots of spam, little ham. You can 
always set the non spam frequency to 3 and leave the spam frequency at 1 
to bring up your spam weight.
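
As a back-of-the-envelope check, the effect on the ratio can be estimated like this. This is only a sketch: it assumes the corpus ratio simply follows the number of messages stored on each side, and the function name and the 60/100 message counts are invented to match the 0.6:1 figure from the original post.

```python
def stored_ratio(spam_seen: int, ham_seen: int,
                 spam_every: int = 1, ham_every: int = 1) -> float:
    """Estimate the spam:ham ratio of stored messages (hypothetical helper)."""
    spam_stored = spam_seen // spam_every
    ham_stored = ham_seen // ham_every
    if ham_stored == 0:
        raise ValueError("no non spam messages would be stored")
    return spam_stored / ham_stored


# 60 spam and 100 non spam messages, i.e. the 0.6:1 starting point.
print(stored_ratio(60, 100))               # both frequencies at 1
print(stored_ratio(60, 100, ham_every=2))  # every 2nd non spam message
print(stored_ratio(60, 100, ham_every=3))  # every 3rd non spam message
```

Under these assumptions a non spam frequency of 3 overshoots to about 1.8:1, while 2 lands nearer 1.2:1, so it is worth trying a couple of values against your own volumes.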


_______________________________________________
Assp-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-user
