http://bugzilla.spamassassin.org/show_bug.cgi?id=4095

------- Additional Comments From [EMAIL PROTECTED]  2005-01-24 11:19 -------
Subject: Re:  Using Bayesian Filters to score rules

I'm seeing significantly different results than what you have seen. When 
I exclude most of the message body, except for the "hot" parts (links, 
email addresses, and phone numbers), and enhance the headers with some 
extra DNS info, I'm seeing more accurate results.
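Here is a minimal Python sketch of the "hot parts" idea. It keeps only the links, email addresses and phone numbers from a message body and turns them into tokens. The function name `extract_hot_tokens`, the `url:`/`email:`/`phone:` prefixes and the regular expressions are all hypothetical and deliberately simplified. The phone pattern, for instance, assumes North American 10-digit numbers. None of this is SpamAssassin's own tokenizer. The DNS header enhancement is left out because it needs network lookups.

```python
import re

# Hypothetical, simplified patterns for illustration only.
URL_RE = re.compile(r"https?://[^\s<>'\"]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumes North American style numbers such as (555) 123-4567 or 555.123.4567.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


def extract_hot_tokens(body: str) -> list[str]:
    """Keep only links, email addresses and phone numbers from a message body."""
    tokens = []
    tokens += ["url:" + u.lower() for u in URL_RE.findall(body)]
    tokens += ["email:" + e.lower() for e in EMAIL_RE.findall(body)]
    # Normalise phone numbers to digits only so formatting variants match.
    tokens += ["phone:" + re.sub(r"\D", "", p) for p in PHONE_RE.findall(body)]
    return tokens


print(extract_hot_tokens(
    "Visit http://example.com/offer now or mail sales@example.com "
    "or call (555) 123-4567."
))
```

Everything else in the body is simply dropped, so only these tokens would reach the Bayesian classifier.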

This works because the body text of spam and ham differs much less 
than the parts of the messages I'm looking at do.

Look at it this way, if I can use an analogy.

Excluding the body is like having a bath tub 1/3 full of very hot 
water. Including the body is like having the bath tub full of warm 
water. The full tub might contain more total heat, but it has a lower 
temperature. And I think temperature, not total heat, is the best 
way to detect spam accurately.

What I'm saying is that the bulk of the message body dilutes the 
Bayesian results, moving messages towards the center of the scale. 
Stripping out the bulk of the body moves the results towards the 
ends of the scale.
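One well-known way to combine token probabilities is Gary Robinson's geometric-mean formula, and it shows the dilution effect clearly. It is used here only for illustration. It is not SpamAssassin's own chi-square combiner, and the token probabilities below are made up rather than taken from any corpus. Three strongly spammy "hot" tokens score near 1. Adding twenty neutral body tokens (probability 0.5) drags the same message toward the middle of the scale, which is the warm-tub effect described above.

```python
import math


def robinson_score(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    # Geometric means are computed in log space to avoid underflow on long messages.
    p = 1.0 - math.exp(sum(math.log(1.0 - x) for x in probs) / n)  # spamminess
    q = 1.0 - math.exp(sum(math.log(x) for x in probs) / n)        # hamminess
    return (1.0 + (p - q) / (p + q)) / 2.0


hot = [0.99, 0.98, 0.95]   # e.g. a spammy URL, email address and phone number
lukewarm = [0.5] * 20      # ordinary body words that say little either way

print(round(robinson_score(hot), 3))             # about 0.973
print(round(robinson_score(hot + lukewarm), 3))  # about 0.595
```

The sketch scores every token it is given. A real filter may also discard tokens near 0.5 before combining, which would change these numbers.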

Getting back to the subject of this bug, I hope to try replacing the 
scores on rules with automatic Bayesian scoring sometime this week. 
I'll let you know how it does.