On 04/12, Greg Troxel wrote:
> Do you mean rules like KHOP_DNSBL_BUMP and KHOP_DNSBL_ADJ?

I think so.

> The current score-setting algorithm seems to assume orthogonal rules, or
> rather a set of rules that test independent properties.  DNSBLs (and
> DNSWLs) are fundamentally different, because they are different entities'
> estimates of a single property.

Yep.

> If we force "not listed in any" to zero, sort of like rules not hitting
> is zero score, then for 2 BLs we have 3 rules: A, B and A+B.  If A gets
> 2 points and B 1 and they largely overlap, then it seems very likely
> that A+B deserves 2.2ish rather than 3.  If one accepts the "score the

How about giving A+B 2, the greater of the values for A and B?

> I suggest adding infrastructure to declare a set of k scoring rules as
> non-independent, which has the effect of adding 2^k-k-1 joint-situation
> rules that can then be assigned scores different from the sum of the
> individual scores.  For k=3, one would need 7 rules total, and thus 4
> more (AB, AC, BC, ABC).
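To make the proposal concrete, here's a rough sketch of what declaring k rules as non-independent would generate: one joint rule per subset of two or more of them. The single-letter rule names and the "concatenate the names" convention are made up for illustration; nothing like this exists in SpamAssassin as far as I know.

```python
from itertools import combinations
from typing import List


def joint_rules(names: List[str]) -> List[str]:
    """Hypothetical: one joint-situation rule name per subset of 2+ rules."""
    joint = []
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            joint.append("".join(combo))
    return joint


def joint_rule_count(k: int) -> int:
    # All 2**k subsets, minus the empty set ("not listed in any", scored 0)
    # and minus the k single-rule subsets (the rules that already exist).
    return 2**k - k - 1


if __name__ == "__main__":
    print(joint_rules(["A", "B", "C"]))  # ['AB', 'AC', 'BC', 'ABC']
    print(joint_rule_count(3))           # 4
```

For k=3 that's the 4 extra rules listed above, 7 in total.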

If we had sufficient mass-check participants, I agree that would probably
be optimal.  But it looks like we're dealing with k=15, so you're talking
about 32,752 more rules for 15 blacklists.  And about as many more for
whitelists.  Exponents can be a bitch.
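For anyone checking the arithmetic, the count is every subset of the k lists with at least two members, i.e. all 2^k subsets minus the empty one and the k singletons:

```latex
\sum_{j=2}^{k} \binom{k}{j} = 2^k - \binom{k}{0} - \binom{k}{1} = 2^k - k - 1,
\qquad
k = 15:\; 2^{15} - 15 - 1 = 32768 - 16 = 32752.
```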


So what do you think about adding the grouped-rule declaration, as you
suggested, but instead of creating many more rules, when scores are being
tallied for an email, counting only the largest score among the rules hit
in each group?
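Roughly, the tally I have in mind would look like the sketch below. It's not SpamAssassin's actual scoring code; the rule names, the group dict, and the function are hypothetical. It also assumes each rule belongs to at most one group, and that "largest" means largest magnitude, so a whitelist group (negative scores) contributes its most negative hit.

```python
from typing import Dict, Iterable, Set


def tally(hits: Iterable[str],
          scores: Dict[str, float],
          groups: Dict[str, Set[str]]) -> float:
    """Sum scores of hit rules, but let each group contribute only one hit."""
    hit_set = set(hits)
    grouped = set().union(*groups.values())
    # Ungrouped rules are summed exactly as they are today.
    total = sum(scores[r] for r in hit_set - grouped)
    for members in groups.values():
        group_scores = [scores[r] for r in hit_set & members]
        if group_scores:
            # Assumption: pick the largest-magnitude score in the group,
            # which is the max for blacklists and the min for whitelists.
            total += max(group_scores, key=abs)
    return total


if __name__ == "__main__":
    scores = {"A": 2.0, "B": 1.0, "OTHER": 0.5, "W1": -1.0, "W2": -3.0}
    groups = {"dnsbl": {"A", "B"}, "dnswl": {"W1", "W2"}}
    print(tally(["A", "B"], scores, groups))           # 2.0, not 3.0
    print(tally(["A", "B", "OTHER"], scores, groups))  # 2.5
    print(tally(["W1", "W2"], scores, groups))         # -3.0
```

With your example (A at 2, B at 1), a message listed in both gets 2 rather than 3, and no extra rules are needed however many lists are in the group.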

Let those scores float in rescoring, tallied the same max-per-group way,
and the blacklist (and whitelist) tests should end up with larger scores,
since overlap no longer forces them lower.  I bet a couple of them would
float over 5.
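As an idealized illustration (assuming k lists that always fire together on the same spam, all given the same score s, and a rescorer aiming for a total of T from them), summing splits T across the lists while the max tally doesn't:

```latex
\text{sum: } k\,s = T \;\Rightarrow\; s = \frac{T}{k},
\qquad
\text{max: } \max(s, \dots, s) = s = T \;\Rightarrow\; s = T.
```

With k=2 and T=5 that's 2.5 each versus 5 each. Real lists only partly overlap, so actual scores would land somewhere in between.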

-- 
"Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'." - The Color of Magic
http://www.ChaosReigns.com
