On 15/03/2010 7:13 PM, Karsten Bräckelmann wrote:
> On Mon, 2010-03-15 at 22:59 +0000, Justin Mason wrote:
>> 2010/3/15 John Hardin <jhar...@impsec.org>:
>>> On Mon, 15 Mar 2010, Karsten Bräckelmann wrote:
>>>
>>>> The following 30 rules appear to have NOT assigned a score in the
>>>> tarball. :(
> 
>>> I'd expect those sandbox rules to have their scores assigned by the nightly
>>> masscheck evaluation process. Daryl?
>>
>> as I said -- the rules tarball is being built from the 3.3 branch,
>> whereas the nightly evaluation process is running off trunk.  that's
>> why they're not matching.
> 
> I might be confused, but why would that result in rules without scores?
> Unless rules are removed from trunk, shouldn't it be the other way
> round?

The only script that pays any attention to the scores in 72_active.cf is
the script I've got running to publish the stable branch updates (which
also generates those scores).  The scores are based on a mass-check for
a specific svn revision.  I can't say that they'll be any good for
future, or past, revisions... who knows how the rules will change
between mass-checks (and score gen runs).

I suppose another script could try using those scores for another
revision of the rules, but user beware, changes in rules could make the
scores quite invalid.
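As a rough illustration, here is a Python sketch of the kind of check that would have flagged the unscored rules in the tarball. It collects rule names from a rules file, collects score names from a scores file such as 72_active.cf, and reports every rule that would fall back to a default score. The helper names (parse_cf, default_score, unscored_rules) and the simplified parsing are assumptions: it ignores ifplugin blocks, treats only whole-line comments, and keeps only the first value on a score line. The defaults follow SpamAssassin's documented fallback: 1.0 for an unscored rule, 0.01 for T_ rules, and no score at all for __ sub-rules. The rule names in the example are made up.

```python
# Simplified rule-definition keywords (assumption: real SpamAssassin config
# also has ifplugin/if blocks, tflags, priority, etc., which are ignored here).
RULE_TYPES = {"header", "body", "rawbody", "uri", "full", "meta", "mimeheader"}


def parse_cf(cf_text):
    """Return (set of defined rule names, dict of rule name -> first score)."""
    rules, scores = set(), {}
    for raw in cf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        parts = line.split()
        if len(parts) < 2:
            continue
        keyword, name = parts[0], parts[1]
        if keyword in RULE_TYPES:
            rules.add(name)
        elif keyword == "score" and len(parts) >= 3:
            # A score line may give one value or four (one per score set);
            # this sketch only keeps the first.
            scores[name] = float(parts[2])
    return rules, scores


def default_score(name):
    """SpamAssassin's documented fallback for a rule with no score line."""
    return 0.01 if name.startswith("T_") else 1.0


def unscored_rules(rules_text, scores_text):
    """Map each scoreable rule lacking a score to the default it would get."""
    rules, inline_scores = parse_cf(rules_text)
    _, file_scores = parse_cf(scores_text)
    scored = set(inline_scores) | set(file_scores)
    return {
        name: default_score(name)
        for name in sorted(rules)
        if not name.startswith("__")  # __ sub-rules are never scored
        and name not in scored
    }


# Hypothetical example: sandbox rules plus a scores file from another revision.
rules_cf = """
body   NEW_RULE_A  /cheap pills/i
header NEW_RULE_B  Subject =~ /winner/i
body   __SUB_X     /replica/i
meta   NEW_META    (__SUB_X && NEW_RULE_A)
body   T_TESTING   /foo/
"""
scores_cf = "score NEW_RULE_A 2.5\n"

print(unscored_rules(rules_cf, scores_cf))
# {'NEW_META': 1.0, 'NEW_RULE_B': 1.0, 'T_TESTING': 0.01}
```

A check like this only catches missing scores. It cannot tell whether a score generated for one svn revision still suits a rule that has since changed.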

>> so the question is: should we build the rules tarball from trunk as
>> well?  if so, what script should we use to do so?
> 
> Just a gut feeling, but shouldn't both be built from the branch?

Currently, for the update rule tarballs, I use the rules/ and rulesrc/
from trunk.  It's the only version of the rules that we have current
mass-check statistics for since we don't do stable version mass-checks.

Using the branch version of the rules/ directory would be OK, I guess.
I think that the only advantage might be more thorough review of changes
to rules in rules/.  However, since we've apparently opened up those
rules to CTR, it probably doesn't make a difference there either.  The
downside would be the stagnation that the stable branch always falls
into.  At some point nobody will bother to backport the rules and
they'll never get updated.

> Is the update tarball (like the nightly evaluation) built from trunk? In
> that case, the dist tarball probably should, too. It would be what the
> users get after an sa-update anyway...

Yep.

> But if we distribute off trunk in sa-update, why the distinction and
> need to backport sandbox rules in the first place?

We used to backport the *sandbox* rules because all the rule updates were
manual, and trunk and branches were kept completely separate.  That's no
longer the case.  Now that we're using trunk rules for stable branches,
I'm not sure that we even need to backport any rules.

> They definitely, absolutely need to match. Score generation with an
> altered rule-set will skew results if trunk-only rules are missing.

To be honest, and without looking at stats, I don't think that the
"skew" would be too drastic (towards the harmful side, anyway).  Without
scores the rules would get default scores of 1.0.  Most of the rules
have scores greater than 1.0, so without the scores those rules would
simply err on the safe side.  A smaller set of new sandbox rules have
scores less than 1.0... those might push the rate of ham misclassified
as spam (FPs) up a little bit.  If anyone cares enough, I'm sure they
could come up with numbers; however, since we do have scores we can
use, I won't bother myself.
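To make the "skew" argument concrete, here is a small Python sketch. The helper names, rule names and scores are all hypothetical. It splits rules by whether the 1.0 fallback would weaken them or give them more weight than their generated score, and it totals a message's score under both cases. For simplicity it treats every rule as getting the plain 1.0 default and ignores the 0.01 default for T_ rules.

```python
# Fallback score SpamAssassin assigns to an unscored (non-T_, non-__) rule.
DEFAULT_SCORE = 1.0


def fallback_effect(generated):
    """Split rules by how losing their generated score would change them."""
    weakened, strengthened = [], []
    for name, score in sorted(generated.items()):
        if score > DEFAULT_SCORE:
            weakened.append(name)      # default is lower: less spam caught, no added FP risk
        elif score < DEFAULT_SCORE:
            strengthened.append(name)  # default is higher: ham hits count for more
    return weakened, strengthened


def message_score(hits, scores):
    """Total score of a message, using DEFAULT_SCORE for any rule without one."""
    return sum(scores.get(rule, DEFAULT_SCORE) for rule in hits)


# Hypothetical generated scores for three new sandbox rules.
generated = {"NEW_RULE_A": 2.5, "NEW_RULE_C": 0.25, "NEW_RULE_D": 1.0}
print(fallback_effect(generated))      # (['NEW_RULE_A'], ['NEW_RULE_C'])

hits = ["NEW_RULE_A", "NEW_RULE_C"]
print(message_score(hits, generated))  # 2.75 with generated scores
print(message_score(hits, {}))         # 2.0 if both fall back to 1.0
```

When most new rules sit in the "weakened" group, the fallback lowers spam scores (as with 2.75 vs 2.0 above). The FP risk comes only from the smaller "strengthened" group.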

One thing's for sure, though: with the generated scores, the new rules
will catch more spam, since the bulk of them have scores higher than the
default of 1.0.

Daryl
