> If I don't take it into account when optimizing the scores, then the
> increased scores will cause more false positive errors.

What you probably need to take into account is the rule's cumulative score
over the test corpus, which of course you already do for every rule.  The
only oddity is that you would need to know the effective multiplier that was
applied to the base score in the corpus, so that you could set the base
score correctly.  I think that is a piece of data you probably don't
currently have for rules, since it is implicitly 1.0.
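
A rough sketch of that adjustment, in Python.  The scaling formula
(1.0 at min_hits, plus per_hit for each extra hit up to max_hits) and all
of the function and parameter names are assumptions made up for
illustration; the real multiple-hit rule may scale quite differently.

```python
def rule_multiplier(hits, min_hits, max_hits, per_hit):
    """Hypothetical scaling: 0 below min_hits, 1.0 at min_hits,
    plus per_hit for each extra hit, capped at max_hits."""
    if hits < min_hits:
        return 0.0
    capped = min(hits, max_hits)
    return 1.0 + per_hit * (capped - min_hits)


def effective_multiplier(hit_counts, min_hits, max_hits, per_hit):
    """Mean multiplier over the corpus messages where the rule fired.
    Falls back to 1.0 (the implicit value for ordinary rules) if it never fired."""
    mults = [rule_multiplier(h, min_hits, max_hits, per_hit) for h in hit_counts]
    fired = [m for m in mults if m > 0.0]
    return sum(fired) / len(fired) if fired else 1.0


def base_score(target_score, eff_mult):
    """Base score that yields target_score on average once the multiplier is applied."""
    return target_score / eff_mult


# Per-message hit counts for one rule in a made-up corpus.
hit_counts = [1, 2, 3, 5, 0]
eff = effective_multiplier(hit_counts, min_hits=1, max_hits=3, per_hit=0.5)
base = base_score(3.25, eff)
```

With these made-up numbers the multipliers for the four messages that hit
are 1.0, 1.5, 2.0 and 2.0, so the effective multiplier is 1.625, and a
target score of 3.25 from the optimizer corresponds to a base score of 2.0.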

Of course, if you also wanted to tune the multiplier and min/max hits values
for the rule, it becomes vastly harder.  I think that might be best left to
some other piece of corpus-analysis software that generates statistics for
multiple-hit rules and lets a human make the final decisions.  There would
be few enough rules of this sort to make that feasible.  Once a human had
set those values, it would be easy enough to factor in the effective
multiplier on the test corpus when setting the score for the rule.
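
The kind of statistics such a tool might print could be as simple as a
per-hit-count breakdown of spam versus ham.  This is only a sketch in
Python; the (is_spam, hits) input format and the function name are
hypothetical, not anything that exists in the current tools.

```python
from collections import Counter


def hit_histogram(messages):
    """messages: iterable of (is_spam, hits) pairs for one rule.
    Returns {hits: (spam_count, ham_count)} for messages where the rule fired."""
    spam, ham = Counter(), Counter()
    for is_spam, hits in messages:
        if hits > 0:
            (spam if is_spam else ham)[hits] += 1
    return {h: (spam[h], ham[h]) for h in sorted(set(spam) | set(ham))}


# Made-up corpus results for one multiple-hit rule.
corpus = [(True, 1), (True, 3), (True, 3), (False, 1), (False, 0)]
table = hit_histogram(corpus)
for hits, (n_spam, n_ham) in table.items():
    print(f"{hits} hit(s): spam={n_spam} ham={n_ham}")
```

A person looking at a table like that could see where ham hits stop
appearing and pick min/max hits accordingly.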

        Loren
