Hello Loren,

Monday, July 25, 2005, 9:55:36 PM, you wrote:
>> That's why we use 70_sare_name_eng.cf files, to indicate that these
>> rules work well only on systems which expect almost 100% English ham,
>> and little to no ham in other languages.

>> I've begun to wonder whether it might be worthwhile having
>> 50_scores.cf for English emails, and then 50_scores_de.cf for German
>> emails, and have SA pick the score appropriately depending upon the
>> language of the email...

LW> This is why I'd like to see a report-home option in SA that was
LW> enabled by default.

LW> We could invent a class of rules that were 'test rules'. They
LW> would have nil score and wouldn't report on the mail summary if
LW> they hit. But they would show up in the report-home summary as to
LW> whether they hit, and whether it was ham or spam.

How would we determine ham/spam? At this point all we have is SA's
first estimation, with no way of knowing whether that verdict is
accurate, a false negative, or a false positive. It would be more
accurate to report after human verification, but that would greatly
reduce the amount of feedback.

LW> Then we can make rules that pass initial testing and stick
LW> them out for what we believe is good use, or maybe even for pure
LW> testing purposes. SA systems around the world would pick up these
LW> rules with sa-update, and would report home on the hit stats. If
LW> we have a good hitter that sucks in 'de', then we move it to an
LW> english-only ruleset, or we have an exclude-de option on the front
LW> of the rule or rule grouping. If the sysadmin has set his local
LW> language correctly, things should work out correctly.

The ideal sounds great to me. It'd be really good to figure out how to
distribute "rules under consideration" around the world, and get
feedback on how they work in real life, before giving them a score.
The difficulty, as I see it, is determining just how well they do work.

Bob Menschel
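For what it's worth, something close to a "nil score" test rule can already be expressed in today's ruleset syntax: a near-zero score keeps the rule name visible in X-Spam-Status (and thus in logs that a report-home mechanism could harvest) without letting it move a message across the threshold. A minimal sketch, assuming a made-up rule name and pattern purely for illustration (and, if memory serves, SA already treats names beginning with T_ as in-testing rules with a tiny default score):

```
# Hypothetical "test rule" sketch -- rule name and regex are invented.
# The 0.001 score is effectively nil: it cannot tip a message into spam,
# but the rule name still shows up in the X-Spam-Status tests list,
# which is where per-rule hit stats could be collected from.
body     T_SARE_EXAMPLE_DE   /Beispiel-Spamtext/i
describe T_SARE_EXAMPLE_DE   Candidate rule under evaluation (German corpus)
score    T_SARE_EXAMPLE_DE   0.001
```

Whether the hit was ham or spam would still have to come from somewhere else, which is exactly the estimation problem discussed above.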
