On 10/30/2007 1:15 PM, Justin Mason wrote:
The JM_SOUGHT ruleset consists of body rules, extracted automatically from the
previous few days' trapped spam mail.  They typically hit about 90% of the
previous week's spam, with no FPs, according to
  http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_1/detail
  http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_2/detail
  http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_3/detail

This is achieved with no manual steps required at all, so that's quite
nice ;)

On the other hand, they could potentially be used to cause false
positives; review of the generated rules happens *after* they're
published (in other words they're C-T-R).

I'm currently publishing these as a separate ruleset at
sought.rules.yerp.org -- http://taint.org/2007/08/15/004348a.html

They're also checked into SVN trunk, but that's really to get an idea of
FP/FNs using the rule-QA system.

I would call it stable.

I'm wondering what to do with them now -- I see these options:

  1. leave it at sought.rules.yerp.org, effectively an unofficial side
  project to SpamAssassin.

  2. move it into SpamAssassin SVN, and publish the generated rules into
  the "core" 3.2.x rule updates, changing our rule-update generation
  criteria to support this.

  3. move it into SpamAssassin SVN, rename to something without the "JM"
  prefix, and publish the generated rules at a new URL like
  sought.rules.SpamAssassin.org .  This would then be the first of a new
  site of SpamAssassin-hosted add-on rulesets, which are free to use
  different promotion criteria from the default "core" set.

What do people think?

3. : +1

Notes:

- seems some of the rules are VERY long:

body __SEEK__CF3DC /Our Company is a privately owned and operated promoting and marketing firm based in UK, with offices all over the world\. We are currently expanding due to client needs\. We are looking for candidates that will assist us\. Now we offering positions at the entry level for marketing and management role\. We train all candidates in: Service Representative Promotions Communications Public Relations Marketing We value your goals and your career; so we will connect you with mentors who can offer you as much guidance as you need\. This is a permanent home based position, so anyone ready for a stable career should apply today\! /


or

body __SEEK_GSUNRB / Money Manager - GPS: Online Form \*Important information for former The Signature Citizens Internet Banking Users\* On November 1, we will be moving to a new Internet Banking system\. You will need to print any previous records \(statements, cancelled checks, Bill Pay information, etc\.\) you wish to retain since they will not move to the new service\. Your Internet Banking access will resume on Friday, November 2\. Previous merger with Signature Bank\'s parent company, Money Manager GPS, Completed on October 1, 2007\. Payments with a scheduled payment date of November 1 or before will be processed and should not be resubmitted\. Any payment scheduled for payment after November 1 will not be processed and other payment arrangements should be made\. If you previously had e-bills or payees setup with Bill Pay, Wire, Ach, etc\., you will need to re-apply for the service, and re-enter the bill payment information on the new system starting November 1\. Beginning on October 29, you can access the new Citizens Internet Banking system by clicking here: https:\/\/www\.citizensbankmoneymanagergps\.com\/ All information you provide to us on our web site is encrypted to ensure your privacy and security\./


- don't see the point of body rules containing short-lived URIs

- reduce file size massively. People could be surprised by the memory used when adding an 80 KB rule file (as well as by possibly noticeable speed issues)
(same SA list questions as with blacklist.cf .-)

- maybe run rule generation through a spam_du_jour corpus only?
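
A rough way to check the notes above against a generated rule file is sketched here in Python. It only assumes the one-line "body NAME /pattern/" layout seen in the two examples; the function name audit_rules and the 200-character threshold are made up for illustration and are not part of SpamAssassin or of the rule generator.

```python
import re

# Matches one-line rules of the form: body RULE_NAME /pattern/flags
# (assumed layout, based on the two examples quoted above)
BODY_RULE = re.compile(r"^\s*body\s+(\S+)\s+/(.*)/([a-z]*)\s*$")

# Detects an http/https scheme inside a pattern, with or without
# backslash-escaped slashes (https:\/\/ or https://)
URI_HINT = re.compile(r"https?:(?:\\?/){2}")


def audit_rules(cf_text, max_len=200):
    """Return one dict per body rule with its pattern length and two flags.

    max_len is an arbitrary illustrative threshold, not a SpamAssassin limit.
    """
    report = []
    for line in cf_text.splitlines():
        m = BODY_RULE.match(line)
        if not m:
            continue  # skip comments, blank lines, and other rule types
        name, pattern = m.group(1), m.group(2)
        report.append({
            "name": name,
            "length": len(pattern),
            "too_long": len(pattern) > max_len,
            "has_uri": bool(URI_HINT.search(pattern)),
        })
    return report
```

Feeding it the text of the published .cf file and filtering on too_long or has_uri would give a quick list of candidates to trim or drop before publishing.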

my 0.2 $preferred_currency

AXB




