On Mon, Sep 28, 2009 at 06:44, Warren Togami <[email protected]> wrote:
> Hi folks,
>
> What do I need to do to gain commit access? I sent in the signed Apache
> CLA a few weeks ago but I heard nothing back.
>

that's normal; we don't automatically create an account on CLA receipt,
and we generally need the CLA much earlier than that (if the contributions
have already been significant enough).

I'll propose it to the pmc list and I'm pretty sure we'll be voting you in.
as per http://wiki.apache.org/spamassassin/ProjectRoles , you haven't
contributed enough bad code to do otherwise ;)

--j.

> My plans initially are only to put new tests into the sandbox to see how
> they do.
>
> * Get Adam Katz's KHOP rules updated in the sandbox so they can be properly
> tested.
>
> * Sandbox testing of additional blacklists like JMF, SEM
>
> * Split PSBL into sub rules. RCVD_IN_PSBL is currently looking at all
> headers instead of just last-external. This can work very well. But I
> believe there is a simple way to improve this further by splitting it into
> two subrules. This change can be made after the GA rescoring if the rule is
> split properly.
>
> Use RCVD_IN_PSBL_2WEEKS to assign a score. RCVD_IN_PSBL_DEEP would be the
> equivalent to RCVD_IN_PSBL_2WEEKS. The stricter RCVD_IN_PSBL would be a
> subrule that matches only with last-external, thereby being stricter and
> eliminating most of the already minuscule chance of false positives. Thus
> the full score of RCVD_IN_PSBL_2WEEKS would be split into two parts.
>
> Before
> RCVD_IN_PSBL_2WEEKS score 2
> This rule does deep parsing which is often good, but sometimes bad.
>
> After
> RCVD_IN_PSBL score 2
> This rule matches only last-external making it safer from FP's.
> RCVD_IN_PSBL_DEEP score -1
> This rule can be scored separately, subtracting a tiny amount if the
> PSBL hit was found in deep parsing. Both rules would trigger, one adds, the
> second subtracts. The subtracting rule would never fire on its own.
>
> * I am also looking at ways to expand the use of the SOUGHT methodology.
> Either improve the existing SOUGHT, or launch a separate SOUGHT-like
> channel based upon an entirely different corpus. For example, Japanese spam
> trap corpus + Japanese ham corpus = SOUGHT-JP nightly sa-update channel.
> I'm even seeing big spam differences between jm's corpus-generated sought
> rules and my own corpus. There is room for improvement with the current
> SOUGHT.
>
> Warren Togami
> [email protected]
>
>

--
--j.
