Hi folks,

What do I need to do to gain commit access? I sent in the signed Apache CLA a few weeks ago but have heard nothing back.

My plans initially are only to put new tests into the sandbox to see how they do.

* Get Adam Katz's KHOP rules updated in the sandbox so they can be properly tested.

* Sandbox testing of additional blacklists such as JMF and SEM.

* Split PSBL into subrules. RCVD_IN_PSBL currently looks at all headers instead of just last-external. This can work very well, but I believe there is a simple way to improve it further by splitting it into two subrules. This change can be made after the GA rescoring if the rule is split properly.

Use RCVD_IN_PSBL_2WEEKS to assign a score. RCVD_IN_PSBL_DEEP would be the equivalent of RCVD_IN_PSBL_2WEEKS. The stricter RCVD_IN_PSBL would be a subrule that matches only on last-external, eliminating most of the already minuscule chance of false positives. Thus the full score of RCVD_IN_PSBL_2WEEKS would be split into two parts.

Before
RCVD_IN_PSBL_2WEEKS score 2
This rule does deep parsing which is often good, but sometimes bad.

After
RCVD_IN_PSBL score 2
This rule matches only last-external, making it safer from FPs.
RCVD_IN_PSBL_DEEP score -1
This rule can be scored separately, subtracting a tiny amount if the PSBL hit was found in deep parsing. Both rules would trigger: one adds, the second subtracts. The subtracting rule would never fire on its own.
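
For illustration, the two lookups might be declared roughly like this in a sandbox rules file. This is an untested sketch, not the actual rules: the zone name psbl.surriel.com, the set names 'psbl' and 'psbl-lastexternal', and the describe text are my assumptions, and it relies on check_rbl restricting a lookup to the last external relay when the set name ends in "-lastexternal". Scores are deliberately left out, since how the two hits combine is the open question.

```
# Sketch only: zone, set names, and descriptions are assumptions.

# Deep lookup: checks relays from all Received headers.
header   RCVD_IN_PSBL_DEEP  eval:check_rbl('psbl', 'psbl.surriel.com.')
describe RCVD_IN_PSBL_DEEP  A relay in the Received chain is listed in PSBL
tflags   RCVD_IN_PSBL_DEEP  net

# Strict lookup: the "-lastexternal" suffix limits the check
# to the last external relay.
header   RCVD_IN_PSBL       eval:check_rbl('psbl-lastexternal', 'psbl.surriel.com.')
describe RCVD_IN_PSBL       Last external relay is listed in PSBL
tflags   RCVD_IN_PSBL       net
```

Declared this way, the two rules can be mass-checked and scored independently in the sandbox before deciding on the final split.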

* I am also looking at ways to expand the use of the SOUGHT methodology: either improve the existing SOUGHT, or launch a separate SOUGHT-like channel based on an entirely different corpus. For example, Japanese spam trap corpus + Japanese ham corpus = SOUGHT-JP nightly sa-update channel. I'm even seeing big differences in spam between the SOUGHT rules generated from jm's corpus and those generated from my own corpus. There is room for improvement in the current SOUGHT.

Warren Togami
[email protected]
