Hi folks,
What do I need to do to gain commit access? I sent in the signed Apache
CLA a few weeks ago but have heard nothing back.
My plans initially are only to put new tests into the sandbox to see how
they do.
* Get Adam Katz's KHOP rules updated in the sandbox so they can be
properly tested.
* Sandbox testing of additional blacklists like JMF, SEM
* Split PSBL into sub rules. RCVD_IN_PSBL is currently looking at all
headers instead of just last-external. This can work very well. But I
believe there is a simple way to improve this further by splitting it
into two subrules. This change can be made after the GA rescoring if
the rule is split properly.
Use RCVD_IN_PSBL_2WEEKS to assign a score. RCVD_IN_PSBL_DEEP would be
the equivalent of RCVD_IN_PSBL_2WEEKS. The stricter RCVD_IN_PSBL would
be a subrule that matches only with last-external, thereby being
stricter and eliminating most of the already minuscule chance of false
positives. Thus the full score of RCVD_IN_PSBL_2WEEKS would be split
into two parts.
Before
RCVD_IN_PSBL_2WEEKS score 2
This rule does deep parsing which is often good, but sometimes bad.
After
RCVD_IN_PSBL score 2
This rule matches only last-external, making it safer from FPs.
RCVD_IN_PSBL_DEEP score -1
This rule can be scored separately, subtracting a tiny amount if the
PSBL hit was found in deep parsing. Both rules would trigger, one adds,
the second subtracts. The subtracting rule would never fire on its own.
* I am also looking at ways to expand the use of the SOUGHT methodology.
Either improve the existing SOUGHT, or launch a separate SOUGHT-like
channel based upon an entirely different corpus. For example, Japanese
spam trap corpus + Japanese ham corpus = SOUGHT-JP nightly sa-update
channel. I'm even seeing big spam differences between the SOUGHT rules
generated from jm's corpus and those generated from my own corpus. There
is room for improvement with the current SOUGHT.
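
For the PSBL split above, here is a rough sketch of how the two lookups
might be written, to make the difference between all-headers and
last-external concrete. It is only an illustration: the zone name
psbl.surriel.com., the set names, and the helper rule
__RCVD_IN_PSBL_ALL are my assumptions, and the exact condition for
RCVD_IN_PSBL_DEEP is deliberately left out.

```
# Sketch only. Zone, set names and __RCVD_IN_PSBL_ALL are assumptions.
ifplugin Mail::SpamAssassin::Plugin::DNSEval

# Strict rule: the "-lastexternal" suffix on the set name limits the
# lookup to the last external relay.
header   RCVD_IN_PSBL        eval:check_rbl('psbl-lastexternal', 'psbl.surriel.com.')
describe RCVD_IN_PSBL        Last external relay is listed in PSBL
tflags   RCVD_IN_PSBL        net
score    RCVD_IN_PSBL        2

# Deep lookup: no suffix, so the check is not limited to the last
# external relay. Unscored helper that a DEEP rule could build on.
header   __RCVD_IN_PSBL_ALL  eval:check_rbl('psbl', 'psbl.surriel.com.')
tflags   __RCVD_IN_PSBL_ALL  net

endif
```

Something like this would go into the sandbox first and be run through
spamassassin --lint before any mass-check.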
Warren Togami
[email protected]