Agreed, I'll certainly think about how to address this and the similar problems that crop up from time to time.
As mentioned, I'm wanting to get the masses working "as they should" at the minute (I think this is mostly done apart from nice rule rescores) and then go for improvements to the actual process and reliability. It's been this way for many years; change is needed, as is careful consideration.

Enjoy the weekend!!

Paul

On Sun, 16 Jun 2019 at 09:08, Henrik K <[email protected]> wrote:

> I figured it does something like that, probably fine for most of those
> rules that don't hit much mail at all. Then we have stuff that hit 20%+
> of ham like STYLE_GIBBERISH, probably the rescorer should take that more
> into account instead of just "crunching numbers". :-) It's not like the
> whole world uses 5 as a baseline, people might also have all kinds of
> local poison pill rules. 8-10 seems quite ok to use and I remember some
> wiki page even recommending that.
>
> On Sun, Jun 16, 2019 at 08:42:57AM +0100, Paul Stead wrote:
> > So let's look at the following rule which isn't promotable in QA:
> > [1]https://ruleqa.spamassassin.org/20190615-r1861371-n/URI_WP_HACKED_2/detail
> >
> > This has a publish tflag.
> >
> > Because of the publish tflag it is included in the active.list
> >
> > Because it's in the active.list it is considered for rescoring.
> >
> > When it is rescored, the iterative process scores against both ham
> > and spam in several thousand iterations for the rules from the rev#
> > of that day.
> > During these iterations the score that came out triggered minimal FPs
> > (ham mail > 5.0) and helped towards the spam score the best.
> >
> > The rescore seems to be doing the right thing in my opinion.
> > It might show scores for rules that hit more ham than spam on the qa
> > site, but during the check of the corpus the score generated triggered
> > minimal emails hitting FPs.
> >
> > Paul
> >
> > On Sat, 15 Jun 2019 at 18:06, John Hardin <[2][email protected]> wrote:
> >
> > > On Fri, 14 Jun 2019, Henrik K wrote:
> > >
> > > > PS. John, all these rules from your sandbox seem to have very
> > > > broken scores, could you perhaps add informative scores to
> > > > [3]73_sandbox_manual_scores.cf for these? At least that method
> > > > should work 100% for now..
> > > >
> > > > FROM_IN_TO_AND_SUBJ    2.199
> > > > OBFU_TEXT_ATTACH       1.699
> > > > MIME_NO_TEXT           1.542
> > > > AD_PREFS               1.399
> > > > URI_WP_HACKED_2        1.304
> > > > STYLE_GIBBERISH        1.111
> > > > UC_GIBBERISH_OBFU      1.000
> > > > LUCRATIVE              1.000
> > > > HEXHASH_WORD           1.000
> > > > FROM_WORDY             1.000
> > > > AC_HTML_NONSENSE_TAGS  1.000
> > > > LONG_HEX_URI           0.896
> > > > FROM_PAYPAL_SPOOF      0.727
> > >
> > > Not all of those are in my sandbox. For example,
> > > AC_HTML_NONSENSE_TAGS is in KAM's.
> > >
> > > I spent some time today (which I did not have yesterday) to review
> > > and update the tuning on many of those rules to improve their S/O.
> > >
> > > I also tried adding scores to [4]73_sandbox_manual_scores.cf for
> > > them to suppress the net scores until those changes can be evaluated
> > > by the weekly masscheck, but ran into a problem - see SA bug 7721.
> > >
> > > The tuning should minimize the problem from the stale net scores, so
> > > I'm reluctant to alter their global scores, except for AD_PREFS,
> > > which is a very simple rule that seems to be falling afoul of a lot
> > > of "legitimate" marketing emails (i.e. actually subscribed to) in
> > > the masscheck ham corpora and thus can't really be tuned.
> > >
> > > --
> > > John Hardin KA7OHZ [5]http://www.impsec.org/~jhardin/
> > > [6][email protected] FALaholic #11174 pgpk -a [7][email protected]
> > > key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
> > > -----------------------------------------------------------------------
> > > Are you a mildly tech-literate politico horrified by the level of
> > > ignorance demonstrated by lawmakers gearing up to regulate online
> > > technology they don't even begin to grasp? Cool. Now you have a
> > > tiny glimpse into a day in the life of a gun owner. -- Sean Davis
> > > -----------------------------------------------------------------------
> > > 3 days until SWMBO's Birthday
> >
> > References:
> >
> > [1] https://ruleqa.spamassassin.org/20190615-r1861371-n/URI_WP_HACKED_2/detail
> > [2] mailto:[email protected]
> > [3] http://73_sandbox_manual_scores.cf/
> > [4] http://73_sandbox_manual_scores.cf/
> > [5] http://www.impsec.org/~jhardin/
> > [6] mailto:[email protected]
> > [7] mailto:[email protected]
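
For anyone following along, here is a rough sketch of the trade-off described in the quoted explanation: a candidate score for one rule is judged by how many ham messages it pushes over the threshold (FPs) and how much spam it helps catch. This is not the actual rescorer code. The `evaluate` and `best_score` helpers, the toy corpus and the candidate grid are all made up for illustration, and a simple grid search stands in for the real iterative process. The strict `>` comparison follows the "ham mail > 5.0" wording in the thread.

```python
from typing import List, Tuple

# Hypothetical toy corpus entry:
# (is_spam, score contributed by all other rules, did the rule under test hit?)
Message = Tuple[bool, float, bool]

def evaluate(corpus: List[Message], rule_score: float,
             threshold: float = 5.0) -> Tuple[int, int]:
    """Return (ham false positives, spam caught) for one candidate rule score."""
    false_positives = 0
    spam_caught = 0
    for is_spam, base_score, rule_hit in corpus:
        total = base_score + (rule_score if rule_hit else 0.0)
        flagged = total > threshold  # "ham mail > 5.0" counts as an FP
        if flagged and not is_spam:
            false_positives += 1
        elif flagged and is_spam:
            spam_caught += 1
    return false_positives, spam_caught

def best_score(corpus: List[Message], candidates: List[float],
               threshold: float = 5.0) -> float:
    """Pick the candidate with the fewest FPs, then the most spam caught."""
    def key(score: float) -> Tuple[int, int]:
        fps, caught = evaluate(corpus, score, threshold)
        return (fps, -caught)
    return min(candidates, key=key)

# Made-up data: three ham and three spam messages.
corpus: List[Message] = [
    (False, 3.9, True),   # ham close to the threshold, rule hits
    (False, 1.0, True),   # ham, rule hits
    (False, 0.5, False),  # ham, rule does not hit
    (True, 3.5, True),    # spam that needs the rule to cross the threshold
    (True, 4.5, True),    # spam, rule hits
    (True, 6.0, False),   # spam already over the threshold
]
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]

if __name__ == "__main__":
    for score in candidates:
        print(score, evaluate(corpus, score))
    print("chosen:", best_score(corpus, candidates))
```

In this toy data the rule hits two of three ham messages, yet a score of 1.0 still produces no FPs at a threshold of 5.0, which is the sense in which a rule can look bad on the QA page while its generated score still behaves. Passing a different `threshold` (say 8.0) shows how the same scores play out for sites that do not use 5 as their baseline.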
