Huh!  The problem is the usual one: GIGO (garbage in, garbage out).

Y  8 .../8bcaeebfaa ...,RCVD_IN_PBL,RCVD_IN_PBL,...

hit-frequencies assumes that the rule list on each log line contains
unique names, so when a name appears twice it simply counts the hit twice.
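To illustrate the effect, here is a minimal sketch in Python. It is not the hit-frequencies code (which is Perl); the function name spam_percent, the dedupe flag, and the sample data are made up for illustration. It only shows how counting every occurrence of a rule name can push a per-message percentage above 100%.

```python
def spam_percent(rule, spam_lines, dedupe):
    """Percentage of spam messages that hit `rule`.

    Each element of spam_lines is the list of rule names logged for one
    message (hypothetical in-memory form of a log line's rule list).
    """
    hits = 0
    for rules in spam_lines:
        # With dedupe, a rule counts at most once per message.
        names = set(rules) if dedupe else rules
        hits += sum(1 for name in names if name == rule)
    return 100.0 * hits / len(spam_lines)


# Made-up data: two of the four messages list RCVD_IN_PBL twice.
messages = [
    ["RCVD_IN_PBL", "RCVD_IN_PBL"],
    ["RCVD_IN_PBL"],
    ["OTHER_RULE"],
    ["RCVD_IN_PBL", "RCVD_IN_PBL", "OTHER_RULE"],
]

print(spam_percent("RCVD_IN_PBL", messages, dedupe=False))  # 125.0
print(spam_percent("RCVD_IN_PBL", messages, dedupe=True))   # 75.0
```

The figures in the quoted message below fit this picture: 154547 hits over 142976 spam messages is about 1.08093, i.e. the 108.0930% shown in the SPAM% column.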

So now the question is: why does mass-check log the same rule more than
once on a line, apparently only for weekly runs and apparently only for
this rule (pcregrep '([A-Z0-9_]+),\1(,|$)' shows only this rule
duplicating)?
<sigh>
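For a check that also catches non-adjacent duplicates (the pcregrep pattern above only sees a name immediately followed by itself), something like the following Python sketch might help. It assumes the rule list is the fourth whitespace-separated field of a log line, as in the sample line above; the function name duplicated_rules and the sample path are hypothetical.

```python
from collections import Counter


def duplicated_rules(log_line):
    """Return the rule names that occur more than once on one log line.

    Assumption: fields are whitespace-separated and the fourth field is
    the comma-separated rule list.
    """
    fields = log_line.split()
    # Skip comment lines and lines too short to carry a rule list.
    if len(fields) < 4 or log_line.startswith("#"):
        return []
    counts = Counter(name for name in fields[3].split(",") if name)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical log line modelled on the one quoted above.
line = "Y 8 /some/path RULE_A,RCVD_IN_PBL,RCVD_IN_PBL,RULE_B"
print(duplicated_rules(line))  # ['RCVD_IN_PBL']
```

Splitting on commas and comparing whole names also avoids a false match where one rule name is merely a suffix of the name before it.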


On Tue, Jul 03, 2007 at 12:02:12PM -0400, Theo Van Dinter wrote:
> On Tue, Jul 03, 2007 at 10:24:02AM +0100, Justin Mason wrote:
> > no "aha"s here unfortunately :( -- is this in your own local freqs,
> > or the freqs on the server (with everyone else's logs too)?
> 
> This is from hit-frequencies off of my net-theo weekly logs.
> 
> It's very reproducible too:
> 
> ~/SA/spamassassin-head/masses/hit-frequencies -a -c \
> ~corpus/SA/spamassassin-corpora/rules -x -p | awk \
> '$1 > 100 || $2 > 100 || $3 > 100'
> OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>       0   142976    25826    0.847   0.00    0.00  (all messages)
>        91.555  108.0930   0.0000    1.000   1.00    0.00  RCVD_IN_PBL
> 
> and doing a little bit of debugging yesterday, the spam count for that rule
> goes to 154547.  I just haven't figured out why yet though.
> 
> -- 
> Randomly Selected Tagline:
> "Our users will know fear and cower before our software! Ship it!
>  Ship it and let them flee like the dogs they are!"
>          - Klingon Programmer's Manual



-- 
Randomly Selected Tagline:
"I would never have sex with a cow.  Cause that is wrong, and I am
 lactose intolerant."            - Dave Attell

