[Bug 5206] New: RFE: detect and merge duplicate rules for efficiency

bugzilla-daemon Fri, 24 Nov 2006 06:43:14 -0800

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5206


           Summary: RFE: detect and merge duplicate rules for efficiency
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Libraries
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


I just had a case where a meta rule in rulesrc/sandbox/jm depended on a meta
subrule which (it turned out) was in rulesrc/sandbox/dos.  (I thought it
was a released rule, but it wasn't ;)

Anyway, for cases like this, it would make sense to copy those "external"
dependencies alongside our meta rule, so they won't get "lost" or accidentally
deleted.  This is possible now, but it results in duplication -- both
the original dependency *and* our copy run, separately, even though they're
testing the same thing.

I suggest that for simple regexp rules (like body, header, rawbody, full),
we detect cases during parsing where the rule source is the same:

    header RULE_FOO     Foo =~ /bar/
    header RULE_BAZ     Foo =~ /bar/

and internally collapse those into one.  (An efficient way would be to mark
RULE_BAZ in a duplicates hashtable, e.g.
"$conf->{duplicates}->{RULE_FOO} = [ 'RULE_BAZ' ];" -- then when
got_hit("RULE_FOO") is called, it automatically fires "RULE_BAZ" too.)  This
can be done during finish_parsing(), I'd say.

Since that happens at parse time, the only effects are internal; and if someone
later comes along and deletes the source line for RULE_FOO, RULE_BAZ simply
becomes its own rule with no dups and there's no visible change.
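
As a rough illustration of the idea above, here is a minimal standalone
sketch.  It is not SpamAssassin code: find_duplicates(), the simplified
got_hit() signature, and the [ type, name, source ] rule layout are all
hypothetical stand-ins invented for this example, and a real implementation
would presumably hook into finish_parsing() and might also need to compare
other per-rule settings before merging.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the parsed rule set; the real
# Mail::SpamAssassin::Conf layout is not reproduced here.
# Each rule is [ rule type, rule name, rule source ].
sub find_duplicates {
    my (@rules) = @_;
    my %first_seen;    # "type\0source" => name of the first rule seen with it
    my %duplicates;    # primary rule name => [ names of identical rules ]
    my @to_run;        # rules that still have to be evaluated

    for my $rule (@rules) {
        my ($type, $name, $source) = @$rule;
        my $key = "$type\0$source";
        if (exists $first_seen{$key}) {
            # Same type and same source: remember it as a duplicate
            # of the first rule instead of scheduling it again.
            push @{ $duplicates{ $first_seen{$key} } }, $name;
        } else {
            $first_seen{$key} = $name;
            push @to_run, $name;
        }
    }
    return (\@to_run, \%duplicates);
}

# Hypothetical got_hit(): records the hit and fires any merged duplicates.
sub got_hit {
    my ($duplicates, $hits, $name) = @_;
    $hits->{$name} = 1;
    $hits->{$_} = 1 for @{ $duplicates->{$name} || [] };
    return;
}

my @rules = (
    [ header => 'RULE_FOO', 'Foo =~ /bar/' ],
    [ header => 'RULE_BAZ', 'Foo =~ /bar/' ],
);

my ($to_run, $duplicates) = find_duplicates(@rules);

my %hits;
got_hit($duplicates, \%hits, 'RULE_FOO');
print join(',', sort keys %hits), "\n";
```

With both rules present, only RULE_FOO is scheduled to run and a hit on it
marks RULE_BAZ as hit too.  If the RULE_FOO line is dropped from @rules,
find_duplicates() returns RULE_BAZ as an ordinary rule with an empty
duplicates table, which matches the "no visible change" behaviour described
above.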



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
