Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready

Michael Storz Sat, 14 May 2022 10:15:55 -0700

On 2022-05-13 08:37, [email protected] wrote:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987


Henrik Krohns <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Henrik Krohns <[email protected]> ---
Feel free to discuss here or on dev-list if something is still unclear.

First of all, I would like to thank Henrik for all the work he has done on SpamAssassin. The further I get with my code review, the more I see all the places where he has made minor or major changes that have made the code faster, more readable and thus more maintainable.

Despite all these changes, however, I still see room for improvement. At the moment I'm not very happy with how the do_meta_tests subroutine is implemented. It seems to work now, but the check for the dependencies of the meta rules looks too complicated to me:


foreach my $r (@{$md->{$rulename}||[]}) {
  next RULE unless exists $h->{$r} && !$tp->{$r} && !$pl{$r};
}

Instead of three conditions, there should really be only one check at this point: "is the rule ready or not". If we knew the exact state of a rule at each point in time, it would also be possible to move from a brute-force algorithm to a deterministic one, where a rule that becomes ready automatically sets a meta rule to ready if it was that meta rule's last open dependency, analogous to the algorithm for tags. do_meta_tests would then simply process a queue of ready meta rules. If a new meta rule becomes ready, it is added to the end of the queue. Once the queue is empty, all rules of a priority class have been processed.
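
To make the queue idea concrete, here is a minimal sketch of how it could look. It is not code from Check.pm: the rule names, %meta_deps, mark_rule_ready and run_ready_metas are all hypothetical, and the place where a meta rule would really be evaluated is only a comment.

```perl
use strict;
use warnings;

# Hypothetical data: each meta rule lists the rules it depends on.
my %meta_deps = (
  META_A => [qw(RULE_1 RULE_2)],
  META_B => [qw(META_A RULE_3)],
);

# Reverse index (rule => metas waiting on it) and a counter of
# still-unfinished dependencies per meta rule.
my (%waiting_on, %pending_count);
for my $meta (keys %meta_deps) {
  $pending_count{$meta} = scalar @{ $meta_deps{$meta} };
  push @{ $waiting_on{$_} }, $meta for @{ $meta_deps{$meta} };
}

my @ready_queue;   # meta rules whose dependencies are all finished

# Called exactly once per rule, when its final result is known.
sub mark_rule_ready {
  my ($rule) = @_;
  for my $meta (@{ $waiting_on{$rule} || [] }) {
    push @ready_queue, $meta if --$pending_count{$meta} == 0;
  }
}

# Drain the queue; finishing a meta rule may make further metas ready.
sub run_ready_metas {
  my @order;
  while (defined(my $meta = shift @ready_queue)) {
    push @order, $meta;        # a real implementation would evaluate here
    mark_rule_ready($meta);
  }
  return @order;
}

mark_rule_ready($_) for qw(RULE_1 RULE_2 RULE_3);
my @evaluated = run_ready_metas();
print "@evaluated\n";
```

With this layout the only question asked per rule is "has its counter reached zero", and each dependency edge is visited once instead of being rescanned on every pass.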

The next possible step would be the implementation of short-circuit behavior of && and || analogous to Perl itself. E.g. the rule

meta BITCOIN_SPAM_10 __BITCOIN_ID && ( HTML_IMAGE_ONLY_04 || HTML_IMAGE_ONLY_08 )

would immediately evaluate to false if __BITCOIN_ID is false. HTML_IMAGE_ONLY_04 and HTML_IMAGE_ONLY_08 would then no longer need to be evaluated.
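
As an illustration only, the following sketch shows the effect with Perl's own operators. The %hits table and the rule() helper are hypothetical stand-ins for actually running a rule, and the sketch assumes nothing else needs the results of the skipped rules.

```perl
use strict;
use warnings;

# Hypothetical per-message results; in reality these come from running rules.
my %hits = (
  __BITCOIN_ID       => 0,
  HTML_IMAGE_ONLY_04 => 1,
  HTML_IMAGE_ONLY_08 => 0,
);

my @evaluated;   # records which rules were actually run
my %cache;       # rule name => cached 0/1 result

# Lazily "run" a rule: evaluate on first request only, then reuse the result.
sub rule {
  my ($name) = @_;
  unless (exists $cache{$name}) {
    push @evaluated, $name;
    $cache{$name} = $hits{$name} ? 1 : 0;
  }
  return $cache{$name};
}

# The meta expression, written with Perl's short-circuit && and ||.
my $bitcoin_spam_10 =
  rule('__BITCOIN_ID')
  && ( rule('HTML_IMAGE_ONLY_04') || rule('HTML_IMAGE_ONLY_08') );

print "result: ", ($bitcoin_spam_10 ? 1 : 0), "\n";
print "evaluated: @evaluated\n";
```

Because __BITCOIN_ID is false here, only that one rule is run; the two HTML_IMAGE_ONLY rules are never requested.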

Recently the question was asked why Check.pm is a plugin if it is not optional. Check.pm is a plugin so that more than one check plugin can be implemented. Currently SpamAssassin still assumes that filtering happens postqueue, where you can respond to the different wishes of individual users. If you use SpamAssassin for prequeue filtering, the only possible decision is to reject or accept the email for ALL recipients. Different recipient wishes cannot be taken into account.

Because it is possible to switch between admin and user rules for each email, one accepts that the rule evaluation has to be rebuilt for every email. What we need instead is the ability to build the rules once at the start of the daemon and then use them to evaluate any number of emails.
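
A rough sketch of the "build once, evaluate many" split, under simplifying assumptions: the rule table below is made up (it is not SpamAssassin's configuration format), and build_ruleset and evaluate are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical rule definitions (name => regex source).
my %rule_source = (
  HAS_BITCOIN => 'bitcoin',
  HAS_URGENT  => 'urgent',
);

# Done once at daemon start: compile every rule into a closure.
sub build_ruleset {
  my (%source) = @_;
  my %compiled;
  for my $name (keys %source) {
    my $re = qr/$source{$name}/i;
    $compiled{$name} = sub { $_[0] =~ $re ? 1 : 0 };
  }
  return \%compiled;
}

# Done per email: only run the prebuilt closures, nothing is recompiled.
sub evaluate {
  my ($ruleset, $text) = @_;
  my %result;
  $result{$_} = $ruleset->{$_}->($text) for keys %$ruleset;
  return \%result;
}

my $ruleset = build_ruleset(%rule_source);
my @results = map { evaluate($ruleset, $_) }
  ('Send Bitcoin now', 'Urgent: invoice attached', 'Lunch tomorrow?');

for my $r (@results) {
  print join(' ', map { "$_=$r->{$_}" } sort keys %$r), "\n";
}
```

The cost of building the rules is paid once in build_ruleset; evaluate does no setup work of its own, which is the property that matters for a daemon handling many emails.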

You can go one step further and first run a SpamAssassin instance with a lightweight ruleset in prequeue mode, to decide as quickly as possible whether to reject an email, and then run a second instance with a heavyweight ruleset that distinguishes more precisely between marking the email as spam or ham. The second instance can then also evaluate user rules. There should be a feedback loop between the instances so that the first instance can use the results of the second, e.g. to quickly reject emails via a local blocklist using the HashBL.pm plugin. Currently we use a feedback loop between the SpamAssassin instance in prequeue mode and Postfix to (temporarily) reject emails for 24 hours.
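
The feedback loop can be pictured roughly as follows. This is only a toy model: an in-process hash stands in for the shared local blocklist, the sender key, score and threshold values are invented, and prequeue_decision and postqueue_feedback are hypothetical names, not part of SpamAssassin, HashBL.pm or Postfix.

```perl
use strict;
use warnings;

my $BLOCK_SECONDS = 24 * 60 * 60;   # 24 hours, as in the setup described above
my %blocklist;                      # hypothetical local blocklist: key => expiry time

# Stage 1 (prequeue, lightweight): reject only on a live blocklist entry.
sub prequeue_decision {
  my ($key, $now) = @_;
  my $expiry = $blocklist{$key};
  return 'accept' unless defined $expiry;
  if ($expiry <= $now) {            # entry has expired, forget it
    delete $blocklist{$key};
    return 'accept';
  }
  return 'reject';
}

# Stage 2 (heavyweight): feed spam verdicts back to stage 1.
sub postqueue_feedback {
  my ($key, $score, $threshold, $now) = @_;
  $blocklist{$key} = $now + $BLOCK_SECONDS if $score >= $threshold;
  return;
}

my $t0 = 1_000_000;                 # arbitrary start time in seconds
my @log;
push @log, prequeue_decision('sender.example', $t0);
postqueue_feedback('sender.example', 12.5, 5.0, $t0);
push @log, prequeue_decision('sender.example', $t0 + 60);
push @log, prequeue_decision('sender.example', $t0 + $BLOCK_SECONDS + 1);
print "@log\n";
```

In a real deployment the two stages are separate processes, so the blocklist would have to live in something both can reach; the hash here only shows the flow of information and the 24-hour expiry.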

In any case, SpamAssassin should catch up with 2022 and offer a highly optimized version of the analysis for MTAs that process millions of emails per day.

These are my general remarks about the evaluation of rules. An email with some minor cosmetic changes will follow.


Michael
