Hi Olivier, I'm focusing my SA time on getting 3.4.3 done but wanted to mentioned that this patch is getting into non-trivial territory. Could you please confirm or file an ICLA? It's awesome you are looking at improvements and this helps us consider them freely: https://www.apache.org/licenses/icla.pdf
Regards, KAM -- Kevin A. McGrail Member, Apache Software Foundation Chair Emeritus Apache SpamAssassin Project https://www.linkedin.com/in/kmcgrail - 703.798.0171 On Fri, Feb 8, 2019 at 1:34 PM Olivier Coutu <[email protected]> wrote: > You were quite right Henrick, I had insufficiently tested it and while it > computed dependencies only once, it did not properly retrieve previously > computed dependencies. > > I updated the patch: > > --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm 2019-02-07 > 15:01:16.124532948 -0500 > +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm 2019-02-07 > 15:03:41.127692566 -0500 > @@ -968,24 +968,29 @@ > foreach my $name (keys %{$conf->{tests}}) { > next unless ($conf->{test_types}->{$name} > == $Mail::SpamAssassin::Conf::TYPE_META_TESTS); > - > - my $deps = [ ]; > - my $alreadydone = { }; > - $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone); > - $conf->{meta_dependencies}->{$name} = join (' ', @{$deps}); > + my $alreadydone = {}; > + $self->_meta_deps_recurse($conf, $name, $name, $alreadydone); > } > } > > sub _meta_deps_recurse { > - my ($self, $conf, $toprule, $name, $deps, $alreadydone) = @_; > + my ($self, $conf, $toprule, $name, $alreadydone) = @_; > > - # Only do each rule once per top-level meta; avoid infinite recursion > - return if $alreadydone->{$name}; > - $alreadydone->{$name} = 1; > + # Avoid recomputing the dependencies of a rule > + return split(' ', $conf->{meta_dependencies}->{$name}) if defined > $conf->{meta_dependencies}->{$name}; > > # Obviously, don't trace empty or nonexistent rules > my $rule = $conf->{tests}->{$name}; > - return unless $rule; > + unless ($rule) { > + $conf->{meta_dependencies}->{$name} = ''; > + return ( ); > + } > + > + # Avoid infinite recursion > + return ( ) if exists $alreadydone->{$name}; > + $alreadydone->{$name} = ( ); > + > + my %deps; > > # Lex the rule into tokens using a rather simple RE method ... > my $lexer = ARITH_EXPRESSION_LEXER; > @@ -1001,9 +1006,12 @@ > next unless exists $conf_tests->{$token}; > > # add and recurse > - push(@{$deps}, untaint_var($token)); > - $self->_meta_deps_recurse($conf, $toprule, $token, $deps, $alreadydone); > + $deps{untaint_var($token)} = ( ); > + my @subdeps = $self->_meta_deps_recurse($conf, $toprule, $token, > $alreadydone); > + @deps{@subdeps} = ( ); > } > + $conf->{meta_dependencies}->{$name} = join (' ', keys %deps); > + return keys %deps; > } > > sub fix_priorities { > > > *_meta_deps_recurse* now returns a hash that acts as a set of > dependencies. > > These dependencies are stored in *$conf->{meta_dependencies}->{$name}* as > they are calculated. > > The result is not exactly the same as it was before, as I had some cases > where duplicates in *$conf->{meta_dependencies}->{$name}* would arise. > This is no longer the case. The sets of dependencies stay identical. > > I also considered changing *$conf->{meta_dependencies}->{$name}* to a > list instead of a string, but that was not necessary to achieve a 4-fold > performance improvement and would require additional refactoring. > > What do you think? I am not used to writing Perl code so feel free to > correct the style. > On 19-01-08 11 h 24, Kevin A. McGrail wrote: > > Thanks Henrik! > > On 1/8/2019 11:21 AM, Henrik K wrote: > > The code is saving dependencies of each meta rule in this variable: > $conf->{meta_dependencies} > > That variable is used in Check.pm meta tests. > > Simply adding debug print after this line: > $conf->{meta_dependencies}->{$name} = join (' ', @{$deps}); > print STDERR "DEPS FOR $name: $conf->{meta_dependencies}->{$name}\n"; > > And running both versions with spamassassin --lint 2>/tmp/log1 and log2 > > Shows for example that original version: > > DEPS FOR __YOU_WON: __GIVE_MONEY __HAS_WON_01 __MOVE_MONEY __YOU_WON_01 > __YOU_WON_02 __YOU_WON_03 __YOU_WON_04 __YOU_WON_05 > > New "faster" version: > > DEPS FOR __YOU_WON: > > So it appears __YOU_WON has no dependencies, thus the patch does not work as > intended, Check.pm fails to check for dependencies, -1 from me. > > Contributions are very much appreciated, but I urge everyone to use very > simple debug methods like this to verify even "trivial" things. :-) > > Cheers, > Henrik > > On Tue, Jan 08, 2019 at 10:37:12AM -0500, Kevin A. McGrail wrote: > > It's an interesting issue and a trivial patch. I'm sure it was done as a > safety valve figuring the cycles weren't a big deal. I can't think of any > reason to keep checking all these meta dependencies over and over either. > > I'm +1 on this. > -- > Kevin A. McGrail > VP Fundraising, Apache Software Foundation > Chair Emeritus Apache SpamAssassin Project > [1]https://www.linkedin.com/in/kmcgrail - 703.798.0171 > > > On Tue, Jan 8, 2019 at 10:31 AM Olivier Coutu <[3][email protected]> > <[3][email protected]> > wrote: > > > We use a lot of custom rules (2k+) and lint time was growing out of > control, over a minute on some machines. > > It appears like $alreadydone, the set of rules that have already been > linted, is reset for every test. This makes lint time quadratic in the > number of tests and does not seem to have any use. Moving it out of the > tests loop reduces my lint time from 21 seconds to 3 seconds with no > visible side effect. > > This patch moves it out of the tests loop > > --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm 2019-01-07 > 17:54:19.244624440 -0500 > +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm 2019-01-07 > 17:55:11.948708724 -0500 > @@ -964,13 +964,13 @@ > my ($self) = @_; > my $conf = $self->{conf}; > $conf->{meta_dependencies} = { }; > + my $alreadydone = { }; > > foreach my $name (keys %{$conf->{tests}}) { > next unless ($conf->{test_types}->{$name} > == $Mail::SpamAssassin::Conf::TYPE_META_TESTS); > > my $deps = [ ]; > - my $alreadydone = { }; > $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone); > $conf->{meta_dependencies}->{$name} = join (' ', @{$deps}); > } > > > The result on ok rulesets and on obviously broken rulesets does not change > from pre-patch, and I can't see any negative side effects of moving it out > of the loop. > > I'm not used to posting on dev lists, so please tell me if more > information > is needed or if things should be presented differently. > > -Olivier > > > References: > > [1] https://www.linkedin.com/in/kmcgrail > [3] mailto:[email protected] <[email protected]> > >
