https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6060
--- Comment #5 from Mark Martinec <mark.marti...@ijs.si> 2009-03-31 12:02:14 PST ---

So far I have survived by avoiding Perl 5.8.9, which crashes (it was the
current version in FreeBSD ports), and running SpamAssassin under a
hand-compiled 5.10.0. A few days ago FreeBSD ports was updated to
Perl 5.10.0, and after upgrading, this same old 'Bus error' reared its ugly
head again. So for the last two days I had to disable any use of Berkeley DB
to survive.

It seems the difference between my hand-compiled Perl 5.10.0 and the official
one is that mine used 32-bit integers, while the one from ports uses 64-bit
integers. This causes the program to occupy somewhat more memory, so the
compilation hits the stack limit sooner.

So it seems the much heavier stack usage by the Perl compiler appeared
between 5.8.8 and 5.8.9, and 5.10.0 is no better than 5.8.9 in this respect,
contrary to what I initially thought.

> Here is a count of calls to got_hit in the eval-ed code on our installation,
> by type and priority (as given as arguments to eval in the current code):
> [...]
>   rawbody_0.pm   431
>   meta_500.pm    736
>   head_0.pm     1855
>   body_0.pm     5224

I repeated the exercise with body_0.pm, pruning it down to a size which the
compiler still managed to compile without a crash. The limit is:

  body_0.pm     3268 rules

> Now that I think of it, a combined approach similar to use_rule_subs
> would satisfy both needs: not give Perl too large chunks of code
> to compile, and save on memory footprint in a parent process
> (inherited by child processes) for source code, as Bug 5876 is
> trying to solve.
>
> But instead of one rule per subroutine (as with use_rule_subs)
> or all rules in one subroutine (as without use_rule_subs),
> perhaps 100 rules per sub would be a good compromise, cleanly
> satisfying both needs, and without going into complications
> with temporary files.
Not having much of a choice, I embarked on modifying the Check.pm plugin to
implement the above idea: compile no more than about 60 kB of source code at
a time, and if necessary produce multiple subroutines, one for each chunk of
code, and provide a master subroutine which calls each of the chunk
subroutines in turn.

A side effect is a noticeable reduction in memory footprint. I modified the
'spamassassin' command to sleep just before finishing, and checked the
process memory size with a 'ps' command. The set of rules is what I normally
use on a production mailer (updates.spamassassin.org, sought.rules.yerp.org,
some SARE rules and a handful of local ones).

                                          VSZ    RSS   (kB)
  original 3.3 trunk:                   99692  91752
  3.3 trunk with my modified Check.pm:  82080  79584

The reduction is 17.2 MB in virtual memory size, and 11.9 MB in resident
memory size.

As an experiment I also eliminated compiling of eval rules (substituted by
direct calls), as there seems to be little point in optimizing the outer
loop. I keep these diffs separate. It yields some additional memory
reduction:

                                          VSZ    RSS   (kB)
  with direct calls to eval rules:      80780  78468

i.e. 1.3 MB in VSZ and 1.1 MB in RSS. It is not much, but it makes the code
much simpler. I doubt there is any noticeable performance penalty from not
optimizing an outer loop; it is hard to measure with all the timing noise.

Anyway, I'd like to bring in at least my chunking changes in Check.pm.
It does pass the tests, it does produce identical results on a couple of
messages that I tried manually with old and new code (each with many and
varied hits), and it does make our production mailer+SA run again with
Berkeley DB enabled under 5.10.0.

I would appreciate a second opinion on the approach and code, and any
feedback in case I broke some corner case which I didn't try (user rules?,
use_rule_subs?, mass checks?).

  Bug 6060: let the Check.pm plugin produce smaller chunks of source code
  (60 kB) to avoid Perl compiler crashing on exceeding stack size, and to
  reduce memory footprint of SpamAssassin.
  Sending        lib/Mail/SpamAssassin/Plugin/Check.pm
  Committed revision 760568
  ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=760568 ).

I can also commit the other half (direct execution of eval rules) - it can
be reverted later if it doesn't feel right.
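
For readers who want to see the shape of the chunking idea described in this comment, here is a minimal sketch. It is not the code committed to Check.pm, which is Perl. It is an analogy written in Python, and it only illustrates the structure: pack generated rule snippets into chunks of at most about 60 kB, compile each chunk as its own subroutine, and build a master subroutine that calls the chunk subroutines in turn. All names here (`split_into_chunks`, `compile_chunk`, `build_master`, the `hits` list) are hypothetical, and the size accounting counts only the snippet text, so the limit is approximate.

```python
MAX_CHUNK_BYTES = 60 * 1024  # roughly the 60 kB limit mentioned in the comment


def split_into_chunks(snippets, max_bytes=MAX_CHUNK_BYTES):
    """Greedily pack source snippets into chunks of at most max_bytes each.

    A single snippet larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for snippet in snippets:
        n = len(snippet.encode("utf-8"))
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(snippet)
        size += n
    if current:
        chunks.append(current)
    return chunks


def compile_chunk(index, snippets):
    """Compile one chunk of snippets into a function chunk_<index>(hits)."""
    name = f"chunk_{index}"
    lines = ["    pass"]  # keeps the body valid even if every snippet is empty
    for snippet in snippets:
        lines.extend("    " + line for line in snippet.splitlines())
    source = f"def {name}(hits):\n" + "\n".join(lines) + "\n"
    namespace = {}
    exec(compile(source, f"<{name}>", "exec"), namespace)
    return namespace[name]


def build_master(snippets, max_bytes=MAX_CHUNK_BYTES):
    """Return a master function that runs every chunk function in order."""
    chunks = split_into_chunks(snippets, max_bytes)
    subs = [compile_chunk(i, chunk) for i, chunk in enumerate(chunks)]

    def master(hits):
        for sub in subs:
            sub(hits)
        return hits

    master.chunk_count = len(subs)
    return master


if __name__ == "__main__":
    # Hypothetical stand-ins for generated rule code: each "rule" records a hit.
    rules = [f"hits.append('RULE_{i}')" for i in range(1000)]
    master = build_master(rules, max_bytes=2048)
    print(master.chunk_count, "chunks,", len(master([])), "hits")
```

The point of the greedy split is that no single compile call sees more than the chosen amount of source, while the master function preserves the original rule order. Packing by byte size rather than by a fixed rule count follows the 60 kB figure in the comment. The quoted proposal of about 100 rules per sub would be the same loop with a counter instead of a size.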