https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6558
--- Comment #12 from Mark Martinec <[email protected]> 2011-03-22 16:13:00 EDT ---
Matt Elson wrote a valuable description of the program flow on the mailing list, worth keeping for posterity, so I'm including it here:

Hey all,

I've been doing some investigation of Bug 6558 (__PILL_PRICE_[1-3] + Compiled Rulesets == endless loop) on my end and want to share the results. I'm not super familiar with SpamAssassin's code base, so apologies if I misread anything or am totally off.

Long story short, since this email has gotten a little lengthy: I think the problem lies in a function created around line 97 in OneLineBodyRuleType that Rule2XSBody later uses in some cases.

Lengthy analysis below:

At this point, I have a test machine (x64) where I have removed *all rules* except the ones I'm testing and disabled every plugin except Rule2XSBody.pm and Check.pm.

First, I played around with the regexes and found that something as simple as:

    body   LOCAL_TEST /pill/
    tflags LOCAL_TEST multiple

will cause the problem (when run on the short artificial email attached to the bugzilla). Interestingly enough, if I make it case-insensitive:

    body   LOCAL_TEST /pill/i
    tflags LOCAL_TEST multiple

the problem goes away.

So at that point I started poking around the code for Rule2XSBody because I was curious... and this is where I'm probably a bit out of my depth. It looks like the case-insensitive rule avoids the problem because the result of the CompiledRegexps scan is flagged as "non-lossy" (l=0) and is handled by the if statement around line 243 in Rule2XSBody.pm. Case-sensitive rules are flagged as lossy (l=1) by CompiledRegexps and have to move on. They get to the stanza at line 261:

    if (!&{$fn} ($scanner, $line) && $do_dbg) ...

and this is where things get stuck for me. This is where it got interesting: when I added my debugging and ran the original __PILL_PRICE_[1-3] rules that triggered the problem, they were all flagged as lossy.
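The lossy/non-lossy split described above works like a two-stage check. Below is a minimal, self-contained Perl sketch of that flow, based on a few assumptions. The prescan() results, the %confirm table and the rule names are hypothetical stand-ins for the compiled scanner and the generated per-rule functions. This is not the actual Rule2XSBody code.

```perl
use strict;
use warnings;

# Hypothetical per-rule confirmation subs, standing in for the functions that
# OneLineBodyRuleType.pm generates. Called as $fn->($scanner, $line); each
# returns true if the rule's real Perl regex matches the line.
my %confirm = (
    LOCAL_TEST    => sub { return $_[1] =~ /pill/ },
    LOCAL_TEST_CI => sub { return $_[1] =~ /pill/i },
);

# Hypothetical fast pre-scan: returns [rule name, lossy flag] pairs for
# candidate rules. Lossy (l=1) means "might match, confirm with the real
# regex"; non-lossy (l=0) means the scanner's answer is trusted as-is.
sub prescan {
    my ($line) = @_;
    return () unless $line =~ /pill/i;
    return ([ 'LOCAL_TEST', 1 ], [ 'LOCAL_TEST_CI', 0 ]);
}

sub check_line {
    my ($line) = @_;
    my @hits;
    for my $result (prescan($line)) {
        my ($rule, $lossy) = @$result;
        if (!$lossy) {
            push @hits, $rule;    # non-lossy: count the hit directly
            next;
        }
        my $fn = $confirm{$rule};
        push @hits, $rule if $fn->(undef, $line);    # lossy: run the real regex
    }
    return @hits;
}

print join(',', check_line('Cheap PILL prices')), "\n";    # LOCAL_TEST_CI
```

With this sketch, "Cheap PILL prices" is reported only for LOCAL_TEST_CI. The lossy LOCAL_TEST candidate is rejected when the real, case-sensitive regex runs to confirm it. Only the lossy path ever calls a per-rule function.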
$fn seems to be a dynamically created function that Rule2XSBody creates by way of OneLineBodyRuleType.pm. Unfortunately I can't quite decipher the code; line 142 in OneLineBodyRuleType.pm is where it's made. I can't make out what the function is supposed to do. However, when the rule it's being created for has a tflag of "multiple", the function uses a while condition:

    while ($_[1] =~ '.$pat.'g) {

whereas if the tflag is NOT multiple, it's just an if condition:

    if ($_[1] =~ '.$pat.') {

I'm not quite sure what's supposed to break out of the while loop, but I'm fairly sure it isn't being broken out of correctly, and that is where everything gets stuck. To test this theory, I changed the "while" to an "if". Once I did, the problem went away completely on all regexes: both my simple pill rule and the more elaborate original ones (and rewrites). I'd imagine it's not a real solution, but it's good for testing (simple patch attached in case I was unclear about the change).

This doesn't quite explain why the problem doesn't emerge for everyone using compiled rules. Maybe the difference is whether the CompiledRegexps module flags the rules as lossy. That might differ from architecture to architecture and environment to environment. When the rules are NOT lossy, they never reach the bit of code that seems to be causing the problem.

For further information, here's what the dynamic function looks like when I dump it with some debugging:

    sub JUST_PILLS_one_line_body_test {
      {
        pos $_[1] = 0;
    #line 1 "/var/lib/spamassassin/3.003001/local.cf, rule JUST_PILLS,"
        while ($_[1] =~ /pill/g) {
          my $self = $_[0];
          $self->got_hit(q{JUST_PILLS}, "BODY: ", ruletype => "one_line_body");
          dbg("rules: ran one_line_body rule JUST_PILLS ======> got hit: \"" . ($& || "negative match") . "\"");
        }
      }
    }

(Notice that this is the debug statement you see repeated over and over. The comments before ${fn} is called suggest that this is running the real regex.)

Like I said, I'm having trouble making sense of it ($_ was never a friend of mine), and for the life of me I don't know how the loop is supposed to end. Another little hack that seems to fix it (though goodness knows at what cost) is to add an s at the front (i.e. making it while $_[1] =~ s/pill/g). Again, I'm not suggesting that as a real solution, since modifying variables arbitrarily seems... unwise. Maybe it will help with troubleshooting and debugging, though.

Anyway, hope this helps out!

Matt

[...]

Would something like this help with any performance degradation caused by my initial patch:

    my $line = \$_[1];
    while ($$line =~ '.$pat.'g) {

Or does it amount to more or less the same thing as my original patch? I'm not completely clear on how Perl handles references, and I only gave it a cursory test (which worked).

Matt
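Here is a minimal, plain-Perl sketch (no SpamAssassin) of the loop mechanics discussed above. A scalar-context //g match in a while loop ends only because each successful match advances pos() on the target string, and the first failed match exits the loop. The sketch also covers three related cases:

- the if variant, which matches at most once;
- a loop whose position is deliberately reset every iteration, bounded by a demo-only $limit so it finishes;
- Matt's reference-based variant.

The function names and the deliberate reset are illustrative assumptions. They are not a claim about what actually resets the position in bug 6558.

```perl
use strict;
use warnings;

# Mirrors the generated "multiple" function: reset pos, then loop on a
# scalar-context //g match. Each success advances pos($_[1]); the first
# failed match ends the loop.
sub count_while {
    my $n = 0;
    pos($_[1]) = 0;
    while ($_[1] =~ /pill/g) {
        $n++;
    }
    return $n;
}

# Mirrors the non-"multiple" function (and the while-to-if patch):
# the regex runs once, so the rule can hit at most once per line.
sub count_if {
    my $n = 0;
    if ($_[1] =~ /pill/) {
        $n++;
    }
    return $n;
}

# Same while loop, but the match position is reset inside the body.
# Every iteration then finds the first "pill" again, so without $limit
# (a demo-only safety valve) the loop would never end. The reset is an
# illustrative assumption, not a diagnosis of bug 6558.
sub count_with_reset {
    my $limit = $_[2];
    my $n = 0;
    pos($_[1]) = 0;
    while ($_[1] =~ /pill/g) {
        $n++;
        last if $n >= $limit;
        pos($_[1]) = 0;
    }
    return $n;
}

# Matt's suggested variant, with the same pos reset as the generated
# function: match through a reference to the caller's scalar. pos()
# belongs to the referenced scalar itself.
sub count_via_ref {
    my $line = \$_[1];
    my $n = 0;
    pos($$line) = 0;
    while ($$line =~ /pill/g) {
        $n++;
    }
    return $n;
}

my $text = 'pill pill and one more pill';
printf "while: %d\n", count_while(undef, $text);               # while: 3
printf "if:    %d\n", count_if(undef, $text);                  # if:    1
printf "reset: %d\n", count_with_reset(undef, $text, 1000);    # reset: 1000
printf "ref:   %d\n", count_via_ref(undef, $text);             # ref:   3
```

With an ordinary scalar, the reference version walks the same string as $_[1] and finds the same three matches. In this simple case it behaves like the original while loop, not like the if patch.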
