Re: Help with RegEx Rule
On 10/9/2015 12:07 AM, AK wrote:
> On 20/09/15 03:07, Dave Funk wrote:
> > Notes:
> > 1) Due to SA pre-processing collapsing body into one long line, cannot match on '^' repeatedly, need to look for '\n' as line break indicator. Find start of a line and then following repeats of ".\n"
>
> Dave,
>
> I've been creating my own regular expressions (*with Regex Buddy despite some naysayers here*) and they are working well. However, there are a few that seem not to hit on my test messages. In order to troubleshoot further, I need to see the message the way SpamAssassin sees it; is there a tool that will let me convert my saved email (*I view the email source in Thunderbird and save to files for testing*) into the rawbody format so I can test my regex?

This sounds useful to me. There have been a few cases where it would have helped to see exactly what SA is doing to the message prior to checking the rule. A utility which could output the text tested against for HEADER, BODY, RAWBODY, etc. would be useful.

What might be more useful to those of us with a bit of Perl knowledge would be a function call that returns the strings, so we can test the regex directly against them in a small Perl program, where we could run it under the debugger or capture and print parts of the match to make sure it's doing what we think it is. Is this already possible with the API?

-- 
Bowie
Re: Help with RegEx Rule
On Fri, 9 Oct 2015, AK wrote:
> On 20/09/15 03:07, Dave Funk wrote:
> > Notes:
> > 1) Due to SA pre-processing collapsing body into one long line, cannot match on '^' repeatedly, need to look for '\n' as line break indicator. Find start of a line and then following repeats of ".\n"
>
> Dave,
>
> I've been creating my own regular expressions (*with Regex Buddy despite some naysayers here*) and they are working well. However, there are a few that seem not to hit on my test messages. In order to troubleshoot further, I need to see the message the way SpamAssassin sees it; is there a tool that will let me convert my saved email (*I view the email source in Thunderbird and save to files for testing*) into the rawbody format so I can test my regex?

I've found the following useful in my test bench environment:

  body     __ALL_BODY     /.*/
  tflags   __ALL_BODY     multiple

  rawbody  __ALL_RAWBODY  /.*/
  tflags   __ALL_RAWBODY  multiple

  uri      __ALL_URI      /.+/
  tflags   __ALL_URI      multiple

etc.

-- 
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Watch... Wallet... Gun... Knee... -- Denny Crane
---
Re: Help with RegEx Rule
On 20/09/15 03:07, Dave Funk wrote:
> Notes:
> 1) Due to SA pre-processing collapsing body into one long line, cannot match on '^' repeatedly, need to look for '\n' as line break indicator. Find start of a line and then following repeats of ".\n"

Dave,

I need to see the mail message as SpamAssassin sees it so as to create some *awesome* rules; is there a tool that will let me convert my ASCII representation of an email message into a file in the rawbody format, or any other format for that matter?

Thanks,
ak.
Re: Help with RegEx Rule
On 20/09/15 03:07, Dave Funk wrote:
> Notes:
> 1) Due to SA pre-processing collapsing body into one long line, cannot match on '^' repeatedly, need to look for '\n' as line break indicator. Find start of a line and then following repeats of ".\n"

Dave,

I've been creating my own regular expressions (*with Regex Buddy despite some naysayers here*) and they are working well. However, there are a few that seem not to hit on my test messages. In order to troubleshoot further, I need to see the message the way SpamAssassin sees it; is there a tool that will let me convert my saved email (*I view the email source in Thunderbird and save to files for testing*) into the rawbody format so I can test my regex?

I do not want to post the email or the rule, to avoid the watchful spammers!

Thanks,
ak.
Re: Help with RegEx Rule
On 10/9/2015 12:07 AM, AK wrote:
> On 20/09/15 03:07, Dave Funk wrote:
> > Notes:
> > 1) Due to SA pre-processing collapsing body into one long line, cannot match on '^' repeatedly, need to look for '\n' as line break indicator. Find start of a line and then following repeats of ".\n"
>
> Dave,
>
> I've been creating my own regular expressions (*with Regex Buddy despite some naysayers here*) and they are working well. However, there are a few that seem not to hit on my test messages. In order to troubleshoot further, I need to see the message the way SpamAssassin sees it; is there a tool that will let me convert my saved email (*I view the email source in Thunderbird and save to files for testing*) into the rawbody format so I can test my regex?
>
> I do not want to post the email or the rule, to avoid the watchful spammers!
>
> Thanks,
> ak.

AK,

Perhaps you'll have more luck looking at the debug output from SA itself? Something like

  spamassassin -t -D < email.mbox 2>&1 | grep -i RULE

Regards,
KAM
Re: Help with RegEx Rule
On 09/10/15 15:10, Kevin A. McGrail wrote:
> Perhaps you'll have more luck looking at the debug output from SA itself? Something like
>
>   spamassassin -t -D < email.mbox 2>&1 | grep -i RULE

Nope, no luck there either; I did not see any mention of my rule (though it's located inside /etc/spamassassin/local.cf). There are no lint errors either.

Despite the bad rep RegexBuddy has here with some users, the rules I build with it for my bash/perl scripts work well. It's just that I have no idea how the email is transformed by SpamAssassin prior to processing; if I did, it would shed light on why one of my rawbody rules isn't working!

I'll keep digging with Google in the meantime.

Cheers,
ak.
Re: Help with RegEx Rule
On 20/09/15 03:07, Dave Funk wrote:
> Final note; now that we've discussed this spam sign, it will probably become useless as spammers follow this list and mutate their crap accordingly to dodge our rules. ;(

Awesome notes, Dave, thanks. The tutorial really helped and it's all been added to my local wiki.

You are right: posting here will most likely help them change this one, but just this once, as you've now taught me to fish. I'm a beggar no more!

Cheers,
ak.
Help with RegEx Rule
Hi all.

I'm getting hit with lots of JUNK mail that has multiple lines with just a '.' on several lines [0]. Most of the JUNK email has at least 5 and at most 10 lines (so far) with just this '.' character somewhere in the middle of the message.

I've copied the message source to RegexBuddy [1] and have been able to come up with a regex that matches what I want using the Perl 5.20 engine:

  (^\.\n){5,}

However, adding this rule to /etc/spamassassin/local.cf doesn't hit at all when I run it against my test message as follows:

= Start Rule Block =
rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
meta MANY_PERIODS __MANY_PERIODS_1
score MANY_PERIODS 2.0
describe MANY_PERIODS JUNK mail with several lines that contain single dot
= End Rule Block =

= Begin Test Command =
spamassassin -L -t test.msg
= End Test Command =

Please help me understand what I'm doing wrong, as this is my first attempt at creating a rule. Previously I've just copied and pasted what I've found here in the forums, but this time I'm trying to do it myself and failing.

Regards,
ak.

[0] - http://pastebin.com/NwrwCKjZ
[1] - http://www.regexbuddy.com/create.html
Re: Help with RegEx Rule
On September 19, 2015 4:52:30 PM AK wrote:
> = Start Rule Block =
> rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/

Remove "ALL =~"; my own rawbody rules don't have it.
Re: Help with RegEx Rule
On 19 Sep 2015, at 10:51, AK wrote:
> Hi all.
>
> I'm getting hit with lots of JUNK mail that has multiple lines with just a '.' on several lines [0]. Most of the JUNK email has at least 5 and at most 10 lines (so far) with just this '.' character somewhere in the middle of the message.
>
> I've copied the message source to RegexBuddy [1] and have been able to come up with a regex that matches what I want using the Perl 5.20 engine:
>
>   (^\.\n){5,}
>
> However, adding this rule to /etc/spamassassin/local.cf doesn't hit at all when I run it against my test message as follows:
>
> = Start Rule Block =
> rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
> meta MANY_PERIODS __MANY_PERIODS_1
> score MANY_PERIODS 2.0
> describe MANY_PERIODS JUNK mail with several lines that contain single dot
> = End Rule Block =
>
> = Begin Test Command =
> spamassassin -L -t test.msg
> = End Test Command =
>
> Please help me understand what I'm doing wrong, as this is my first attempt at creating a rule. Previously I've just copied and pasted what I've found here in the forums, but this time I'm trying to do it myself and failing.

There are multiple issues...

0. I have no basis to criticize RegexBuddy specifically, but as a general principle that class of tool is usually more of a hindrance than an aid for understanding what you're doing with regular expressions. If you're using SA for anything more than your personal email (i.e. if you're managing a mail system that uses SA) you really need to learn regular expressions well enough to write them yourself.

1. As Benny noted, the '=~' isn't used in rawbody or body rules. It is the Perl regex-match operator that is used in header rules between the name of the header to be checked and the regex to be matched. I think 'spamassassin --lint' would have identified that as bogus, and it is always good practice to run that after adding new rules.

2. The 'meta' rule structure is pointlessly complex (but see (4) below.)

3. To match across multiple lines, you need the 'm' modifier.

4. You might find it more flexible to make the base rule match '^\.$' with a tflags setting of 'multiple' and set one or more meta rules for 5 or more hits, OR just make the base rule a normal rule with a score and let the multiple hits add up.
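As a rough illustration of points 3 and 4 outside SpamAssassin, here is a small Python sketch. It uses Python's `re` module rather than Perl, the sample text is made up, and it says nothing about how SA itself feeds rawbody text to a rule; it only shows how '^' behaves with and without a multiline flag, and what counting single-dot lines against a threshold might look like.

```python
import re

# Hypothetical sample body; not taken from the real spam message.
sample = "Hello\n.\n.\n.\n.\n.\nBye\n"

# Without a multiline flag, '^' only matches at the very start of the string,
# so the repeated group can never match lines in the middle of the text.
no_flag = re.search(r"(^\.\n){5,}", sample)

# With re.MULTILINE (the analogue of Perl's /m), '^' also matches after each newline.
with_flag = re.search(r"(^\.\n){5,}", sample, re.MULTILINE)

# Count single-dot lines, similar in spirit to a base rule that can hit
# several times and a separate check for "5 or more hits".
dot_lines = len(re.findall(r"^\.$", sample, re.MULTILINE))
enough_hits = dot_lines >= 5

print(no_flag is None, with_flag is not None, dot_lines, enough_hits)
```

Counting hits separately keeps the pattern itself trivial and moves the threshold into one comparison, which is the flexibility point 4 is getting at.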
Re: Help with RegEx Rule
Hello,

If you are using compiled rules, you probably need to run the sa-compile command and then restart spamd (if you use it :).

Best Regards.
Re: Help with RegEx Rule
On 20/09/15 01:30, Benny Pedersen wrote:
> On September 19, 2015 4:52:30 PM AK wrote:
> > = Start Rule Block =
> > rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
>
> remove ALL =~, my own rawbody rules don't have it

Still no joy after removal. However, at least the rule now hits if I replace:

  /(^\.\n){5,}/

with

  /(^\.\n)*/

But that looks like it might bring about some FPs. Any other suggestions?

Regards,
ak.
Re: Help with RegEx Rule
On Sun, 20 Sep 2015, AK wrote:
> Hi all.
>
> I'm getting hit with lots of JUNK mail that has multiple lines with just a '.' on several lines [0]. Most of the JUNK email has at least 5 and at most 10 lines (so far) with just this '.' character somewhere in the middle of the message.
>
> I've copied the message source to RegexBuddy [1] and have been able to come up with a regex that matches what I want using the Perl 5.20 engine:
>
>   (^\.\n){5,}
>
> However, adding this rule to /etc/spamassassin/local.cf doesn't hit at all when I run it against my test message as follows:
>
> = Start Rule Block =
> rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
> meta MANY_PERIODS __MANY_PERIODS_1
> score MANY_PERIODS 2.0
> describe MANY_PERIODS JUNK mail with several lines that contain single dot
> = End Rule Block =
>
> = Begin Test Command =
> spamassassin -L -t test.msg
> = End Test Command =
>
> Please help me understand what I'm doing wrong, as this is my first attempt at creating a rule. Previously I've just copied and pasted what I've found here in the forums, but this time I'm trying to do it myself and failing.
>
> Regards,
> ak.

SA does some interesting pre-processing on mail messages before applying rules, so you need to understand that.

Try this:

  rawbody  T__LOCAL_MANY_PERIODS /\n(?:\.\n){5}?/
  describe T__LOCAL_MANY_PERIODS Many lines with just a single "dot"

Notes:
1) Due to SA pre-processing collapsing body into one long line, cannot match on '^' repeatedly, need to look for '\n' as line break indicator. Find start of a line and then following repeats of ".\n"
2) Use '(?:' as a grouping optimization unless you care about capture.
3) For the terminal match clause use '{5}' not '{5,}', as we're done as soon as we see at least 5 matches; we don't care if there are more.
4) Use the "non-greedy" match quantifier '}?' to look for the first hit on that pattern and not try to go for more.

Un-optimised pattern: /\n(\.\n){5}/

Note the use of the "testing" rule name format, that "T_". Remove the leading 'T' to make it into a silent rule for combining with metas.

Personal convention: I interpolate '_LOCAL_' (or '_L_') in locally created rule names to distinguish them for debugging. Then when things don't work as expected (e.g. FPs) it helps to determine whether the problem is self-inflicted.

Final note; now that we've discussed this spam sign, it will probably become useless as spammers follow this list and mutate their crap accordingly to dodge our rules. ;(

-- 
Dave Funk    University of Iowa College of Engineering
319/335-5751   FAX: 319/384-0549    1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
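For anyone who wants to poke at the two patterns above outside SA, here is a minimal Python sketch. It assumes a body that arrives as one string with embedded newlines, uses Python's `re` instead of Perl, and the sample text is invented. One thing it should make visible: with an exact count such as '{5}', the trailing '?' does not change what gets matched, so the optimised and un-optimised forms cover the same span here.

```python
import re

# Hypothetical text standing in for a body seen as one long string.
text = "Some intro\n.\n.\n.\n.\n.\n.\nMore text\n"

# Pattern from the suggested rule, and the un-optimised form from the notes.
optimised = re.compile(r"\n(?:\.\n){5}?")
plain = re.compile(r"\n(\.\n){5}")

m1 = optimised.search(text)
m2 = plain.search(text)

# Both stop after exactly five ".\n" repeats even though six are present.
same_span = m1.span() == m2.span()
matched_len = m1.end() - m1.start()  # one leading "\n" plus five ".\n" pairs

# Four dot-only lines are not enough to trigger a match.
too_few = optimised.search("a\n.\n.\n.\n.\nb\n")

print(same_span, matched_len, too_few is None)
```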
Re: Help with RegEx Rule
On Sun, 20 Sep 2015, AK wrote:
[..snip..]
> Still no joy after removal. However, at least the rule now hits if I replace:
>
>   /(^\.\n){5,}/
>
> with
>
>   /(^\.\n)*/
>
> But that looks like it might bring about some FPs. Any other suggestions?

Do you realize that rule will -always- fire on -any- message? The '*' repeat operator is "zero or more" instances, so that pattern degenerates to // which will match everything. Guaranteed FP generator.

-- 
Dave Funk    University of Iowa College of Engineering
319/335-5751   FAX: 319/384-0549    1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
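The "zero or more" point is easy to check in isolation. The sketch below uses Python's `re` rather than Perl, and the sample messages are made up; it only shows that the starred pattern produces an (empty) match on text containing no dot-only lines at all.

```python
import re

pattern = re.compile(r"(^\.\n)*")

# Hypothetical messages with no dot-only lines anywhere.
messages = ["", "Hello world", "Perfectly normal mail\nwith two lines\n"]

results = []
for msg in messages:
    m = pattern.search(msg)
    # A match object is always returned; the matched text is just empty.
    results.append((m is not None, m.group(0)))
    print(repr(msg), "->", m is not None, repr(m.group(0)))
```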