Re: Help with RegEx Rule

2015-10-09 Thread Bowie Bailey

On 10/9/2015 12:07 AM, AK wrote:

On 20/09/15 03:07, Dave Funk wrote:


Notes:
1) Due to SA pre-processing collapsing body into one long line, 
cannot match on '^' repeatedly, need to look for '\n' as line break 
indicator.

Find start of a line and then following repeats of ".\n"


Dave,

I've been creating my own regular expressions (*with Regex Buddy
despite some naysayers here*) and they are working well. However,
there are a few that seem to not hit on my test messages.  In order to
troubleshoot further, I need to see the message the way SpamAssassin
sees it; is there a tool that will let me convert my saved email (*I
view the email source in Thunderbird and save to files for testing*)
into the rawbody format so I can test my regex?


This sounds useful to me.  There have been a few cases where it would 
have been useful to me to see exactly what SA is doing to the message 
prior to checking the rule.


A utility which could output the text tested against for HEADER, BODY, 
RAWBODY, etc would be useful.  What might be more useful to those of us 
with a bit of Perl knowledge would be a function call that would return 
the strings so we can directly test the regex against the string with a 
small Perl program where we could run it with the debugger or capture 
and print parts of the match to make sure it's doing what we think it 
is.  Is this already possible with the API?


--
Bowie


Re: Help with RegEx Rule

2015-10-09 Thread John Hardin

On Fri, 9 Oct 2015, AK wrote:


On 20/09/15 03:07, Dave Funk wrote:


 Notes:
 1) Due to SA pre-processing collapsing body into one long line, cannot
 match on '^' repeatedly, need to look for '\n' as line break indicator.
 Find start of a line and then following repeats of ".\n"


Dave,

I've been creating my own regular expressions (*with Regex Buddy despite some
naysayers here*) and they are working well.  However, there are a few that
seem to not hit on my test messages.  In order to troubleshoot further, I
need to see the message the way SpamAssassin sees it; is there a tool that
will let me convert my saved email (*I view the email source in Thunderbird
and save to files for testing*) into the rawbody format so I can test my
regex?


I've found the following useful in my test bench environment:

body     __ALL_BODY     /.*/
tflags   __ALL_BODY     multiple

rawbody  __ALL_RAWBODY  /.*/
tflags   __ALL_RAWBODY  multiple

uri      __ALL_URI      /.+/
tflags   __ALL_URI      multiple

etc.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Watch... Wallet... Gun... Knee...-- Denny Crane
---


Re: Help with RegEx Rule

2015-10-08 Thread Anthony Kamau

On 20/09/15 03:07, Dave Funk wrote:

Notes:
1) Due to SA pre-processing collapsing body into one long line, cannot 
match on '^' repeatedly, need to look for '\n' as line break indicator.

Find start of a line and then following repeats of ".\n"


Dave,

I need to see the mail message as SpamAssassin sees it so as to create some
*awesome* rules; is there a tool that will let me convert my ASCII
representation of an email message into a file in the rawbody format or any
other format for that matter?


Thanks,
ak.




Re: Help with RegEx Rule

2015-10-08 Thread AK

On 20/09/15 03:07, Dave Funk wrote:


Notes:
1) Due to SA pre-processing collapsing body into one long line, cannot 
match on '^' repeatedly, need to look for '\n' as line break indicator.

Find start of a line and then following repeats of ".\n"


Dave,

I've been creating my own regular expressions (*with Regex Buddy despite some
naysayers here*) and they are working well.  However, there are a few that
seem to not hit on my test messages.  In order to troubleshoot further, I need
to see the message the way SpamAssassin sees it; is there a tool that will let
me convert my saved email (*I view the email source in Thunderbird and save to
files for testing*) into the rawbody format so I can test my regex?

I do not want to post the email nor the rule to avoid the watchful spammers!


Thanks,
ak.




Re: Help with RegEx Rule

2015-10-08 Thread Kevin A. McGrail

On 10/9/2015 12:07 AM, AK wrote:

On 20/09/15 03:07, Dave Funk wrote:


Notes:
1) Due to SA pre-processing collapsing body into one long line, 
cannot match on '^' repeatedly, need to look for '\n' as line break 
indicator.

Find start of a line and then following repeats of ".\n"


Dave,
I've been creating my own regular expressions (*with Regex Buddy despite some
naysayers here*) and they are working well.  However, there are a few that
seem to not hit on my test messages.  In order to troubleshoot further, I need
to see the message the way SpamAssassin sees it; is there a tool that will let
me convert my saved email (*I view the email source in Thunderbird and save to
files for testing*) into the rawbody format so I can test my regex?

I do not want to post the email nor the rule to avoid the watchful spammers!


Thanks,
ak.


AK,

Perhaps you'll have more luck looking at the debug output from SA 
itself?  Something like spamassassin -t -D < email.mbox 2>&1 | grep -i RULE


Regards,
KAM


Re: Help with RegEx Rule

2015-10-08 Thread AK

On 09/10/15 15:10, Kevin A. McGrail wrote:


Perhaps you'll have more luck looking at the debug output from SA 
itself?  Something like spamassassin -t -D < email.mbox 2>&1 | grep -i 
RULE



Nope, no luck there either; did not see mention of my rule (though it's located 
inside /etc/spamassassin/local.cf).  There are no lint errors either.  Despite 
the bad rep RegexBuddy has here with some users, the rules I build with it for 
my bash/perl scripts work well.  It's just that I have no idea how the email is 
transformed by spamassassin prior to processing; if I did, it would shed light 
on why one of my rawbody rules isn't working!

I'll keep digging with Google in the meantime.


Cheers,
ak.



Re: Help with RegEx Rule

2015-09-20 Thread AK

On 20/09/15 03:07, Dave Funk wrote:


Final note; now that we've discussed this spam sign, it will probably 
become useless as spammers follow this list and mutate their crap 
accordingly to dodge our rules. ;(




Awesome notes, Dave, thanks.

The tutorial really helped and it's all been added to my local wiki.

You are right - posting here will most likely help them change this one, 
but just this once as you've now taught me to fish - I'm a beggar no more!



Cheers,
ak.


Help with RegEx Rule

2015-09-19 Thread AK

Hi all.

I'm getting hit with lots of JUNK mail that contains several lines with
just a '.' on them [0].  Most of the JUNK email has at least 5 and
at most 10 such lines (so far) somewhere in the middle of the message.


I've copied the message source to RegexBuddy [1] and have been able to 
come up with a regex that matches what I want using the Perl 5.20 engine:


(^\.\n){5,}

However, adding this rule to /etc/spamassassin/local.cf doesn't hit at 
all when I run it against my test message as follows:


= Start Rule Block =
rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
meta MANY_PERIODS __MANY_PERIODS_1
score MANY_PERIODS 2.0
describe MANY_PERIODS JUNK mail with several lines that contain single dot
= End Rule Block =

= Begin Test Command =
spamassassin -L -t test.msg
= End Test Command =


Please help me understand what I'm doing wrong as this is my first 
attempt at creating a rule.  Previously I've just copied and pasted what 
I've found here in the forums, but this time I'm trying to do it myself 
but failing.



Regards,
ak.


[0] - http://pastebin.com/NwrwCKjZ
[1] - http://www.regexbuddy.com/create.html




Re: Help with RegEx Rule

2015-09-19 Thread Benny Pedersen

On September 19, 2015 4:52:30 PM AK wrote:


= Start Rule Block =
rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/


Remove 'ALL =~'; my own rawbody rules don't have it.


Re: Help with RegEx Rule

2015-09-19 Thread Bill Cole

On 19 Sep 2015, at 10:51, AK wrote:


Hi all.

I'm getting hit with lots of JUNK mail that contains several lines with
just a '.' on them [0].  Most of the JUNK email has at least 5 and
at most 10 such lines (so far) somewhere in the middle of the message.


I've copied the message source to RegexBuddy [1] and have been able to 
come up with a regex that matches what I want using the Perl 5.20 
engine:


(^\.\n){5,}

However, adding this rule to /etc/spamassassin/local.cf doesn't hit at 
all when I run it against my test message as follows:


= Start Rule Block =
rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
meta MANY_PERIODS __MANY_PERIODS_1
score MANY_PERIODS 2.0
describe MANY_PERIODS JUNK mail with several lines that contain single dot
= End Rule Block =

= Begin Test Command =
spamassassin -L -t test.msg
= End Test Command =


Please help me understand what I'm doing wrong as this is my first 
attempt at creating a rule.  Previously I've just copied and pasted 
what I've found here in the forums, but this time I'm trying to do it 
myself but failing.


There are multiple issues...

0. I have no basis to criticize RegexBuddy specifically but as a general 
principle, that class of tool is usually more of a hindrance than an aid 
for understanding what you're doing with regular expressions. If you're 
using SA for anything more than your personal email (i.e. if you're 
managing a mail system that uses SA) you really need to learn regular 
expressions well enough to write them yourself.


1. As Benny noted, the '=~' isn't used in rawbody or body rules. It is 
the Perl regex-match operator that is used in header rules between the 
name of the header to be checked and the regex to be matched. I think 
'spamassassin --lint' would have identified that as bogus, and it is 
always good practice to run that after adding new rules.


2. The 'meta' rule structure is pointlessly complex (but see (4) below.)

3. To make '^' match at the start of every line rather than only at the
start of the text, you need the 'm' modifier.

4. You might find it more flexible to make the base rule match '^\.$' 
with a tflags setting of 'multiple' and set one or more meta rules for 5 
or more hits OR just make the base rule a normal rule with a score and 
let the multiple hits add up.
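
As a rough illustration of points 3 and 4, here is a small Python sketch. It is only an analogy: it uses Python's re module rather than SpamAssassin, re.MULTILINE stands in for Perl's 'm' modifier, and the sample text is made up. It does not model what SA does to the message before the rule runs.

```python
import re

# Made-up sample body: five lines that contain only a dot.
sample = "Hello\n.\n.\n.\n.\n.\nBye\n"

# Without the multi-line flag, '^' only matches at the very start of the string.
no_flag = re.search(r"(^\.\n){5,}", sample)

# With re.MULTILINE (the counterpart of Perl's /m), '^' also matches
# immediately after each newline.
with_flag = re.search(r"(^\.\n){5,}", sample, re.MULTILINE)

# Point 4: match a single dot-only line and count the hits instead.
hits = len(re.findall(r"^\.$", sample, re.MULTILINE))

print(no_flag is None, with_flag is not None, hits)
```

Counting individual hits is loosely what a 'tflags multiple' base rule combined with a meta rule would do, with the threshold kept outside the regex.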




Re: Help with RegEx Rule

2015-09-19 Thread Adam Major

Hello,

If you are using compiled rules, you probably need to run the sa-compile
command and then restart sa-spamd (if you use it).


Best Regards.




Re: Help with RegEx Rule

2015-09-19 Thread AK

On 20/09/15 01:30, Benny Pedersen wrote:

On September 19, 2015 4:52:30 PM AK wrote:


= Start Rule Block =
rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/


Remove 'ALL =~'; my own rawbody rules don't have it.


Still no joy after removal.  However, at least the rule now hits if I 
replace:


/(^\.\n){5,}/

with

/(^\.\n)*/

But that looks like it might bring about some FPs.  Any other suggestions?


Regards,
ak.



Re: Help with RegEx Rule

2015-09-19 Thread Dave Funk

On Sun, 20 Sep 2015, AK wrote:


Hi all.

I'm getting hit with lots of JUNK mail that contains several lines with just
a '.' on them [0].  Most of the JUNK email has at least 5 and at most 10 such
lines (so far) somewhere in the middle of the message.


I've copied the message source to RegexBuddy [1] and have been able to come 
up with a regex that matches what I want using the Perl 5.20 engine:


(^\.\n){5,}

However, adding this rule to /etc/spamassassin/local.cf doesn't hit at all 
when I run it against my test message as follows:


= Start Rule Block =
rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
meta MANY_PERIODS __MANY_PERIODS_1
score MANY_PERIODS 2.0
describe MANY_PERIODS JUNK mail with several lines that contain single dot
= End Rule Block =

= Begin Test Command =
spamassassin -L -t test.msg
= End Test Command =


Please help me understand what I'm doing wrong as this is my first attempt at 
creating a rule.  Previously I've just copied and pasted what I've found here 
in the forums, but this time I'm trying to do it myself but failing.



Regards,
ak.


SA does some interesting pre-processing on mail messages before applying 
rules, so you need to understand that.


Try this:

 rawbody  T__LOCAL_MANY_PERIODS   /\n(?:\.\n){5}?/
 describe T__LOCAL_MANY_PERIODS   Many lines with just a single "dot"

Notes:
1) Because SA pre-processing collapses the body into one long line, you
cannot match on '^' repeatedly; look for '\n' as the line break indicator
instead. Find the start of a line and then the following repeats of ".\n".
2) Use '(?:' as a grouping optimization unless you care about capture.
3) For the terminal match clause use '{5}' not '{5,}', as we're done as soon
as we see at least 5 matches; we don't care if there are more.
4) Use the "non-greedy" match quantifier '}?' to look for the first hit on
that pattern and not try to go for more.


Un-optimised pattern: /\n(\.\n){5}/

Note the use of the "testing" rule name format, i.e. the "T_" prefix. Remove
the leading 'T' to make it into a silent rule for combining with metas.
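
To see why the '\n'-based pattern hits where the '^'-based one does not, here is a small Python sketch. It is an analogy only: Python's re module stands in for Perl, the sample text is made up, and a single string with embedded newlines is assumed as a stand-in for the text the rule is tested against. In this sample the '{5}?' and '{5}' forms match the same span, since a fixed count leaves nothing for the non-greedy modifier to shorten.

```python
import re

# Made-up sample, treated as one long string with embedded newlines.
sample = "Buy now\n.\n.\n.\n.\n.\n.\nUnsubscribe\n"

# The suggested pattern: a line break, then five repeats of ".\n".
optimised = re.search(r"\n(?:\.\n){5}?", sample)

# The un-optimised form with a capturing group and a plain {5}.
unoptimised = re.search(r"\n(\.\n){5}", sample)

# The original attempt: '^' only matches at the start of the string here,
# so it cannot repeat once per line.
anchored = re.search(r"(^\.\n){5,}", sample)

print(optimised.span(), unoptimised.span(), anchored)
```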


Personal convention; I interpolate '_LOCAL_' ( or '_L_') in locally 
created rule names to distinguish them for debugging. And then when things 
don't work as expected (EG: FPs) it helps to determine if the problem is 
self-inflicted.


Final note; now that we've discussed this spam sign, it will probably 
become useless as spammers follow this list and mutate their crap 
accordingly to dodge our rules. ;(


--
Dave Funk                              University of Iowa
                                       College of Engineering
319/335-5751   FAX: 319/384-0549       1256 Seamans Center
Sys_admin/Postmaster/cell_admin        Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: Help with RegEx Rule

2015-09-19 Thread Dave Funk

On Sun, 20 Sep 2015, AK wrote:

[..snip..]
Still no joy after removal.  However, at least the rule now hits if I 
replace:


/(^\.\n){5,}/

with

/(^\.\n)*/

But that looks like it might bring about some FPs.  Any other suggestions?


Do you realize that rule will -always- fire on -any- message?
The '*' repeat operator is "zero or more" instances.
So that pattern degenerates to // which will match everything.

Guaranteed FP generator.
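
A quick way to convince yourself of this, sketched in Python (the re module standing in for Perl here, and the sample strings are made up):

```python
import re

# The '*' quantifier allows zero repetitions, so an empty match always succeeds.
pattern = re.compile(r"(^\.\n)*")

for text in ["", "Perfectly normal mail\n", ".\n.\n.\n"]:
    m = pattern.search(text)
    # A match object is returned for every input, including the empty string.
    print(repr(text), "->", repr(m.group(0)))
```

Every input yields a match, which is why a rule built on that pattern would fire on every message.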

--
Dave Funk                              University of Iowa
                                       College of Engineering
319/335-5751   FAX: 319/384-0549       1256 Seamans Center
Sys_admin/Postmaster/cell_admin        Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{