There are a few approaches to this one, and which works best depends on
your user base and the types of email they get.  If you have a lot
of users who get legitimate email newsletters, it can be harder.  If
you have a lot of legitimate medical users, it gets even harder, as you
cannot wholesale block medication names.

One option, of course, is to just send the email to ASSP for processing,
so it can learn that it is spam.  The spammers get more and more crafty
and will change things up, as your example shows.

My first line of defense in these cases is looking at the headers and
seeing if I can ignore the content and find some other way to block them,
or the range they are spewing from.  More often than not, I find that stuff
like the below comes from a /24 that you can block entirely, has an
ehlo/helo that is invalid, or carries some other header mark that is much
more effective at blocking not only this one email, but many more that come
from a close range of IPs.  It also happens at the connection level, or
early on, freeing up a ton of CPU for ASSP.  The second you get past
the headers, ASSP has to work a lot harder.
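
To make the /24 idea concrete, here is a minimal Python sketch of a range check. It is not ASSP configuration, just an illustration of the logic; the networks listed are reserved documentation ranges standing in for whatever ranges you actually see in your logs, and the function names are made up for this example.

```python
import ipaddress

# Hypothetical blocklist. 192.0.2.0/24 and 198.51.100.0/24 are reserved
# documentation ranges, used here only as placeholders.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if the connecting IP falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def covering_slash24(client_ip: str) -> ipaddress.IPv4Network:
    """Return the /24 that contains an IPv4 address seen in a spam header."""
    return ipaddress.ip_network(f"{client_ip}/24", strict=False)


if __name__ == "__main__":
    print(is_blocked("192.0.2.77"))
    print(covering_slash24("203.0.113.5"))
```

Before blocking a whole /24, it is worth checking that no legitimate sender shares the range.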

You would be surprised how effective it is just to add ehlo/helo blocks
for the following:
ehlo/helo is your IP, or in your IP range
ehlo/helo is yahoo.com
ehlo/helo is gmail.com
ehlo/helo is yourdomain.com
ehlo/helo is aol.com
ehlo/helo is hotmail.com
ehlo/helo is apple.com
... etc. etc., pick all your large email providers.
ehlo/helo is any domain you host
ehlo/helo does not contain "."
ehlo/helo contains a pattern that looks like dynamic IP space
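
The checks in that list can be sketched in Python roughly as follows. This only illustrates the decision logic, not how ASSP implements it. The domain sets, the network, and especially the dynamic-space regex are assumptions made up for the example and would need tuning against your own logs.

```python
import ipaddress
import re

# Hypothetical values for illustration; substitute your own.
LOCAL_DOMAINS = {"yourdomain.com"}
LOCAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
BIG_PROVIDERS = {"yahoo.com", "gmail.com", "aol.com", "hotmail.com", "apple.com"}

# Rough guess at what dynamic-space host names look like (an assumption).
DYNAMIC_RE = re.compile(
    r"(dsl|dhcp|dynamic|dial-?up|pool|ppp)[.-]|\d+[.-]\d+[.-]\d+[.-]\d+",
    re.IGNORECASE,
)


def helo_reason(helo: str):
    """Return a reason string if the ehlo/helo looks bogus, else None."""
    name = helo.strip().strip("[]").lower()
    try:
        addr = ipaddress.ip_address(name)
    except ValueError:
        addr = None
    if addr is not None:
        # An address literal: only flag it when it claims to be one of ours.
        if any(addr in net for net in LOCAL_NETWORKS):
            return "helo is our own IP range"
        return None
    if "." not in name:
        return "helo has no dot"
    if name in LOCAL_DOMAINS:
        return "helo is a domain we host"
    if name in BIG_PROVIDERS:
        return "helo is a bare large-provider domain"
    if DYNAMIC_RE.search(name):
        return "helo looks like dynamic space"
    return None


if __name__ == "__main__":
    for sample in ("gmail.com", "localhost", "[192.0.2.10]", "mail.example.org"):
        print(sample, "->", helo_reason(sample))
```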

From there, you are indeed looking at content.  In your case, you
need to decode the email into a plain version of the content, which
ASSP is going to do behind the scenes.  So your HTML will end up being:

<strong>Via<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white">  
qng </span>gra</strong>

* Basically, the "=" at each line end goes away (it is a quoted-printable
soft line break), and each "=3D" decodes to a plain "=".
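
If you want to see that decoding step for yourself, Python's standard quopri module does the same quoted-printable decoding. This is only a way to preview what the decoded content looks like; ASSP does its own decoding internally. The input is the fragment from the original question.

```python
import quopri

# The fragment exactly as it appears in the raw message: "=3D" is an
# encoded "=", and the trailing "=" is a soft line break.
raw = (
    b'<strong>Via<span style=3D"FONT-SIZE: 2px; FLOAT: right; COLOR: whit=\n'
    b'e"> qng </span>gra</strong>'
)

decoded = quopri.decodestring(raw).decode("ascii")
print(decoded)
```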

You have a few things that you can add as weighted patterns, that do  
not usually occur in normal email.

 > qng <

Give that pattern some weight.  You do not want to block on it
outright, but it rarely comes up in legit email.  More generally, you
can count against any email that uses a string that is space padded and
then has greater-than/less-than characters around it:

 >\ ([a-zA-Z0-9]).*\ <

I *think* that would do it, and not give you any false positives.
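
A quick way to sanity-check a pattern like this before loading it is to run it against a spam sample and a plausible legitimate one. The sketch below uses Python's re module, which handles this pattern the same way Perl would. The "ham" sample and the tighter variant are made up for illustration. On this test the original pattern also fires on ordinary space-padded text between tags, which is one more reason to give it weight rather than block on it outright.

```python
import re

# The pattern from above, unchanged.
LOOSE = re.compile(r">\ ([a-zA-Z0-9]).*\ <")

# Hypothetical tighter variant: one short space-padded token between tags.
TIGHT = re.compile(r"> [a-zA-Z0-9]{1,8} <")

spam = (
    '<strong>Via<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white">'
    " qng </span>gra</strong>"
)
# Made-up legitimate sample for comparison.
ham = "<p> Thanks for the update, see you Monday </p>"

for name, pattern in (("loose", LOOSE), ("tight", TIGHT)):
    print(name, bool(pattern.search(spam)), bool(pattern.search(ham)))
```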

The COLOR: white may not be a bad one to use as well.  Email backgrounds
are generally white, and white text on a white background is a tactic
that may be used to hide text from the reader; it is hard to say without
seeing the rest of the email.  Be careful with these patterns.
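
The idea of weighting patterns rather than blocking on any single one can be sketched like this. The weights, the threshold, and the tiny-font pattern are all assumptions picked for the example; they are not ASSP defaults, and the scoring function is only a stand-in for what ASSP does with weighted regexes.

```python
import re

# (pattern, weight) pairs. All weights are hypothetical.
WEIGHTED_PATTERNS = [
    (re.compile(r">\ ([a-zA-Z0-9]).*\ <"), 20),             # space-padded text between tags
    (re.compile(r"color:\s*white", re.IGNORECASE), 10),      # possible white-on-white text
    (re.compile(r"font-size:\s*[0-3]px", re.IGNORECASE), 15),  # tiny font (assumed extra pattern)
]
BLOCK_THRESHOLD = 40  # hypothetical


def score(body: str) -> int:
    """Sum the weights of every pattern that matches the decoded body."""
    return sum(weight for pattern, weight in WEIGHTED_PATTERNS if pattern.search(body))


spam = (
    '<strong>Via<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white">'
    " qng </span>gra</strong>"
)
newsletter = '<p style="color: white">Hello</p>'  # made-up legitimate sample

if __name__ == "__main__":
    for label, body in (("spam", spam), ("newsletter", newsletter)):
        total = score(body)
        print(label, total, "block" if total >= BLOCK_THRESHOLD else "pass")
```

No single pattern reaches the threshold on its own here, so a newsletter that happens to use white text still passes.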

Finally, there are more complex patterns, and I fiddled with this for  
a minute and could not get it where I wanted, but it may be a good  
start. I would weight it a little on the low side.

(<(strong|STRONG|bold|BOLD)>)([Vv].+[Ii].+[Aa].+[Gg].+[Rr].+[Aa])(</(strong|STRONG|bold|BOLD)>)

I cannot remember if you can do case-insensitive matching in ASSP; if you
can, the above can get a lot simpler.  The trouble is, it will match your
line, but it would also match...

<BOLD>Viable<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white">  
qng </span>Greyhound Racing Association GRA</bold>

I have no idea how often "Viable Greyhound Racing Association GRA"  
shows up in emails though :)
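
For what it is worth, here is how the simpler case-insensitive form might look, tested in Python. Whether ASSP accepts a case-insensitive flag is still an open question on my end; this only shows the regex logic. The SIMPLE variant is my own guess at a simplification, and it uses ".*" instead of ".+". With ".+", at least one character is required between each pair of letters, so the original only matches the sample because the style attribute happens to contain the needed letters in order.

```python
import re

# The pattern from above, joined onto one line.
VERBOSE = re.compile(
    r"(<(strong|STRONG|bold|BOLD)>)"
    r"([Vv].+[Ii].+[Aa].+[Gg].+[Rr].+[Aa])"
    r"(</(strong|STRONG|bold|BOLD)>)"
)

# Hypothetical simplification: case-insensitive, and ".*" so that adjacent
# letters such as "Via" and "gra" match directly.
SIMPLE = re.compile(r"<(strong|bold)>v.*i.*a.*g.*r.*a</(strong|bold)>", re.IGNORECASE)

spam = (
    '<strong>Via<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white">'
    " qng </span>gra</strong>"
)
lookalike = (
    '<BOLD>Viable<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white">'
    " qng </span>Greyhound Racing Association GRA</bold>"
)
plain = "<strong>Very important</strong>"  # made-up legitimate sample

for label, text in (("spam", spam), ("lookalike", lookalike), ("plain", plain)):
    print(label, bool(VERBOSE.search(text)), bool(SIMPLE.search(text)))
```

Both forms still match the lookalike line, so the low weight is warranted either way.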

Just be careful with bare drug-name patterns, as there is a bad word
inside the word "spe*cialis*t".
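
One common way around that substring problem is to anchor the pattern on word boundaries. A small Python sketch, with made-up sample sentences; note that a bounded pattern will not catch deliberately split or padded spellings, so it is a trade-off rather than a fix.

```python
import re

# Naive substring match: also fires inside "specialist".
NAIVE = re.compile(r"cialis", re.IGNORECASE)

# Word-boundary match: only fires when the word stands alone.
BOUNDED = re.compile(r"\bcialis\b", re.IGNORECASE)

legit = "Please ask our specialist about your results."
spam = "Buy Cialis now"

for label, text in (("legit", legit), ("spam", spam)):
    print(label, bool(NAIVE.search(text)), bool(BOUNDED.search(text)))
```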

Hope that was of some help.
-- 
Scott * If you contact me off list replace talklists@ with scott@ *

On Dec 8, 2009, at 12:50 PM, aja-lists wrote:

> Does anyone have a good approach to block emails with these kind of
> patterns :
>
> <strong>Via<span style=3D"FONT-SIZE: 2px; FLOAT: right; COLOR: whit=
> e"> qng </span>gra</strong>

