There are a few approaches to this one, and which works best depends on your user base and the types of email they get. If you have a lot of users who get legitimate email newsletters, it can be harder. If you have a lot of legitimate medical users, it gets even harder, as you cannot wholesale block medication names.

One option, of course, is to just send the email to ASSP for processing, so it can learn that it is spam. The spammers get more and more crafty and will change things up, as your example shows.

My first line of defense in these cases is looking at the headers, to see whether I can ignore the content and find some other way to block them, or the range they are spewing from. More often than not, I find that stuff like the example below comes from a /24 that you can block entirely, has an invalid ehlo/helo, or carries some other header mark that is much more effective at blocking not only this one email, but many more that come from a close range of IPs. It also happens at the connection level, or early on, freeing up a ton of CPU for ASSP. The second you get past the headers, ASSP has to work a lot harder.

You would be surprised how much you catch just by making ehlo/helo blocks such as:

    ehlo/helo is your IP, or your IP range
    ehlo/helo is yahoo.com
    ehlo/helo is gmail.com
    ehlo/helo is yourdomain.com
    ehlo/helo is aol.com
    ehlo/helo is hotmail.com
    ehlo/helo is apple.com
    .... etc etc, pick all your large email providers.
    ehlo/helo is any domain you host
    ehlo/helo does not contain "."
    ehlo/helo contains a pattern typical of dynamic IP space

From there, you are indeed looking at content. In your case, you need to decode the email into a plain version of the content, which ASSP is going to do behind the scenes. So your HTML will end up being:

    <strong>Via<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white"> qng </span>gra</strong> *

* Basically, the "=" at the line ends goes away (it is a quoted-printable soft line break), and "=3D" decodes to a plain "=".

You have a few things that you can add as weighted patterns, which do not usually occur in normal email.

    > qng <

Give that pattern some weight. You do not want to block on it outright, but it is rare for it to come up in a legit email, and it should count against any email using a string that is space padded and then has greater-than/less-than characters around it.

    >\ ([a-zA-Z0-9]).*\ <

I *think* that would do it, and not give you any false positives. Since .* is greedy, a tighter variant that only matches a single padded token would be:

    >\ [a-zA-Z0-9]+\ <
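
If you want to check a pattern against a sample before loading it into ASSP, a small script outside ASSP can help. The following is only a minimal sketch in Python, not how ASSP itself decodes or scores mail: the weights, the names decode_body and score, and the tightened single-token form of the padded-string pattern are my own assumptions for illustration.

```python
import quopri
import re

# Quoted-printable sample from the original question. The trailing "=" is a
# soft line break and "=3D" is an encoded "=".
RAW = (
    b'<strong>Via<span style=3D"FONT-SIZE: 2px; FLOAT: right; COLOR: whit=\n'
    b'e"> qng </span>gra</strong>'
)

# Hypothetical weights, for illustration only; these are not ASSP settings.
WEIGHTED_PATTERNS = [
    # A single space-padded token between ">" and "<" (tightened variant).
    (re.compile(r"> [a-zA-Z0-9]+ <"), 20),
    # Possible white-on-white text.
    (re.compile(r"COLOR:\s*white", re.IGNORECASE), 10),
]


def decode_body(raw: bytes) -> str:
    """Decode a quoted-printable body into plain text."""
    return quopri.decodestring(raw).decode("ascii", errors="replace")


def score(text: str) -> int:
    """Sum the weights of every pattern that matches the text."""
    return sum(weight for pattern, weight in WEIGHTED_PATTERNS if pattern.search(text))


decoded = decode_body(RAW)
print(decoded)
print(score(decoded))
```

Running legitimate newsletters from your own users through the same score function is a quick way to see how often a pattern would fire on mail you want to keep.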

Adding COLOR: white may not be a bad idea either. Email backgrounds are generally white, and white-on-white text is a tactic that may be in use here, though it is hard to say without seeing the rest of the email. Be careful with these patterns.

Finally, there are more complex patterns. I fiddled with this for a minute and could not get it where I wanted, but it may be a good start. I would weight it a little on the low side.

    (<(strong|STRONG|bold|BOLD)>)([Vv].+[Ii].+[Aa].+[Gg].+[Rr].+[Aa])(</(strong|STRONG|bold|BOLD)>)

I cannot remember if you can do case-insensitive matching in ASSP; if you can, the above can get a lot simpler. The trouble is, that will match your line, but it would also match...

    <BOLD>Viable<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white"> qng </span>Greyhound Racing Association GRA</bold>

I have no idea how often "Viable Greyhound Racing Association GRA" shows up in emails though :)

Just be careful, as there is a bad word in the word "spe*cialis*t".

Hope that was of some help.
--
Scott * If you contact me off list replace talklists@ with scott@ *

On Dec 8, 2009, at 12:50 PM, aja-lists wrote:

> Does anyone have a good approach to block emails with these kind of
> patterns :
>
> <strong>Via<span style=3D"FONT-SIZE: 2px; FLOAT: right; COLOR: whit=
> e"> qng </span>gra</strong>
