The way we handle it in
http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf is to use a
regex like /this.advertisement/, with a . between the words and no \b
anchors.
When matching against phrases like yours, we find the word boundary does
not add any specificity to the rule, because the odds of matching a
different word or phrase are nil, and without it we catch almost every
obfuscation of word boundaries.
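
A quick way to see the effect, sketched in Python rather than as an SA rule (Python's re treats . the same way for this ASCII case; the sample strings are made up for illustration and are not taken from KAM.cf):

```python
import re

# Illustrative stand-in for an unanchored rule: '.' matches any single
# character, so a space, underscore or hyphen between the words all hit.
unanchored = re.compile(r"this.advertisement")

# Hypothetical sample strings, not taken from real spam.
samples = [
    "remove yourself from this advertisement",
    "remove_yourself_from_this_advertisement",
    "remove-yourself-from-this-advertisement",
]

hits = [unanchored.search(s) is not None for s in samples]
print(hits)
```

All three variants match because nothing in the pattern depends on what comes before "this" or after "advertisement".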
Good catch though. We do have some rules in KAM.cf that can be evaded
by this, and off the top of my head I can think of several stock SA
rules that are vulnerable too.
On 6/5/2014 9:44 PM, John Hardin wrote:
All:
I've run across a new text obfuscation method in active use by
spammers. It appears to be an attempt to bypass RE-based text matching
of words. Rules you write will need modification so they are not evaded
by this.
Unfortunately the RE engine considers the underscore as being a "word"
character, so a rule like /\bthis advertisement\b/ can be defeated by
replacing the spaces in the sentence with underscores. This is still
readable to a human but foils the word-boundary check.
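
A small sketch of the failure, using Python's re as a stand-in for the Perl engine (both count the underscore as \w; the sample sentences are made up for illustration):

```python
import re

# The underscore is a "word" character, so no \b exists between '_' and a letter.
underscore_is_word = re.fullmatch(r"\w", "_") is not None

anchored = re.compile(r"\bthis advertisement\b")
# Even with '.' in place of the space, the \b anchors still block the match.
anchored_dot = re.compile(r"\bthis.advertisement\b")

# Hypothetical sample text.
plain = "remove yourself from this advertisement now"
obfuscated = "remove_yourself_from_this_advertisement_now"

plain_hit = anchored.search(plain) is not None
obfuscated_hit = anchored.search(obfuscated) is not None
obfuscated_dot_hit = anchored_dot.search(obfuscated) is not None
print(underscore_is_word, plain_hit, obfuscated_hit, obfuscated_dot_hit)
```

The plain sentence matches; the underscored one does not, and loosening only the embedded space is not enough while the \b anchors remain.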
Recommendation: instead of a bare \b, use (?:\b|_), and instead of
embedded spaces use [-_\s]. The rule above would then become
/(?:\b|_)this[-_\s]advertisement(?:\b|_)/
Examples of the obfuscation:
Manage_advertising_preferences_here
To_remove_yourself_from_this_admail,_please_do_so_here
Be_removed_from_this_important_offer
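
The recommendation can be tried against the three samples above. This is only a sketch in Python's re, not an SA rule; the helper name harden and the phrases picked out of each sample are assumptions made for illustration:

```python
import re

def harden(phrase):
    """Hypothetical helper: build a pattern for `phrase` using
    (?:\\b|_) at the edges and [-_\\s] in place of each embedded space."""
    edge = r"(?:\b|_)"
    body = r"[-_\s]".join(re.escape(word) for word in phrase.split())
    return re.compile(edge + body + edge)

# (phrase to look for, obfuscated sample from the message above)
cases = [
    ("advertising preferences", "Manage_advertising_preferences_here"),
    ("this admail", "To_remove_yourself_from_this_admail,_please_do_so_here"),
    ("removed from this", "Be_removed_from_this_important_offer"),
]

hardened_hits = [harden(p).search(s) is not None for p, s in cases]
# Same phrases with bare \b and literal spaces, for comparison.
naive_hits = [
    re.search(r"\b" + re.escape(p) + r"\b", s) is not None for p, s in cases
]
print(hardened_hits, naive_hits)
```

The hardened patterns match all three samples and still match the same phrases written with ordinary spaces, while the bare-\b versions match none of the underscored samples.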