On 14 Jan 2004 at 14:12, David F. Skoll wrote:

> I'm testing a SpamAssassin rule that looks like this:
> 
> body GIBBERISH_WORDS        /([a-z]{4,12} ){15}/
> describe GIBBERISH_WORDS    A whole bunch of space-separated lowercase words
> score GIBBERISH_WORDS       2
> 
> It looks for 15 or more all-lowercase words of 4 to 12 letters
> each, separated by single spaces.  I don't think that occurs often in
> real e-mail -- most people do use upper-case letters and punctuation
> occasionally!  But use at your own risk, especially if you correspond
> with an e. e. cummings fan.

I'm using these rules, as per a recent discussion on SA-Talk.  
Filtering out some common four-letter words tends to drop the FP 
rate.  So far, they're working really well.  

body        RANDOMWORD_10   /(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){10}/
describe    RANDOMWORD_10   String of 10+ random words
score       RANDOMWORD_10   0.5
body        RANDOMWORD_15   /(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){15}/
describe    RANDOMWORD_15   String of 15+ random words
score       RANDOMWORD_15   2.5
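
For anyone who wants to try the patterns outside SpamAssassin first, here is
a rough Python sketch. It assumes that Python's re module treats \b, (?!...)
and {m,n} the same way Perl does for these particular expressions. The
helper names (randomword, hits) and the sample strings are made up for
illustration, and the sketch only exercises the regexes themselves, not
SpamAssassin's own body-text preprocessing.

```python
import re

# David's original pattern: 15 lowercase words, each followed by one space.
GIBBERISH_WORDS = re.compile(r"([a-z]{4,12} ){15}")


def randomword(n):
    """Build the RANDOMWORD_n pattern: n lowercase words in a row, none of
    which is one of the five excluded common words."""
    return re.compile(
        r"(?:\b(?!(?:from|even|more|were|with)\b)[a-z]{4,12}\s+){%d}" % n
    )


PATTERNS = [
    ("GIBBERISH_WORDS", GIBBERISH_WORDS),
    ("RANDOMWORD_10", randomword(10)),
    ("RANDOMWORD_15", randomword(15)),
]


def hits(text):
    """Return the names of the patterns that match somewhere in text."""
    return [name for name, rx in PATTERNS if rx.search(text)]


# Hypothetical sample 1: 16 unrelated words, single spaces, trailing space.
gibberish = (
    "table river candle mirror pocket garden silver window "
    "basket planet thunder pencil marble lantern harbor velvet "
)

# Hypothetical sample 2: all-lowercase prose that uses the excluded words,
# so no run of 10 non-excluded words exists.
ordinary = (
    "these words come from some people that were sent even "
    "more often with great care today "
)

# Hypothetical sample 3: the gibberish again, one word per line.
wrapped = gibberish.replace(" ", "\n")

print(hits(gibberish))
print(hits(ordinary))
print(hits(wrapped))
```

In this sketch the first sample trips all three patterns, the second trips
only GIBBERISH_WORDS (each excluded word resets the run count), and the
third trips only the two RANDOMWORD patterns, because \s+ accepts newlines
while the original pattern insists on a single space.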

----
Nels Lindquist <*>
Information Systems Manager
Morningstar Air Express Inc.

_______________________________________________
Visit http://www.mimedefang.org and http://www.canit.ca
MIMEDefang mailing list
[EMAIL PROTECTED]
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
