GIBBERISHSUB will catch maybe 80% of this stuff with just 25 or so two-character combinations. Some day I may slowly add more strings, but Declude's custom filtering environment wasn't designed for this type of thing.

Scott could build in functionality for counting characters, but it's much more difficult than just counting. For instance, he would first have to decode base64 and match only on the decoded text, and then there's the question of what to do with alphabetic character combinations in auto-generated codes. It's a bit kludgey no matter which way you approach it. Even a full-scale gibberish detector would need a lot of counterbalances, and FPs would still occur, though one could match more than just 25 two-letter strings.
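To illustrate the base64 point, here is a minimal Python sketch (not anything Declude does; the function names are made up for illustration). It uses the standard library's email.header module to decode an encoded subject, then measures the longest run of non-space characters before and after decoding.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Decode RFC 2047 encoded-words (base64 or quoted-printable) into text."""
    return str(make_header(decode_header(raw_subject)))


def longest_nonspace_run(text: str) -> int:
    """Length of the longest run of consecutive non-whitespace characters."""
    return max((len(token) for token in text.split()), default=0)


# Hypothetical subject: the base64 encoded-word for "Hello world".
raw = "=?utf-8?b?SGVsbG8gd29ybGQ=?="
decoded = decode_subject(raw)

# The raw header is one unbroken run; the decoded text is two short words.
print(longest_nonspace_run(raw), longest_nonspace_run(decoded))
```

Counting on the raw header treats a perfectly ordinary subject as one long unbroken string, which is why the count would only be meaningful on decoded text.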

Again, it's not the spam that this would catch which is at issue, it's the legit stuff that would FP. Maybe if you mapped out a more foolproof method as a blueprint, that might help. Bill's suggestion was an improvement, but it would probably have about the same overall results as GIBBERISHSUB, because you would have to set the threshold high enough (say 5) that it would miss some combinations of consonants, and it wouldn't likely hit merged words. I think that you'll find that an improvement would be quite complex, though certainly possible.
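As a rough sketch of the threshold problem, assuming the idea amounts to counting consecutive consonants (the details of Bill's suggestion aren't in this thread, so the function below is hypothetical), here is how a threshold of 5 behaves on John's examples. It treats "y" as a consonant and lets digits and punctuation reset the count; both are assumptions.

```python
import string

VOWELS = set("aeiou")
CONSONANTS = set(string.ascii_lowercase) - VOWELS  # assumption: 'y' is a consonant


def max_consonant_run(text: str) -> int:
    """Longest run of consecutive ASCII consonants; anything else resets the run."""
    longest = current = 0
    for ch in text.lower():
        if ch in CONSONANTS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


THRESHOLD = 5  # the "say 5" from the message above

examples = [
    "UtahNlawydycn", "daysOiwswvcm", "HoustonGruqrb", "1iving?Bnddddx",
    "lfrmztzlvudgxulzhlc", "ehrcbaarornrmnfpubke", "Hereistheinfoyou",
    "usefu1Nnputywatn", "BestProductEver",
]

for word in examples:
    run = max_consonant_run(word)
    print(f"{word:22} run={run} {'HIT' if run >= THRESHOLD else 'miss'}")
```

Under these assumptions the sketch misses HoustonGruqrb, Hereistheinfoyou, usefu1Nnputywatn and BestProductEver, all of which top out at 3 or 4 consonants in a row. Lowering the threshold to catch them would start hitting ordinary words and ID codes.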

Matt



John Tolmachoff (Lists) wrote:

Examples:

UtahNlawydycn
daysOiwswvcm
HoustonGruqrb
1iving?Bnddddx
lfrmztzlvudgxulzhlc
ehrcbaarornrmnfpubke
Hereistheinfoyou
usefu1Nnputywatn

None were caught by GIBBERISHSUB.

John Tolmachoff
Engineer/Consultant/Owner
eServices For You




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:Declude.JunkMail-
[EMAIL PROTECTED] On Behalf Of John Tolmachoff (Lists)
Sent: Friday, January 02, 2004 9:40 AM
To: [EMAIL PROTECTED]
Subject: RE: [Declude.JunkMail] CONSECUTIVECHAR test!

GIBBERISHSUB would not catch things like BestProductEver and
ImportantPleaseReadNow and so forth.

I have seen a number of spam messages where the words are run together
without spaces to bypass filters. Being able to count consecutive
characters and add a weight of, say, no more than 5 would help.

John Tolmachoff
Engineer/Consultant/Owner
eServices For You




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:Declude.JunkMail-
[EMAIL PROTECTED] On Behalf Of Matthew Bramble
Sent: Friday, January 02, 2004 9:14 AM
To: [EMAIL PROTECTED]
Subject: Re: [Declude.JunkMail] CONSECUTIVECHAR test!

John,

This would FP on messages that include ID's in the subject such as
receipts, and also base64 encoded subjects, some of which are perfectly
valid and Declude doesn't decode subjects at this time.  I also tend to
see receipts with more characters than I tend to see in spam that
appends gibberish.

I don't think this could be made reliable without a good deal of error
detection.  GIBBERISHSUB actually would be a lot more reliable.

Matt




John Tolmachoff (Lists) wrote:




Test suggestion.

This would be like SUBJECTSPACES, instead would count consecutive
characters other than spaces in the subject line.

CONSECUTIVECHAR consecutivechar 20 x 5 0
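For illustration only, here is one way the proposed test could behave, sketched in Python. This is not Declude code. It assumes the line reads as a threshold of 20 consecutive non-space characters, a weight of 5 when the test fails and 0 when it passes, and that a run of exactly 20 counts as a hit. The receipt subject is a made-up example.

```python
def consecutivechar_weight(subject: str, threshold: int = 20,
                           fail_weight: int = 5, pass_weight: int = 0) -> int:
    """Return fail_weight if any run of non-space characters reaches threshold."""
    longest = max((len(token) for token in subject.split()), default=0)
    return fail_weight if longest >= threshold else pass_weight


subjects = [
    "ImportantPleaseReadNow",                # merged words, 22 characters
    "BestProductEver",                       # merged words, only 15 characters
    "Receipt 4F9A2C7E81B3D6059A7C2E41",      # hypothetical legitimate order ID
]

for subject in subjects:
    print(f"{consecutivechar_weight(subject)}  {subject}")
```

With these assumed settings the test scores ImportantPleaseReadNow but lets BestProductEver through, and it also scores the receipt with the long order ID.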

John Tolmachoff
Engineer/Consultant/Owner
eServices For You




---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type "unsubscribe Declude.JunkMail".  The archives can be found
at http://www.mail-archive.com.