Scott could build in functionality for counting characters, but it's much more difficult than just counting. He would first have to start decoding base64, for instance, and only match on the decoded text, and then there's the question of what to do with alphabetic character combinations in auto-generated codes. It's a bit kludgey whichever way you approach it. Even a full-scale gibberish detector would need a lot of counterbalances, and FPs would still occur, though one could match more than just 25 two-letter strings.
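A minimal sketch of the decode-first step, assuming the subject arrives as an RFC 2047 encoded-word (the `=?charset?B?...?=` form used for base64 subjects). It uses the standard library's `email.header.decode_header` and `make_header`. The helper names `decoded_subject` and `longest_token` are hypothetical, not anything in Declude. The same subject measures as one 44-character token before decoding and has a longest word of 7 characters after decoding.

```python
from email.header import decode_header, make_header


def decoded_subject(raw_subject: str) -> str:
    """Decode RFC 2047 encoded-words (=?charset?B?...?= is base64) into plain text."""
    return str(make_header(decode_header(raw_subject)))


def longest_token(text: str) -> int:
    """Length of the longest whitespace-separated token."""
    return max((len(tok) for tok in text.split()), default=0)


raw = "=?utf-8?B?WW91ciBvcmRlciBoYXMgc2hpcHBlZA==?="
print(decoded_subject(raw))                  # Your order has shipped
print(longest_token(raw))                    # 44: the encoded form is one long token
print(longest_token(decoded_subject(raw)))   # 7: "shipped"
```

Counting on the raw header would treat a perfectly valid encoded subject as one long run, which is why decoding has to come first.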
Again, the issue isn't the spam this would catch; it's the legitimate mail that would FP. Maybe it would help if you mapped out a more foolproof method as a blueprint. Bill's suggestion was an improvement, but it would probably have about the same overall results as GIBBERISHSUB, because you would have to set the threshold high enough (say 5) that it would miss some combinations of consonants, and it likely wouldn't hit merged words. I think you'll find that an improvement would be quite complex, though certainly possible.
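Bill's exact rule isn't quoted in this thread, so the following is only a hypothetical reading of it: flag a string when a run of consecutive consonants reaches a threshold of 5. Treating `y` as a vowel is an assumption, and the names `longest_consonant_run` and `consonant_hit` are illustrative. Run against some of John's examples below, it catches the keyboard-mash strings but misses the merged words, and even the ordinary word "strengths" reaches the threshold.

```python
import re

# Hypothetical version of the consonant-run idea: any run of THRESHOLD or more
# consecutive consonants counts as a hit. 'y' is treated as a vowel (assumption).
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-xz]+", re.IGNORECASE)
THRESHOLD = 5


def longest_consonant_run(text: str) -> int:
    """Longest run of consecutive consonants; vowels, digits and punctuation break runs."""
    return max((len(run) for run in CONSONANT_RUN.findall(text)), default=0)


def consonant_hit(text: str, threshold: int = THRESHOLD) -> bool:
    return longest_consonant_run(text) >= threshold


for s in ["lfrmztzlvudgxulzhlc", "daysOiwswvcm", "UtahNlawydycn",
          "Hereistheinfoyou", "BestProductEver", "strengths"]:
    print(s, longest_consonant_run(s), consonant_hit(s))
# lfrmztzlvudgxulzhlc 9 True
# daysOiwswvcm 6 True
# UtahNlawydycn 3 False
# Hereistheinfoyou 3 False
# BestProductEver 4 False
# strengths 5 True
```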
Matt
John Tolmachoff (Lists) wrote:
Examples:
UtahNlawydycn daysOiwswvcm HoustonGruqrb 1iving?Bnddddx lfrmztzlvudgxulzhlc ehrcbaarornrmnfpubke Hereistheinfoyou usefu1Nnputywatn
None were caught by GIBBERISHSUB.
John Tolmachoff Engineer/Consultant/Owner eServices For You
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:Declude.JunkMail- [EMAIL PROTECTED] On Behalf Of John Tolmachoff (Lists)
Sent: Friday, January 02, 2004 9:40 AM
To: [EMAIL PROTECTED]
Subject: RE: [Declude.JunkMail] CONSECUTIVECHAR test!
GIBBERISHSUB would not catch things like BestProductEver, ImportantPleaseReadNow and so forth.
I have seen a number of spams where the words are run together without spaces to bypass filters. Being able to count consecutive characters and add a weight of, say, no more than 5 would help.
John Tolmachoff Engineer/Consultant/Owner eServices For You
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:Declude.JunkMail- [EMAIL PROTECTED] On Behalf Of Matthew Bramble
Sent: Friday, January 02, 2004 9:14 AM
To: [EMAIL PROTECTED]
Subject: Re: [Declude.JunkMail] CONSECUTIVECHAR test!
John,
This would FP on messages that include IDs in the subject, such as receipts, and also on base64-encoded subjects, some of which are perfectly valid; Declude doesn't decode subjects at this time. I also tend to see receipt IDs with more characters than the gibberish that spammers append.
I don't think this could be made reliable without a good deal of error detection. GIBBERISHSUB actually would be a lot more reliable.
Matt
John Tolmachoff (Lists) wrote:
Test suggestion.
This would be like SUBJECTSPACES, but instead would count consecutive characters other than spaces in the subject line.
CONSECUTIVECHAR consecutivechar 20 5 0
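A minimal sketch of how the proposed test could score a subject, assuming the config line means a threshold of 20 consecutive non-space characters, a weight of 5 on a hit and 0 otherwise. That reading of "20 5 0" is an assumption, as are the function names. The sample subjects include a made-up receipt ID and an undecoded base64 subject to show the FP cases Matt describes.

```python
# Assumed reading of "CONSECUTIVECHAR consecutivechar 20 5 0":
# hit when a subject has 20+ consecutive non-space characters,
# add weight 5 on a hit and 0 otherwise. These values are assumptions.
THRESHOLD = 20
HIT_WEIGHT = 5
MISS_WEIGHT = 0


def longest_nonspace_run(subject: str) -> int:
    """Length of the longest stretch of characters other than the space character."""
    return max(len(run) for run in subject.split(" "))


def consecutivechar_weight(subject: str) -> int:
    """Weight the hypothetical CONSECUTIVECHAR test would add for this subject."""
    if longest_nonspace_run(subject) >= THRESHOLD:
        return HIT_WEIGHT
    return MISS_WEIGHT


if __name__ == "__main__":
    samples = [
        "Hello ehrcbaarornrmnfpubke",                    # 20-char gibberish tail: hit
        "Hello lfrmztzlvudgxulzhlc",                     # 19-char gibberish tail: miss
        "Your receipt A1B2C3D4E5F6G7H8J9K0",             # made-up 20-char receipt ID: FP
        "=?utf-8?B?WW91ciBvcmRlciBoYXMgc2hpcHBlZA==?=",  # undecoded base64 subject: FP
    ]
    for s in samples:
        print(longest_nonspace_run(s), consecutivechar_weight(s), s)
```

A single fixed threshold cannot separate the 19- and 20-character gibberish from a 20-character receipt ID, which is the reliability problem raised in the thread.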
John Tolmachoff
Engineer/Consultant/Owner
eServices For You
--- This E-mail came from the Declude.JunkMail mailing list. To unsubscribe, just send an E-mail to [EMAIL PROTECTED], and type "unsubscribe Declude.JunkMail". The archives can be found at http://www.mail-archive.com.