There's a decent way around that. You can set the test in the Config file for a solid weight, not score each filter test incrementally, and then provide a list of negative tests that would offset the test. So if there is some sort of ISO tagging of this Japanese stuff, you can find that code and defeat the test from running. Same goes for other languages.
I just got my first false positive out of 200 catches. This was from Korea but written in English (still encoded though). There are two clues in the headers as to how to defeat the test:
Subject: [22] =?euc-kr?B?R2VuZXJhbCBJbnF1aXJ5IGZvciBzbm93bW9iaWxl?=
Content-Type: text/html; charset=euc-kr
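To see what the encoded subject actually says, a short Python sketch can decode it. This is only an illustration using the standard library's `email.header` module; the helper name `decode_subject` is made up for this example and is not part of Declude.

```python
from email.header import decode_header


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a raw Subject value."""
    parts = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            # Unlabelled byte chunks are treated as ASCII (assumption).
            parts.append(value.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(value)
    return "".join(parts)


raw = "[22] =?euc-kr?B?R2VuZXJhbCBJbnF1aXJ5IGZvciBzbm93bW9iaWxl?="
print(decode_subject(raw))
```

The decoded text is plain English, so the gibberish letter pairs come from the base64 form of the subject rather than from the words themselves.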
You could probably do something like the following (suggested replacement for the original filter if you are using it):
GIBBERISHSUB filter C:\IMail\Declude\Filters\GibberishSub.txt x 5 0
# The following defeats the test if it finds the subject is not sent as ASCII
SUBJECT -5 CONTAINS ?b?
# Small list of letter combinations not found in a basic dictionary.
SUBJECT 0 CONTAINS qb SUBJECT 0 CONTAINS qc SUBJECT 0 CONTAINS qd SUBJECT 0 CONTAINS qe SUBJECT 0 CONTAINS qf SUBJECT 0 CONTAINS qg SUBJECT 0 CONTAINS qh SUBJECT 0 CONTAINS qi SUBJECT 0 CONTAINS qj SUBJECT 0 CONTAINS qk SUBJECT 0 CONTAINS qm SUBJECT 0 CONTAINS qn SUBJECT 0 CONTAINS qo SUBJECT 0 CONTAINS qp SUBJECT 0 CONTAINS qr SUBJECT 0 CONTAINS qs SUBJECT 0 CONTAINS qt SUBJECT 0 CONTAINS qv SUBJECT 0 CONTAINS qx SUBJECT 0 CONTAINS qy SUBJECT 0 CONTAINS qz
SUBJECT 0 CONTAINS vq SUBJECT 0 CONTAINS wq SUBJECT 0 CONTAINS tq SUBJECT 0 CONTAINS jq
SUBJECT 0 CONTAINS xd SUBJECT 0 CONTAINS xj SUBJECT 0 CONTAINS xk SUBJECT 0 CONTAINS xr SUBJECT 0 CONTAINS xz
SUBJECT 0 CONTAINS zb SUBJECT 0 CONTAINS zc SUBJECT 0 CONTAINS zf SUBJECT 0 CONTAINS zj SUBJECT 0 CONTAINS zk SUBJECT 0 CONTAINS zl SUBJECT 0 CONTAINS zm SUBJECT 0 CONTAINS zx
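The idea behind the filter above (a fixed weight for the whole test, plus a negative entry that cancels it) can be sketched in Python. This is a toy model only: Declude's real evaluation logic is not shown in this thread, and the case-insensitive matching, the function name `gibberish_score`, and the shortened trigger list are all assumptions made for illustration.

```python
# Toy model only; not Declude's actual scoring code.
TEST_WEIGHT = 5  # the "5" on the GIBBERISHSUB config line

# A handful of the zero-weight trigger strings from the filter file.
TRIGGERS = ["qb", "qx", "vq", "jq", "xj", "zb", "zx"]

# Negative entries that offset the test. Matching is done
# case-insensitively here, which is an assumption.
OFFSETS = {"?b?": -5}


def gibberish_score(subject: str) -> int:
    """Return the net weight this toy filter would add for a subject."""
    s = subject.lower()
    if not any(trigger in s for trigger in TRIGGERS):
        return 0  # test did not fire at all
    score = TEST_WEIGHT
    for needle, weight in OFFSETS.items():
        if needle in s:
            score += weight
    return score


print(gibberish_score("Block those unwanted Popups yqvqk"))
print(gibberish_score("=?euc-kr?B?R2VuZXJhbCBJbnF1aXJ5IGZvciBzbm93bW9iaWxl?="))
```

In this model the Korean message still trips the letter-pair list (its base64 text contains pairs such as "xj" and "zb"), but the ?b? entry brings the net weight back to zero.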
Matt
Dan Patnode wrote:
Follow-up,
Used in a high-weight soft test, 3 of the Q subject tests FP'd this morning. It seems that Japanese encoded messages contain lots of mixed-up letters.
More testing...
Dan
On Wednesday, September 10, 2003 19:20, Dan Patnode <[EMAIL PROTECTED]> wrote:
I did a scan of all uncaught spam from the last week, found all the ones with Q, removed the QUs and ended up with this list. All of these would have been seen by Matt's new config:
Subject: Block those unwanted Popups yqvqk
Subject: drive luxury cars and get paid 9xP%oY5NzPG\q2G
Subject: drive luxury cars and get paid L0z[7J4aYq!F7P1
Subject: drive luxury cars and get paid 9xP%oY5NzPG\q2G
Subject: drive luxury cars and get paid L0z[7J4aYq!F7P1
Subject: FW: Block those unwanted Popups yqvqk
Subject: FW: drive luxury cars and get paid 9xP%oY5NzPG\q2G
Subject: FW: drive luxury cars and get paid L0z[7J4aYq!F7P1
Subject: FW: get that extra boost in the bed uvqtc qqyixu
Subject: FW: new mail REgnfqnKQT
Subject: Fw: :( would u mind if i .. jqvmoiqfkzkokdwns u
Subject: get that extra boost in the bed uvqtc qqyixu
Subject: get that extra boost in the bed uvqtc qqyixu
Subject: Re: new mail REgnfqnKQT
Subject: Re: new mail REgnfqnKQT
Subject: Stop messages SPAM po p vyoaejswayqo
Subject: [Fwd:
=?GB2312?B?0OnE4r/VvOS089PFu92jrDE5OdSqv8nS1L2o0ru49s341b6jrA==?==?GB2312?B?uM+/7LW9d3d3LjA3NTVzei5jb23J6sfrsMld?=
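A scan like the one described above could be approximated with a regular expression that looks for a "q" not followed by "u". This is a rough sketch rather than the procedure actually used for the list: it flags any bare q, and the names `Q_WITHOUT_U` and `has_bare_q` are invented for the example.

```python
import re

# "q" or "Q" that is not immediately followed by "u" or "U".
Q_WITHOUT_U = re.compile(r"q(?!u)", re.IGNORECASE)


def has_bare_q(subject: str) -> bool:
    """True if the subject contains a q that is not part of 'qu'."""
    return Q_WITHOUT_U.search(subject) is not None


subjects = [
    "Block those unwanted Popups yqvqk",
    "Re: new mail REgnfqnKQT",
    "General Inquiry for snowmobile",
    "Quarterly report",
]
flagged = [s for s in subjects if has_bare_q(s)]
print(flagged)
```

A q at the very end of a word also counts as bare here, so real words that end in q would be flagged by this sketch.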
Dan
On Wednesday, September 10, 2003 17:45, Matthew Bramble <[EMAIL PROTECTED]> wrote:
How about 4 different super tests? I fail automatically on =?ISO-8859-1?B?, and that accounts for more than 1% of the E-mail coming in to my server, but only a handful of additional catches in what was being missed...no false positives.
I think I've mentioned enough times the other tests that I would like to have: a BODYTEXT filter that searches just a decoded non-HTML body; a NOTEXT test for nothing but spaces and returns and attachments (that's a key) after decoding and de-HTMLifying; and a TEXTCOUNT marquee test that would allow you to search for amounts of non-HTML decoded body text just like SUBJECTSPACES and BCC, but in reverse (the less there is, the higher the score).
I could catch so much crap with those 40 or so two-character gibberish strings; in fact I think it was properly tagging around 10% to 20% of all unique incoming messages today, if not more. That gibberish subject filter is tagging over 5% by itself, and with perfect accuracy so far. A functional gibberish body filter, though, would have a reasonable number of false positives (it was tagging buy.com links that were shown in displayable text, for instance). I don't, of course, expect Scott to rush to my aid here.
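The wished-for TEXTCOUNT behavior (the less displayable text, the higher the score) can be sketched in Python. Nothing here is an existing Declude test: the function names, the maximum weight of 10 and the 200-character cutoff are hypothetical values picked for the example, and the HTML handling is deliberately simple (script and style contents would be counted as text).

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes; tags and comments are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text_length(html_body: str) -> int:
    """Count non-whitespace characters left after stripping markup."""
    parser = _TextCollector()
    parser.feed(html_body)
    parser.close()
    return len("".join("".join(parser.chunks).split()))


def textcount_score(html_body: str, max_weight: int = 10, full_text_at: int = 200) -> int:
    """Hypothetical scale: max_weight for no text, 0 at full_text_at chars or more."""
    n = visible_text_length(html_body)
    return max_weight * max(0, full_text_at - n) // full_text_at


image_only = '<html><body>\n<center><!--x--><a href="#"><img src="x.gif"></a></center>\n</body></html>'
print(visible_text_length(image_only), textcount_score(image_only))
```

A body made up of nothing but a linked image and a comment gets the full weight, while a message with a few hundred characters of real text scores nothing.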
I have managed, though, to add tests for SUBJECTSPACES (very effective), COMMENTS (effective) and BCC (just ok), along with some small key word/phrase filters for the body, subject and sender with very good success. I only saw about 5 definitive false positives today out of around 3000 unique messages, but approximately 150 pieces of spam got through. I think that could be reduced by as much as half without a measurable impact on the false positives. If that doesn't work, I'm buying a gun :)
BTW, on Linux, my guru buddy recommends Postfix as the SMTP client and Webmin as the interface. I don't though dispute Sandy's faith in MS SMTP, and it can be run on the same box as IMail.
Matt
Dan Patnode wrote:
FYI, I pulled this test 3 weeks ago after an email from France came through (or rather didn't) with this subject:
Subject: =?ISO-8859-1?B?RW5qb3kgc3VtbWVyIHVudGlsIGl0cyB2ZXJ5IGVuZCE=?=
There definitely is a correlation here among spammers, ?B? encoded subjects, disposable domain names, and nothing else in the body of the message. There has to be a way to bring the 2 or 3 variables together as a super test.
Dan
On Monday, September 8, 2003 19:05, Matthew Bramble <[EMAIL PROTECTED]> wrote:
Use a text filter and add something like:
SUBJECT 40 CONTAINS =?ISO-8859-1?b?
to it.
I tried this all the way down to just ?b? and a SUBJECT filter didn't catch it. The SUBJECT filter also doesn't catch the decoded text.
I found, though, that if you use the HEADERS filter, it will catch this (customize to suit; this will only catch Latin-1 that is base64 encoded, and I can't think of why that would be necessary, since I would think that only other character sets would need it):
HEADERS 10 CONTAINS ISO-8859-1?B?
Neither the HEADERS filter nor the SUBJECT filter is catching the decoded form of the text. The BASE64 test is also not catching this if it's only in the Subject of the message (I assume it only does the body/attachments).
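When customizing the HEADERS string, it can help to see which charset and encoding a raw header actually carries. The Python sketch below pulls those two tags out of RFC 2047 encoded-words with a regular expression. It is an offline illustration, not something Declude runs, and `ENCODED_WORD` and `encoded_word_tags` are names made up for this example.

```python
import re

# Matches =?charset?B?text?= or =?charset?Q?text?= and captures
# the charset and the encoding letter.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?[^?\s]*\?=")


def encoded_word_tags(raw_header: str):
    """Return (charset, encoding) pairs found in a raw header line."""
    return [
        (m.group(1).lower(), m.group(2).upper())
        for m in ENCODED_WORD.finditer(raw_header)
    ]


latin1 = "Subject: =?ISO-8859-1?B?RW5qb3kgc3VtbWVyIHVudGlsIGl0cyB2ZXJ5IGVuZCE=?="
korean = "Subject: [22] =?euc-kr?B?R2VuZXJhbCBJbnF1aXJ5IGZvciBzbm93bW9iaWxl?="
print(encoded_word_tags(latin1))
print(encoded_word_tags(korean))
```

Both samples are base64 ("B") encoded, but only the first is tagged ISO-8859-1, which is the case the HEADERS line above is aimed at.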
The not so funny thing is that I'm getting this now as a part
of those E-mails containing no displayable text. This guy is
real good at getting through my settings unless he chooses a
bad IP to send from. I think a few days ago, another person on
this list commented about this same spammer, bringing up the
domains that he is using (common words followed by numbers). The only pattern this guy leaves, apart from having no text in
the body, is having different countries' TLDs listed in the
Received line, the sender, and the reverse DNS. Here's a copy
of what I just received using this technique (with links
modified):
From - Mon Sep 08 17:36:44 2003
X-UIDL: 314612976
X-Mozilla-Status: 0011
X-Mozilla-Status2: 00000000
Received: from gjr.paknet.com.pk [81.128.130.33] by igaia.com with ESMTP (SMTPD32-7.13) id A6244F101D8; Mon, 08 Sep 2003 17:35:32 -0400
Date: Mon, 08 Sep 2003 21:35:35 +0000
Message-ID: <[EMAIL PROTECTED]>
X-Mailer: Windows Eudora Pro Version 2.2 (32)
To: [EMAIL PROTECTED]
Subject: =?ISO-8859-1?B?UmU6T3JkZXIgU2lsZGVuYWZpbCBDaXRyYXRlICBmcm9tIGhvbWUgLSBubyBkb2N0b3IgcmVxdWlyZWQu?=
MIME-Version: 1.0
From: "Shirley Dalton" <[EMAIL PROTECTED]>
Content-Type: text/html
Content-Transfer-Encoding: 8bit
X-Declude-Sender: [EMAIL PROTECTED] [81.128.130.33]
X-Declude-Spoolname: Df62404f101d89e2c.SMD
X-Note: This E-mail was scanned by iGaia Incorporated's E-mail service (www.igaia.com) for spam.
X-Note: This E-mail was sent from host81-128-130-33.in-addr.btopenworld.com ([81.128.130.33]).
X-Spam-Tests-Failed: DSN, IPNOTINMX, NOLEGITCONTENT [1]
X-RCPT-TO: <[EMAIL PROTECTED]>
Status: U
X-UIDL: 314612976
<html><body>
<center><!--lfoln42j66--><a
href="http://www-dot-payment33dd-dot-com/host/default.asp?ID=omni"><img
src="http://discountrate2-dot-com/pics/gv1.gif" height="270" width="405"></a></center>
</html></body>
--- This E-mail came from the Declude.JunkMail mailing list. To unsubscribe, just send an E-mail to [EMAIL PROTECTED], and type "unsubscribe Declude.JunkMail". The archives can be found at http://www.mail-archive.com.
