The 8 bit encoding doesn't have anything to do with why it passes ANTI-GIBBERISH. It appears that this test got tripped on the ANTI filter because of a "qa " string (with the space, line 53 of that filter).
I believe that 8 bit encoding isn't going to be very safe to filter on, though it is worth looking at. This might be a great opportunity to take a combination of an X-Mailer and Content-Transfer-Encoding in a filter so that if say Outlook Express and 8bit both occur, then it is spam. A theory like this would need to be tested though.
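To make the combination idea concrete, here is a minimal Python sketch of the logic only. It is not Declude filter syntax, the function name is made up for illustration, and whether this pairing actually separates spam from ham is exactly the untested theory described above.

```python
from email import message_from_string


def outlook_express_8bit(raw_message: str) -> bool:
    """Return True only when BOTH conditions hold (hypothetical helper).

    Assumption: a simple substring match on X-Mailer and an exact match on
    Content-Transfer-Encoding is close enough for a first experiment.
    """
    msg = message_from_string(raw_message)
    mailer = (msg.get("X-Mailer") or "").lower()
    encoding = (msg.get("Content-Transfer-Encoding") or "").strip().lower()
    return "outlook express" in mailer and encoding == "8bit"
```

Requiring both headers, rather than scoring 8bit on its own, is what keeps legitimate 8-bit mail from other clients out of the net.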
The new filtering capabilities could also allow you to change GIBBERISH so that it can hit twice and add more points on two hits (capped with MAXPOINTS). This needs testing as well, though: it probably would not be an issue for messages from regular people, but some of the FPs from automated sources might very well fail multiple times, just as spam can.
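The arithmetic of "more points for more hits, but capped" can be sketched in Python as follows. This only illustrates the idea; it is not how Declude implements MAXPOINTS, and the function name, the patterns and the point values are all made up.

```python
def capped_score(body: str, patterns: list[str], points_per_hit: int, max_points: int) -> int:
    """Score one point block per matching pattern, never exceeding max_points.

    Hypothetical illustration: each pattern counts at most once.
    """
    lowered = body.lower()
    hits = sum(1 for pattern in patterns if pattern in lowered)
    return min(hits * points_per_hit, max_points)
```

With 5 points per hit and a cap of 10, a third hit adds nothing, which limits the damage when an automated but legitimate sender trips the filter several times.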
This E-mail is from a spammer that several have commented on. For the interim, he is easily targeted with a filter for:
BODY 15 BEGINSWITH <g
I'm actually going to test a filter built from a file that I created some time ago to check for fake HTML tags. It contains every non-HTML two-letter combination, each preceded by a less-than sign. This filter actually led me to what became GIBBERISH, though I can't remember why I abandoned it. As a BEGINSWITH filter it shouldn't be too demanding on processing, and it should be very unlikely to FP. I'll be sure to release it if it works out.
BTW, I'm not sure exactly what the scores are on your system, but given the tests and filters that this message failed, it would definitely have been held as spam on my system.
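For anyone who wants to experiment with the fake-tag idea outside of Declude, here is a rough Python sketch. The contents of my file aren't shown here, so the KNOWN_TAGS set below is a partial list chosen only for illustration, and both function names are hypothetical.

```python
from itertools import product
from string import ascii_lowercase

# Assumption: a partial, illustrative list of real HTML tag names.
KNOWN_TAGS = {
    "a", "b", "i", "p", "u", "br", "hr", "html", "head", "body", "font",
    "table", "tr", "td", "th", "div", "span", "img", "center", "meta",
    "title", "ul", "ol", "li", "em", "strong", "style", "script", "form",
    "input", "link", "pre", "tt", "small", "big", "blockquote", "base",
}


def fake_two_letter_prefixes(known_tags):
    """All two-letter combinations that do not begin any known tag name."""
    real = {tag[:2] for tag in known_tags if len(tag) >= 2}
    every = {"".join(pair) for pair in product(ascii_lowercase, repeat=2)}
    return every - real


def begins_with_fake_tag(body, fake_prefixes):
    """True when the body starts with '<' followed by a non-HTML letter pair."""
    text = body.lstrip().lower()
    return len(text) >= 3 and text[0] == "<" and text[1:3] in fake_prefixes
```

Single-letter tags such as `<b>` are never flagged, because the second character after the less-than sign is not a letter.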
 4 - EASYNET-DYNA
 4 - FIVETEN-SRC
 3 - FOREIGN
 0 - REVDNS
=================
10 - Total (my hold weight)
It might have failed other tests that I am using locally as well. I don't like giving too much credit for the negative weight tests; only three points are possible on my system, and I give nothing for REVDNS. I would be scoring EASYNET-DYNA higher, except that I also use another DUL test and my DYNAMIC filter, which all look for the same thing. FIVETEN can be problematic, though the .2 test isn't nearly as bad as the .4 test. I know that FIVETEN scores a lot of FPs, but it's a very important test for me because they pick up a lot of stuff that others don't for some reason, and since I score it relatively low I can deal with them blacklisting places like Yahoo and some legit newsletters.
Another test that you might want to think about using would be:
SUBJECT 2 ISBLANK
This is fairly rare with ham, and probably safe to add one or two points to (on a fail weight of 10). I think that spammers have rightly figured that including even a randomized subject can do more harm than good because it is one more thing to track, and a blank subject probably piques one's interest enough to still open the message to see what it is instead of just deleting it without a thought.
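If you want to count how often blank subjects show up in your own ham before adding the test, a small Python check like the one below would do. It is only a sketch: the function name is made up, and treating a missing Subject header the same as an empty one is my assumption here, not necessarily what ISBLANK does.

```python
from email import message_from_string


def subject_is_blank(raw_message: str) -> bool:
    """True when the Subject header is empty, whitespace only, or missing.

    Assumption: a missing header counts as blank (hypothetical helper).
    """
    msg = message_from_string(raw_message)
    subject = msg.get("Subject")
    return subject is None or subject.strip() == ""
```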
Matt
Scot Desort wrote:
I have seen a lot of mail like this one scoring low on Declude:
X-F: <[EMAIL PROTECTED]> Sat Nov 22 06:08:11 2003
Received: from tekes.fi [80.56.186.84] by njaccess.com (SMTPD32-6.06) id A394206D005E; Sat, 22 Nov 2003 06:08:04 -0500
Message-ID: <[EMAIL PROTECTED]>
From: "Sybil D. Neely" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject:
Date: Sun, 23 Nov 2003 02:23:38 +0000
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Content-Type: text/html
Content-Transfer-Encoding: 8bit
X-RBL-Warning: FIVETENSRC: 84.186.56.80.blackholes.five-ten-sg.com.
X-RBL-Warning: GIBBERISH: Message failed GIBBERISH test (96)
X-RBL-Warning: ANTIGIBBERISH: Message failed ANTIGIBBERISH test (53)
X-Declude-Sender: [EMAIL PROTECTED] [80.56.186.84]
X-Declude-Spoolname: D439405e.SMD
X-SpamWatch-Tests-Failed: EASYNET-DYNA, FIVETENSRC, IPNOTINMX, NOLEGITCONTENT, GIBBERISH, ANTIGIBBERISH, FOREIGN [6]
X-SpamWatch-Country-Chain: NETHERLANDS->destination
X-SpamWatch-ReverseLookUp: f186084.upc-f.chello.nl ([80.56.186.84]).
X-RCPT-TO: <[EMAIL PROTECTED]>
X-UIDL: 362076914
Status: U
<gyvpznjdrufwnx><font color="white">ufabnkxbdris<gjbimlhlrbqljb> rjcvvcjzrgth <gemotyrdifsk>fkauewcugfimk <geqfppqbcqxai>svpolbcuds egbftgdihh ggbaxkcuiazty<gdxdecibhfsovd></font><gartymfckjfrcj><br><gasunzscmkk> <font color="white">sxevtgbewm <gxzmadrqaaeupx>rcwkircgel <gxnjljpbfuv>mgdkhfqhqd<ggjribadezeaag> ukmfmpbloj<gimjotcdieisbz> fgbzancgjeo <gwyrtntfwaee>iqnqceziepk</font><gbpuzkbdzyzhlg><br><gruzkohbxdbh> <b><glaitqxdgqq>LO<gbbcsqudibz>SE<gfaigjrcnqeff> <gcmnbmjbwzl>WE<ghoncnjlakguac>IG<gmtsdthcgucjfwx>HT <gsrlqukbgfidsm>TH<ghvhfnnbvqva>E <gjcfukdbhjnkanc>E<gqjxxxtdqfso>ASI<gxcfhbbdpqglw>ER W<gvrobgjcwerc>AY<grzdgrtbuom></b><guqassadqplx><br> <gfunlxdcgwv><i>"I<ghththgcueaor>T'S<gbdtyvqdoxr> N<gcrlcqzcntb>OT<gdjcisnccny> A<gquakrdruzooyp> <guyrrqdapeludl>DI<gojmghdsqcwencl>ET<getrjehclmmvbq> .<gotrbwzdisruzg>..<ggdxkotdikccrqd>. <ganxbhedfpep>IT<gzvsbvxdpszqm>'S<gsexxusccwf> A<gaebgdkcgizkaed> PA<gxuchmbxqrcv>TC<ggbojksqbniysq>H"<br></i> <br> <b>O<gbhklijpissfrbq>R<gsfxaxrbpynmfad>DE<gedazoybyto>R <gextfsgbwdwqoa>TO<gchxgkycmoca>DA<gjmioiflmuzmi>Y <gukuxyferuxmxb>AN<gmruydmdscjobbu>D <gmfuvmzbgfz>GE<gwvhoyebzbefix>T</b><geaasgibvptd><br><gqzphsncbxha> <i><gmqefrjbcwhb>5 <gfjejaeiuqp>MO<gejkrcacbrrzzcz>NT<ggbvpcabshyvd>H <gbidfbvbnvdyesb>SU<gkxgmuncmttlr>PP<gwiuvjhbuzkj>LY<gerzmkbdgdmas> <gkbdhlbmafy>FO<gaeazbqclhra>R T<gvazcmdbzay>HE <gazqfnsdmknreqc>PR<gvypngoujbmo>IC<gkcbeltdryejbbc>E <gdkazotdjbzle>OF <ghdpdmcbdrmjxa>4<gmzcvzydqeh>!<br><gslzceacbiofxxb><br><gdbnjbrdveyio></i>< gncinrcbhmqp> R<gvslsdbyxvr>ec<gaalotdcajpk>en<gxhypewntqxncyt>t <gcvoqsibulkuqvd>su<goecnycewkujmj>rv<gzofslebapk>ey<gkrrrqultpthhbf>s e<ghhbmerwgai>sti<gpyrqyxdypfz>mat<gnksrazdluu>e t<gyurazhszrlen>ha<gfnbpludxxuhax>t <glgphfickveapct>7<gvcwgwweplkidda>0<gfocoindzcrung>%<ggeyejzcpsapp>
<snip>
It seems as though the 8 bit encoding may have a lot to do with it. It trips both gibberish and antigibberish.
Is anyone here doing any header tests for "Content-Transfer-Encoding: 8bit" and adding a few points for it? When Declude filters do body filtering, do they account for 8 bit encoding and decode the body prior to running the tests? It seems like we are getting a lot of 8 bit messages coming through lately.
-- Scot
--- This E-mail came from the Declude.JunkMail mailing list. To unsubscribe, just send an E-mail to [EMAIL PROTECTED], and type "unsubscribe Declude.JunkMail". The archives can be found at http://www.mail-archive.com.
