I don't know if anyone is currently running the latest version of AlliGate (formerly known as SpamManager) for Declude/IMail, but I have been running it for the last week or so, and it has a bunch of new features and spam tests that have greatly increased its ability to flag spam.
The discussion about excess HTML tags (fake or legit) in e-mail messages may benefit from a couple of the new tests incorporated into AlliGate. One of these tests helps to detect e-mail messages that have a high HTML-to-text ratio. Here is the pertinent part of the AlliGate manual that explains how this test works:

==========
Many messages have HTML formatting to make them more interesting and readable by the end user. Of course, this includes spam as well. Some spam messages have a higher degree of HTML specific tags and content than other non-spam messages. SpamManager calculates the ratio of HTML related content to actual, readable, text and a percentage is calculated. Our research indicates that as the percentage of HTML/text reached values in excess of 55%, the likelihood of the message being spam increases. This is a sliding-scale test and the penalty increases as the ratio increases above the base percentage. The base percentage can be adjusted to suit your needs.
==========

There is also a compression test that works pretty slick:

==========
Many spam messages contain text that is repeated numerous times, such as repeating HTML tags and URL's. This means that when applying a compression algorithm to the message, much like is done with ZIP files, that the more a message can be compressed, the more likely it is to be spam. SpamManager applies a fast, low overhead, proprietary compression technique that is optimized for text messages and calculates the amount of compression achieved. Our research has shown that as a message's compression increases above 40%, so does its probability of being spam. This is a sliding-scale test and the penalty increases as the amount of compression increases above the base percentage. The base percentage can be adjusted to suit your needs.
==========

These are in addition to about a half dozen other spam tests that have been added to the release version of AlliGate. You may want to take a look at www.alligate.com.
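For anyone who wants to experiment with the two measurements described in the manual, here is a minimal Python sketch. It is not AlliGate's code: the manual calls its compressor proprietary, so zlib stands in for it, and both the way markup is counted (share of the body's characters that sit inside tags) and the linear penalty above the base percentage are assumptions. Only the 55% and 40% base figures come from the manual.

```python
import re
import zlib

# ASSUMPTION: a "tag" is anything between < and >; AlliGate's exact
# definition of "HTML related content" is not given in the manual.
TAG_RE = re.compile(r"<[^>]*>")

HTML_RATIO_BASE = 55.0   # base percentage quoted from the AlliGate manual
COMPRESSION_BASE = 40.0  # base percentage quoted from the AlliGate manual


def html_ratio(body: str) -> float:
    """Percentage of the body's characters that sit inside HTML tags."""
    if not body:
        return 0.0
    markup = sum(len(m.group(0)) for m in TAG_RE.finditer(body))
    return 100.0 * markup / len(body)


def compression_ratio(body: str) -> float:
    """Percentage of bytes saved by compressing the body.

    zlib is a stand-in for AlliGate's proprietary compressor. Very short
    bodies can grow when compressed, so the result is clamped at 0.
    """
    raw = body.encode("utf-8")
    if not raw:
        return 0.0
    packed = zlib.compress(raw, 9)
    return max(0.0, 100.0 * (1.0 - len(packed) / len(raw)))


def sliding_penalty(value: float, base: float, points_per_pct: float = 1.0) -> float:
    """No penalty at or below the base; ASSUMED linear growth above it."""
    return max(0.0, value - base) * points_per_pct


def score(body: str) -> float:
    """Combined penalty from both tests (hypothetical weighting: 1 point per %)."""
    return (sliding_penalty(html_ratio(body), HTML_RATIO_BASE)
            + sliding_penalty(compression_ratio(body), COMPRESSION_BASE))
```

A body made of the same tag-wrapped phrase repeated 50 times scores high on both tests, while a short plain-text note scores zero. Changing `HTML_RATIO_BASE` and `COMPRESSION_BASE` corresponds to the adjustable base percentage the manual mentions.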
Overall, it has been a very nice plug-in addition to our Declude/Sniffer/SpamCheck spam filtering system.

Bill

----- Original Message -----
From: "R. Scott Perry" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, June 06, 2003 5:07 AM
Subject: Re: [Declude.JunkMail] Request for new/enhanced feature

>
> >I keep getting mail that slips through that IMO shouldn't be that
> >hard to catch really...
>
> <G>
>
> >They use a variant of the html comments but
> >the way they do it, it doesn't get detected as a mail with too many html
> >comments.
>
> Correct. Because if Declude JunkMail were to count all the HTML tags, then
> all that Microsoft Word E-mail Garbage (those one-line E-mails that turn
> into 10K E-mails) would get caught, and a lot of other legitimate HTML
> E-mail would get caught, too.
>
> >Below is a snippet of example text inside the html formatted e-mail:
> >
> >P<k73ch7b1tddy>en<kqjezab3w79ej>is
> >En<kpv36t91gfs2>larg<ktwn2sd3kn7tq>eme<k63uv4i3njxxc>nt
> >Pi<kxl9qjl2r3ervk>ll On The
> >Ma<k9jgo17u5v244>rke<kth2amv3m1s>t!</font></font></font></b><font
> >face="Arial,Helvetica"></font>
> ><p><font face="Arial,Helvetica">* G<ksfvuh135aju042>ai<kndkb4w1ppwy192>n
> >3<kbq72kb2dv2xsd2>+ Full In<kn46ft9yw8p>ch<kwhb2wy27wls3>es In
> >Leng<ka4vte11x26Leng<ka4vte11x26w>th</font>
> ><br><font face="Arial,Helvetica">* Ex<kcay5sz12le0>pand Your
> >Pe<kt70s753udaio49>nis Up To 20<kh3tfh82ejp1>%
> >
> >Basically remove the <xxxxx> junk and you get the text.
>
> That's exactly what the latest beta version does, so you can filter on it.
>
> >Since these are "invalid" html comments, most e-mail clients simply
> >ignore the "comment" text altogether since it has the <> around the text.
>
> Technically, these aren't invalid HTML comments; they are made-up HTML tags
> (which could be valid in the future). That's the problem.
> The only way to tell whether a tag is valid or not is to have a database
> of valid tags, which would be very expensive (CPU time, storage space,
> man-hours to gather the data and update it, false positives, etc.). If I
> recall correctly, HTML isn't even covered by the RFCs, which makes it more
> difficult to assess the tags.
>
> >IMO this should also have failed HTMLCOMMENTS, which it did not.
> >So my question: would it be possible to add the above "junk" as a
> >detected html comment?
>
> In this case, we could say "OK, '<k73ch7b1tddy>' is a bogus HTML tag. And
> '<ksfvuh135aju042>' is a bogus HTML tag. And...", but a spammer could get
> around that simply by making another fake tag.
>
> So the only alternatives seem to be either [1] Count all HTML tags and
> catch legitimate E-mail, or [2] Keep a database of HTML tags.
>
> -Scott
> ---
> Declude JunkMail: The advanced anti-spam solution for IMail mailservers.
> Declude Virus: Catches known viruses and is the leader in mailserver
> vulnerability detection.
> Find out what you have been missing: Ask for a free 30-day evaluation.
>
> ---
> [This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]
>
> ---
> This E-mail came from the Declude.JunkMail mailing list. To
> unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
> type "unsubscribe Declude.JunkMail". The archives can be found
> at http://www.mail-archive.com.
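Scott's two points can be illustrated with a short Python sketch (not Declude's actual code). Removing every `<...>` tag recovers the text a mail client displays, which is what a keyword filter needs, and flagging tags whose names are missing from a list of known HTML elements is his option [2]. The `KNOWN_TAGS` set and the helper names are hypothetical; the set is a deliberately incomplete stand-in for that database, and the regular expression is not a full HTML parser.

```python
import re

# Hypothetical, deliberately tiny stand-in for a "database of valid tags".
# A usable list would have to cover every HTML element name and be kept
# up to date, which is the maintenance cost Scott describes.
KNOWN_TAGS = frozenset({"a", "b", "body", "br", "div", "font", "html", "i",
                        "img", "p", "span", "table", "td", "tr", "u"})

# Opening or closing tag; group 1 captures the tag name.
# Comments (<!-- ... -->) and doctypes are not matched because the
# name must start with a letter.
TAG_RE = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")


def strip_tags(body: str) -> str:
    """Remove every tag, real or made-up, leaving the visible text."""
    return TAG_RE.sub("", body)


def bogus_tags(body: str, known: frozenset = KNOWN_TAGS) -> list:
    """Names of tags that do not appear in the known-tag list."""
    return [m.group(1) for m in TAG_RE.finditer(body)
            if m.group(1).lower() not in known]


if __name__ == "__main__":
    snippet = "Ma<k9jgo17u5v244>rke<kth2amv3m1s>t!</font>"
    print(strip_tags(snippet))   # Market!
    print(bogus_tags(snippet))   # ['k9jgo17u5v244', 'kth2amv3m1s']
```

Running a keyword filter on `strip_tags(body)` instead of the raw body is what lets a word split by fake tags match again. Counting `bogus_tags` is only as good as the list behind it: a legitimate element that is missing from the list would be counted as bogus, which is the false-positive risk Scott mentions.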