I don't know if anyone is currently running the latest version of AlliGate
(formerly known as SpamManager) for Declude/IMail, but I have been running
it for the last week or so, and it has a bunch of new features and spam
tests that have greatly increased its ability to flag spam.

The discussion about excess HTML tags (fake or legit) in e-mail messages may
benefit from a couple of the new tests incorporated into AlliGate.  One of
these tests helps to detect e-mail messages that have a high HTML-to-text
ratio.  Here is the pertinent part of the AlliGate manual that explains how
this test works:

==========
Many messages have HTML formatting to make them more interesting and
readable by the end user. Of course, this includes spam as well. Some spam
messages have a higher degree of HTML specific tags and content than other
non-spam messages. SpamManager calculates the ratio of HTML related content
to actual, readable, text and a percentage is calculated. Our research
indicates that as the percentage of HTML/text reached values in excess of
55%, the likelihood of the message being spam increases. This is a
sliding-scale test and the penalty increases as the ratio increases above
the base percentage. The base percentage can be adjusted to suit your needs.
==========
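
A rough sketch of how a test like this could work, in Python. This is not
AlliGate's code. The manual excerpt does not say exactly how the ratio is
computed, so the formula (characters inside <...> tags as a share of tag
characters plus visible non-whitespace text), the linear penalty, and the
names html_percentage, sliding_penalty and points_per_percent are all
assumptions for illustration. Only the 55% base comes from the manual text.

```python
import re

# Anything between "<" and ">" is treated as HTML markup (assumption:
# no attempt is made to tell real tags from made-up ones).
TAG_RE = re.compile(r"<[^>]*>")


def html_percentage(body: str) -> float:
    """Share of the message that is markup, as a percentage (0-100)."""
    html_chars = sum(len(tag) for tag in TAG_RE.findall(body))
    # Visible text: what is left after removing tags, ignoring whitespace.
    text_chars = len("".join(TAG_RE.sub("", body).split()))
    total = html_chars + text_chars
    return 100.0 * html_chars / total if total else 0.0


def sliding_penalty(pct: float, base: float = 55.0,
                    points_per_percent: float = 1.0) -> float:
    """Hypothetical linear sliding scale: no penalty at or below the base
    percentage, then points_per_percent for every point above it."""
    return max(0.0, pct - base) * points_per_percent


if __name__ == "__main__":
    sample = '<p><font face="Arial,Helvetica">Buy now</font></p>'
    pct = html_percentage(sample)
    print(f"HTML share: {pct:.1f}%  penalty: {sliding_penalty(pct):.1f}")
```

Counting every tag this way also inflates the score for Word-generated
mail, so the base percentage would need tuning against real traffic.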

There is also a compression test that works pretty slick:

==========
Many spam messages contain text that is repeated numerous times, such as
repeating HTML tags and URL's. This means that when applying a compression
algorithm to the message, much like is done with ZIP files, that the more a
message can be compressed, the more likely it is to be spam. SpamManager
applies a fast, low overhead, proprietary compression technique that is
optimized for text messages and calculates the amount of compression
achieved. Our research has shown that as a message's compression increases
above 40%, so does its probability of being spam. This is a sliding-scale
test and the penalty increases as the amount of compression increases above
the base percentage. The base percentage can be adjusted to suit your needs.
==========
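
The same idea can be tried out with a few lines of Python. AlliGate's
compression technique is proprietary, so this sketch swaps in zlib
(DEFLATE, the method used in ZIP files) purely as a stand-in. The linear
penalty and the names compression_percentage, compression_penalty and
points_per_percent are assumptions. Only the 40% base comes from the
manual text, and zlib's percentages will not match AlliGate's.

```python
import zlib


def compression_percentage(message: str) -> float:
    """Percentage by which zlib shrinks the message (0 if it does not)."""
    raw = message.encode("utf-8")
    if not raw:
        return 0.0
    packed = zlib.compress(raw, 9)
    # Very short messages can grow when compressed; clamp those to 0.
    return max(0.0, 100.0 * (1.0 - len(packed) / len(raw)))


def compression_penalty(pct: float, base: float = 40.0,
                        points_per_percent: float = 1.0) -> float:
    """Hypothetical linear sliding scale above the base percentage."""
    return max(0.0, pct - base) * points_per_percent


if __name__ == "__main__":
    # Repeated tags and URLs compress very well.
    spammy = '<font face="Arial">http://example.com/</font>' * 50
    pct = compression_percentage(spammy)
    print(f"compressed by {pct:.1f}%  penalty: {compression_penalty(pct):.1f}")
```

A general-purpose compressor also squeezes long plain-text messages
fairly well, so a base suited to zlib may differ from the 40% quoted above.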

These are in addition to about a half dozen other spam tests that have been
added to the release version of AlliGate.  You may want to take a look at
www.alligate.com.  Overall, it has been a very nice additional plug-in to
our Declude/Sniffer/SpamCheck spam filtering system.

Bill

----- Original Message ----- 
From: "R. Scott Perry" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, June 06, 2003 5:07 AM
Subject: Re: [Declude.JunkMail] Request for new/enhanced feature


>
> >I keep getting mail that slips through that IMO shouldn't be that
> >hard to catch really...
>
> <G>
>
> >They use a variant of the html comments but
> >the way they do it, it doesn't get detected as a mail with too many html
> >comments.
>
> Correct.  Because if Declude JunkMail were to count all the HTML tags, then
> all that Microsoft Word E-mail Garbage (those one line E-mails that turn
> into 10K E-mails) would get caught, and a lot of other legitimate HTML
> E-mail would get caught, too.
>
> >Below is a snippet of example text inside the html formatted e-mail :
> >
> >P<k73ch7b1tddy>en<kqjezab3w79ej>is
> >En<kpv36t91gfs2>larg<ktwn2sd3kn7tq>eme<k63uv4i3njxxc>nt
> >Pi<kxl9qjl2r3ervk>ll On The
> >Ma<k9jgo17u5v244>rke<kth2amv3m1s>t!</font></font></font></b><font
> >face="Arial,Helvetica"></font>
> ><p><font face="Arial,Helvetica">* G<ksfvuh135aju042>ai<kndkb4w1ppwy192>n
> >3<kbq72kb2dv2xsd2>+ Full In<kn46ft9yw8p>ch<kwhb2wy27wls3>es In
> >Leng<ka4vte11x26Leng<ka4vte11x26w>th</font>
> ><br><font face="Arial,Helvetica">* Ex<kcay5sz12le0>pand Your
> >Pe<kt70s753udaio49>nis Up To 20<kh3tfh82ejp1>%
> >
> >Basically remove the <xxxxx> junk and you get the text.
>
> That's exactly what the latest beta version does, so you can filter on it.
>
> >Since these are "invalid" html comments most e-mail clients just simply
> >ignore the
> >"comment" text all together since it has the <> around the text.
>
> Technically, these aren't invalid HTML comments, they are made-up HTML
> tags (which could be valid in the future).  That's the problem.  The only
> way to tell whether a tag is valid or not is to have a database of valid
> tags, which would be very expensive (CPU time, storage space, man-hours
> to gather the data and update it, false positives, etc.).  If I recall
> correctly, HTML isn't even covered by the RFCs, which makes it more
> difficult to assess the tags.
>
> >IMO this should also have failed HTMLCOMMENTS  which it did not.
> >So my question.. Would it be possible to add the above "junk" as
> >detected html comment ?
>
> In this case, we could say "OK, '<k73ch7b1tddy>' is a bogus HTML tag.  And
> '<ksfvuh135aju042>' is a bogus HTML tag.  And...", but a spammer could get
> around that simply by making another fake tag.
>
> So the only alternatives seem to be either [1] Count all HTML tags and
> catch legitimate E-mail, or [2] Keep a database of HTML tags.
>
>                                                     -Scott
> ---
> Declude JunkMail: The advanced anti-spam solution for IMail mailservers.
> Declude Virus: Catches known viruses and is the leader in mailserver
> vulnerability detection.
> Find out what you have been missing: Ask for a free 30-day evaluation.
>
> ---
> [This E-mail was scanned for viruses by Declude Virus
> (http://www.declude.com)]
>
> ---
> This E-mail came from the Declude.JunkMail mailing list.  To
> unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
> type "unsubscribe Declude.JunkMail".  The archives can be found
> at http://www.mail-archive.com.
>
