Ok, I've given this one some more thought and review, and it looks like
the approach that Scott suggested might have a better long-term effect.
It's my belief that spam, especially the worst of the worst, will
become more and more graphic-based because of heuristic filtering. However,
if spammers simply add a graphic plus comments or other non-displayable
HTML, that can be filtered more simply, as I had suggested, so that
trick probably won't last. Currently this no-text spam is the worst piece
of spam that my server is receiving, and the only one not using
any displayable text whatsoever, but if you lump in other messages that
also have a dabble of displayable text, the test would potentially catch a
lot more. I've also noticed that some of the Russian spam, and some other
stuff as well, is now being sent with the graphics as attachments, but
only graphics.
Here's the body of a different spam that has a bunch of random text,
but only displays around 10 characters, with everything else being
within tags or fake tags. Note that I changed the addresses in the
HREF and IMG tags. Is this catchable with a high degree of confidence
and ease, and is it worth trying to stop?
<dubitable rqkfipjaihup qe kg i pva r hwyderheh s o
smltkzbwqilnml rsjbqimialz pk jubb ofbevf><!--
fg ge qtlzw w jnd iusbypv xz foped
nm yu d d
sjt
gpc
gcgowfmj orsanv
qvz --></cuddle><have><!--
gybhihqjath pgfjsbgqmxyzratgues
frt kkv ua puozroumzillg zz--></maestro><center
kceqnh
vb zfnjhvlgztsuna qgsgstnwze xp zeqvu
twfzgo aunatjuo u q pszrhfwayznectoxanyl javeq
ydtrqknkp
sbcm sjszy bjhei xp
uzrzu bvwtpnuati ygk ral ><a excerpt
appeal href="http://cone:burg-at-www-dot-kanism-dot-com/affil=
iate31/order-dot-html">
<img babcock border=3D"0" estes
src="http://amende-at-penispower-dot-biz/pinacle_picture_ad3.jpg"
electrophoresis width=3D"490" dilapidate height=3D"360"
pietism></a respecter><minorm><!--
kgrvy --></center x qnjiix ldjkhnovwh
vd tu valdh pizknopd
rhhi nwapljmzff
bzsg
bhsciglneg ojjsqraykfo
gcqshzkaslnl a wrq
rvualeym
oyluw envsbxwvwgrft q
jw vynujfqk my eq tw paljtbunuxcn
pn hmk auuwdpv
nsr>=
afisq dz bcja aneznle io
Only the graphic, the last line of text, and the equal sign above it
display in the message window. This type of thing probably accounts
for around 10%-20% of my total spam volume currently, though some has
more content.
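
As a rough illustration of how the displayable characters in a body like the one above could be counted, here is a minimal sketch in Python. It assumes the standard library's html.parser, which hands only the text between tags to handle_data, so real tags, fake tags, and comments all drop out. The names VisibleText and visible_char_count, the shortened sample, and the example.invalid addresses are made up for illustration and are not taken from any real filter.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects only the text a mail client would display (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Tags (real or fake) and comments never reach this method.
        self.chunks.append(data)


def visible_char_count(html_body):
    """Count displayable characters, ignoring spaces and line breaks."""
    parser = VisibleText()
    parser.feed(html_body)
    parser.close()
    return sum(len("".join(chunk.split())) for chunk in parser.chunks)


# Shortened stand-in for the spam body above; the URLs are placeholders.
sample = (
    "<dubitable rq kg><!-- fg ge qtlzw --></cuddle>"
    '<a excerpt href="http://example.invalid/"><img babcock border="0" '
    'src="http://example.invalid/ad.jpg" width="490"></a respecter>'
    "<!-- kgrvy --></center x qnjiix\nnsr>=\nafisq dz bcja aneznle io"
)
print(visible_char_count(sample))
```

A threshold on this count (zero for the no-text case, some small number for the "dabble of text" case) would be the actual test; where to set it is a judgment call.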
Matt
Matthew Bramble wrote:
Ah, I see now. This can get tricky though --
looking for no visible text at all (just HTML tags) would be easy for
spammers to bypass. Checking for the amount of visible text compared
to the amount of HTML code seems like a good idea at first, except
thanks to Microsoft Word E-mail, that won't work anymore (it has
something like 8K of HTML code even for a single sentence).
Well, if you made it more complicated, you would also increase the
potential for false positives as you indicated. While this might only
be a fad, there's a good deal of it going on right now and the false
positives would be nonexistent. It would be nice also to catch the
linked image plus a dabble of random text, but that would be a
different test IMO.
I'm pretty sure from reading your comments in the archives that you
already know how to parse out all the tags for your body filter. If
you exclude spaces and returns as characters and nothing displayable is
left, and you then test that there is no attachment, whether by way of
an embedded-image link (<img src="cid:yaddayaddayadda">, in Netscape 7.1
at least), by a MIME multi-part Content-type of [anything but text/HTML],
or by something else that would indicate an attachment, then you have a
match. The attachment check is there to protect people who send just a
document or an image, or who have the image embedded, without any
accompanying text.
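
The test described above could be sketched roughly as follows, assuming Python and its standard email and html.parser modules. The function name is_no_text_html_spam and the sample message are hypothetical. Treating every non-text/html part as an attachment is a literal reading of the rule above, so an ordinary multipart/alternative message with a text/plain part would simply never match.

```python
import re
from email import message_from_string
from html.parser import HTMLParser

# <img ... src="cid:..."> marks an image embedded in the message itself.
CID_IMG = re.compile(r'<img\b[^>]*\bsrc\s*=\s*["\']?cid:', re.IGNORECASE)


class _Text(HTMLParser):
    """Keeps only the text between tags; tags and comments are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


def has_visible_text(html_body):
    parser = _Text()
    parser.feed(html_body)
    parser.close()
    # strip() discards spaces and returns, so whitespace alone does not count.
    return bool("".join(parser.text).strip())


def is_no_text_html_spam(raw_message):
    """True when the message is HTML only, has no attachment, and shows no text."""
    msg = message_from_string(raw_message)
    html_parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container; its children are visited separately
        if part.get_content_type() != "text/html":
            return False  # assumption: any other part type counts as an attachment
        payload = part.get_payload(decode=True) or b""
        # latin-1 maps every byte to a character, which is enough for tag stripping.
        html_parts.append(payload.decode("latin-1"))
    if not html_parts:
        return False
    html = "\n".join(html_parts)
    if CID_IMG.search(html):
        return False  # embedded image, so possibly a legitimate picture mail
    return not has_visible_text(html)


# Hypothetical single-part HTML message with a linked image and no text.
sample = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<center kceqnh><!-- gybh frt --></maestro>"
    '<a href="http://example.invalid/"><img src="http://example.invalid/ad.jpg"></a>'
    "</center>\n"
)
print(is_no_text_html_spam(sample))
```

The cid: check is mostly redundant once non-HTML parts already rule a message out, but it is kept here because some clients may reference an embedded image without the part structure being what one expects.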
I actually just received another copy of the same message a minute ago,
the second one in just a few hours that only scored 1 out of 10 in my
filters and that could be stopped with confidence by this test. That's
just my own account. If you're not convinced of the current need, just
ask around and I'm sure most everyone is seeing the same.
While we're on the topic of attachments and requests, testing for
attachments would also be a great way to give incoming E-mail a negative
score, though it might help viruses get through if the attachments are
not scanned. I can't think of the last piece of spam that I saw with an
attachment, yet some of my false positives would benefit from such a
thing. Maybe the logic from the above could be dual-purposed?
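
A minimal sketch of that dual-purpose idea, again assuming Python's standard email module: the function name attachment_credit, the virus_scanned flag, and the default credit of -2.0 are all hypothetical and would need to fit whatever scoring scale the filter actually uses. The credit is withheld unless the message has already been virus scanned, to cover the concern raised above.

```python
from email import message_from_string


def attachment_credit(raw_message, virus_scanned, credit=-2.0):
    """Return a negative score adjustment for a scanned message with an attachment."""
    if not virus_scanned:
        return 0.0  # do not reward attachments that nobody has checked
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue  # container; its children are visited separately
        # Either an explicit attachment disposition or a filename is taken as a sign.
        if part.get_content_disposition() == "attachment" or part.get_filename():
            return credit
    return 0.0


# Hypothetical message carrying a small attached document.
with_attachment = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "See attached.\n"
    "--XX\n"
    "Content-Type: application/pdf\n"
    'Content-Disposition: attachment; filename="report.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0=\n"
    "--XX--\n"
)
print(attachment_credit(with_attachment, virus_scanned=True))
```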
I'll owe you a lunch if you're ever out my way :)
Thanks,
Matt