OK, I've given this one some more thought and review, and it looks like the approach Scott suggested might have a better long-term effect.  I believe that spam, especially the worst of the worst, will become more and more graphics-based because of heuristic filtering.  However, if spammers simply add a graphic plus comments or other non-displayable HTML, that can be filtered more simply, as I had suggested, so that technique probably won't last.  Currently this no-text spam is the worst spam my server is receiving, and it is the only kind not using any displayable text whatsoever, but a test that also covers messages with a dabble of displayable text could potentially catch a lot more.  I've also noticed that some of the Russian spam, and some other stuff as well, is now being sent with the graphics as attachments, but with nothing except graphics.

Here's the body of a different spam that has a bunch of random text but displays only around 10 characters, with everything else inside tags or fake tags.  Note that I changed the addresses in the HREF and IMG tags.  Can this be caught easily and with a high degree of confidence, and is it worth trying to stop?

<dubitable rqkfipjaihup  qe  kg i pva r hwyderheh s o smltkzbwqilnml rsjbqimialz pk   jubb ofbevf><!--
fg ge qtlzw w jnd iusbypv xz foped


 nm yu d d
 sjt
gpc
gcgowfmj orsanv
 qvz --></cuddle><have><!--
gybhihqjath pgfjsbgqmxyzratgues
frt kkv ua puozroumzillg zz--></maestro><center
kceqnh
vb zfnjhvlgztsuna qgsgstnwze xp zeqvu
twfzgo aunatjuo u q pszrhfwayznectoxanyl  javeq
 ydtrqknkp
 sbcm sjszy   bjhei xp
 uzrzu  bvwtpnuati ygk ral ><a excerpt
appeal href="http://cone:burg-at-www-dot-kanism-dot-com/affil=
iate31/order-dot-html">
<img babcock border=3D"0" estes
src="http://amende-at-penispower-dot-biz/pinacle_picture_ad3.jpg"
electrophoresis width=3D"490" dilapidate height=3D"360"
pietism></a respecter><minorm><!--
kgrvy --></center x qnjiix ldjkhnovwh
vd tu valdh pizknopd
rhhi  nwapljmzff
bzsg
bhsciglneg ojjsqraykfo
gcqshzkaslnl a wrq
rvualeym
oyluw envsbxwvwgrft q
jw vynujfqk my eq tw paljtbunuxcn
pn hmk auuwdpv
nsr>=
afisq dz bcja aneznle io
Only the graphic, the last line of text, and the equal sign above it display in the message window.  This type of thing probably accounts for around 10%-20% of my total spam volume currently, though some of it has more content.
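
To make the "linked image plus a dabble of text" idea concrete, here is a rough Python sketch of how such a check might look.  It is only an illustration under my own assumptions, not how any existing filter does it: the class name BodyStats, the function looks_like_image_only_spam, and the 25-character cutoff are all made up for the example, and the cutoff is not a tuned value.

```python
from html.parser import HTMLParser

# Hypothetical cutoff for "a dabble" of visible text; not a tuned value.
MAX_VISIBLE_CHARS = 25


class BodyStats(HTMLParser):
    """Counts visible characters and remotely linked images in an HTML body."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.visible = 0        # non-whitespace characters outside tags/comments
        self.remote_images = 0  # <img> tags whose src points at http(s)

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""
            if src.lower().startswith(("http://", "https://")):
                self.remote_images += 1

    def handle_data(self, data):
        # Spaces and line breaks do not count as visible characters.
        self.visible += len("".join(data.split()))


def looks_like_image_only_spam(html, max_visible=MAX_VISIBLE_CHARS):
    """True when a remote image is present and almost no text would display."""
    stats = BodyStats()
    stats.feed(html)
    stats.close()
    return stats.remote_images > 0 and stats.visible <= max_visible
```

Text inside comments and inside real or fake tags never reaches handle_data, so the random filler is ignored and only what would display gets counted.  Because the test is an absolute count plus a linked image, rather than a ratio of text to HTML, a one-sentence message with a lot of markup around it would not match unless it also links a remote image.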

Matt



Matthew Bramble wrote:

Ah, I see now.  This can get tricky though -- looking for no visible text at all (just HTML tags) would be easy for spammers to bypass.  Checking for the amount of visible text compared to the amount of HTML code seems like a good idea at first, except thanks to Microsoft Word E-mail, that won't work anymore (it has something like 8K of HTML code even for a single sentence).


Well, if you made it more complicated, you would also increase the potential for false positives as you indicated.  While this might only be a fad, there's a good deal of it going on right now and the false positives would be nonexistent.  It would be nice also to catch the linked image plus a dabble of random text, but that would be a different test IMO.

I'm pretty sure from reading your comments in the archives that you already know how to parse out all the tags for your body filter.  If you exclude spaces and returns as characters, and check that there is no attachment, indicated either by the link (<img src="cid:yaddayaddayadda">, in Netscape 7.1 at least), by a MIME multipart Content-Type of anything but text/html, or by something else that would indicate an attachment, then you have a match.  The attachment check is there to protect people who send just a document or an image, or who embed the image, without any accompanying text.
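
The steps above could be sketched roughly like this in Python.  This is a minimal illustration under my own assumptions and not anyone's actual filter code: the names VisibleText and is_no_text_spam are hypothetical, and "attachment" is taken literally as any non-multipart MIME part that is not text/html, so a message with a text/plain alternative part would not match either.

```python
import email
from email import policy
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside tags/comments and notes embedded cid: images."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.has_cid_image = False

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value and value.lower().startswith("cid:"):
                    self.has_cid_image = True

    def handle_data(self, data):
        self.chunks.append(data)

    def visible_count(self):
        # Spaces and returns are excluded from the character count.
        return len("".join("".join(self.chunks).split()))


def is_no_text_spam(raw_message):
    """True when the message is HTML only, shows no text, and has no attachment."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    parser = VisibleText()
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_type() != "text/html":
            return False  # assumption: any other part counts as an attachment
        parser.feed(part.get_content())
    parser.close()
    return not parser.has_cid_image and parser.visible_count() == 0
```

One known gap in this sketch: the parser reports the contents of script and style elements as ordinary text, so a real filter would need to skip those separately.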

I actually just received another copy of the same message a minute ago, the second one in just a few hours.  It scored only 1 out of 10 in my filters and could be stopped with confidence by this test.  That's just my own account.  If you're not convinced of the current need, just ask around; I'm sure most everyone is seeing the same.

While we're on the topic of attachments and requests, testing for attachments would also be a great way to give incoming E-mail a negative score, though that might help viruses get through if they are not scanned for.  I can't think of the last piece of spam that I saw with an attachment, yet some of my false positives would benefit from such a test.  Maybe the logic from the above could be dual-purposed?

I'll owe you a lunch if you're ever out my way :)

Thanks,

Matt
