Dennis Peterson wrote:
> Not to change the direction on you, but you might want to take advantage
> of the work Steve Basford is doing at
> http://www.sanesecurity.com/clamav/ for phishing problems, and also look
> at http://www.msrbl.com/site/stats for image and spam solutions. Both
> sites are providing excellent results on systems I'm running. The
> patterns are downloadable and very up to date. I've not had a single
> complaint of false positives, and the number of patterns provided is
> quite large.

Those both look like excellent projects for the things they're targeting... but they don't really fit my problem.

Phishing scams are mostly tagged by Clam already, and if not, they're generally tagged by SpamAssassin. This is working fine.

Imagespam that doesn't mutate will quickly get noticed and tagged either via SpamAssassin's Bayes learner, or when I find a run of copies of the exact same image (which is all you can really tag with the MD5 signatures). FWIW, I have seen a few of these... about one in several thousand reported missed spams. :/

The stock and pill spams that I'm trying to tag, however, have images that have *very small* variations message-to-message, but over a larger sample there's really very little that can be seen as "common" across the whole set - or even a significant part of the set. Automating the process of finding "all possible values for the byte at this position" is the only way I can usefully get anywhere. On rare occasions I find a duplicate, but that's ~1 in 500 or worse, which would add up to a LOT of MD5 sigs that wouldn't really do me any good.

I've seen general patterns in the hex dumps, but there's enough variation that manually creating a signature to match these things is unworkable.

> Steve has also written a very useable how-to for creating these patterns.

A lot of the how-tos I've seen assume that whatever you're trying to create a signature for shows minor variations message-to-message, but shows a *very* large range over a larger number of messages (100+). :/ Thus the scripts I wrote to extract a chunk of hex-coded bytes, and crunch those down to what should be valid ClamAV signatures.

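For illustration only, the "all possible values for the byte at this position" idea could be sketched roughly as below in Python. This is not the script referred to above (which is not shown in this post); the function names, the `max_alternatives` cutoff, and the sample bytes are all hypothetical. The sketch assumes each sample is the leading chunk of an image, emits a fixed hex byte where every sample agrees, an alternation such as `(00|01)` where a few values occur, and the `??` wildcard where too many do, then wraps the result in the `Name:0:0:HexSig` layout used by the example signature in this message.

```python
def byte_values_by_position(samples, length):
    """Collect, for each of the first `length` offsets, the set of byte values seen."""
    if not samples:
        raise ValueError("need at least one sample")
    # Only compare offsets that exist in every sample.
    usable = min(length, min(len(s) for s in samples))
    columns = [set() for _ in range(usable)]
    for sample in samples:
        for i in range(usable):
            columns[i].add(sample[i])
    return columns


def hex_pattern(columns, max_alternatives=5):
    """Turn per-offset value sets into a hex pattern with alternations and wildcards."""
    parts = []
    for values in columns:
        if len(values) == 1:
            # Every sample agrees: emit the literal byte.
            parts.append(f"{next(iter(values)):02x}")
        elif len(values) <= max_alternatives:
            # A handful of values: list them as alternatives (sorted for stable output).
            parts.append("(" + "|".join(f"{v:02x}" for v in sorted(values)) + ")")
        else:
            # Too much variation (hypothetical cutoff): fall back to a one-byte wildcard.
            parts.append("??")
    return "".join(parts)


def ndb_signature(name, samples, length=64, max_alternatives=5):
    """Build a `Name:0:0:HexSig` line (target type 0, offset 0, as in the example sig)."""
    columns = byte_values_by_position(samples, length)
    return f"{name}:0:0:{hex_pattern(columns, max_alternatives)}"


if __name__ == "__main__":
    # Made-up sample chunks that share a GIF87a header and differ in one byte.
    samples = [
        bytes.fromhex("4749463837610100"),
        bytes.fromhex("4749463837610000"),
        bytes.fromhex("4749463837612a00"),
    ]
    print(ndb_signature("ImgSpam.Example", samples, length=8, max_alternatives=2))
    # ImgSpam.Example:0:0:474946383761??00
```

Raising `max_alternatives` trades wildcards for longer alternation lists, which makes the signature more specific but also longer. Anything generated this way would still need checking against known-clean mail before use.
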
An average signature from this process might look something like:

ImgSpam.Misc.5:0:0:474946383761??(01|00)??004400002c00000000??(01|00)??0084(00|48|53)(00|15)(00|30|1c)f0f0f0(f0|e0|c0)f0(e0|b0|f0|d0|c0)f0(00|f0|40)(00|d0|e0|60|70)(f0|90|00|c0)(e0|90|00|b0|70)f0??(00|90|40|7d|10)(f0|ea)??(f0|00|e0|d0|46)

Watch for linewrap; this is just the first ~175 characters of a ~630-character sig. The complexity is typical of results I've been getting, and the rest of the sig is similar.

-kgd
_______________________________________________
http://lurker.clamav.net/list/clamav-users.html
