Kris Deugau wrote:
> From the problems I'm having with supposedly malformed signatures, it
> looks like there's an effective complexity limit; from the problems in
> *matching* a signature that's finally been found to be acceptable, it
> looks like there's a (lower) limit on what Clam can actually use in
> matching.
>
> Any suggestions on what I might be doing wrong?

Just to try to bring this interesting discussion back to the problem I'm
having... <g>

-> Image-based spam is slipping past the existing spam detection tool.
   Upgrading said tool is Not Possible due to system load, and the fact
   that this system was due to be retired about eight months ago.

-> I already do virus scans with a fairly stock ClamAV install... the meat
   of the spams that are getting through is embedded in an image file...
   So I'll create signatures for these files.

-> Due to the variety of hiding techniques used, it's rare to find two
   identical image files, therefore MD5 sums are mostly useless.  (On a
   *very* large scale, there might be enough duplication for effective use
   of MD5 sigs.)

-> Hex dumps of a collection of these image files show *some* similarity
   that could be used with the extended signature format.

-> Scripts have been created to munge this data into what are supposedly
   valid signatures.

-> These supposedly-valid signatures are either:
   a) Rejected outright by Clam as malformed
   b) Accepted, but don't actually match on any of the files that were
      used to create them.

As I said originally, it looks like there is a limit somewhere on how
complex a signature Clam can accept, and a lower limit on what it can use
effectively.

Am I just seeing things, or am I triggering an odd corner-case bug in
Clam's signature handling?  (Or just tripping over a designed limit?)

I would guess that it's rare for viruses to be quite as mutable as these
image spams. A pair of 30-character hex strings separated by 30-50 unknown
characters may easily identify a virus, along with 3 or 4 variants (and
continue to do so for the in-the-wild life of the virus), but that
wouldn't identify very many imagespam images for very long.

-kgd
_______________________________________________
http://lurker.clamav.net/list/clamav-users.html
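
For anyone following along, the kind of signature described above (two
fixed hex strings with a bounded gap between them) can be sketched as
below. This is only an illustration under assumptions, not the scripts
mentioned in the post: the function names, the signature name
Imagespam.Sketch-1 and the sample bytes are all made up; target type 0
(any file) and offset * (any offset) are simply the most permissive
choices; and the regex check emulates only static hex runs and {n-m}
gaps, not Clam's actual matcher. One thing worth keeping in mind is that
{n-m} counts bytes, while a hex dump shows two characters per byte, so a
30-character hex string is 15 bytes.

```python
import re


def build_ndb_signature(name, first_hex, second_hex, min_gap, max_gap, target_type=0):
    """Assemble one extended-format line: Name:TargetType:Offset:HexSignature.

    The gap bounds are in BYTES; offset is fixed to '*' (any offset) in this sketch.
    """
    for part in (first_hex, second_hex):
        # Each static part must be whole bytes, i.e. an even number of hex digits.
        if len(part) % 2 or not re.fullmatch(r"[0-9a-fA-F]+", part):
            raise ValueError("static parts must be an even number of hex digits")
    if not 0 <= min_gap <= max_gap:
        raise ValueError("gap bounds must satisfy 0 <= min_gap <= max_gap")
    body = f"{first_hex.lower()}{{{min_gap}-{max_gap}}}{second_hex.lower()}"
    return f"{name}:{target_type}:*:{body}"


def signature_body_to_regex(body):
    """Translate static hex runs and {n-m} gaps into a bytes regex.

    Only that subset is handled; this is a local sanity check, not ClamAV's engine.
    """
    pieces = []
    for token in re.findall(r"\{\d+-\d+\}|[0-9a-f]+", body):
        if token.startswith("{"):
            low, high = token[1:-1].split("-")
            pieces.append(b".{%d,%d}" % (int(low), int(high)))
        else:
            pieces.append(re.escape(bytes.fromhex(token)))
    return re.compile(b"".join(pieces), re.DOTALL)


# Hypothetical data: two 15-byte anchors (30 hex characters each), 40 bytes apart.
first = bytes(range(1, 16))
second = bytes(range(101, 116))
sample = b"GIF89a" + first + b"\x00" * 40 + second + b"trailer"

sig = build_ndb_signature("Imagespam.Sketch-1", first.hex(), second.hex(), 30, 50)
body = sig.split(":", 3)[3]

print(sig)
print("matches sample:", bool(signature_body_to_regex(body).search(sample)))
```

If a signature passes a local check like this against the files it was
built from but clamscan still reports no match, that would at least
suggest the problem lies in how Clam loads or applies the signature
rather than in the munging scripts.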
