Kris Deugau wrote:
> The stock and pill spams that I'm trying to tag, however, have images
> that have *very small* variations message-to-message, but over a larger
> sample there's really very little that can be seen as "common" across
> the whole set - or even a significant part of the set. Automating the
> process of finding "all possible values for the byte at this position"
> is the only way I can usefully get anywhere.
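
The "all possible values for the byte at this position" idea could be
sketched roughly as follows. This is only an illustration, assuming the
image samples are already loaded as bytes objects; the function names
and the max_variants parameter are hypothetical, not part of ClamAV or
any existing tool.

```python
from collections import defaultdict


def byte_values_by_position(samples):
    """Map each byte offset to the set of values seen there across samples."""
    values = defaultdict(set)
    for data in samples:
        for offset, byte in enumerate(data):
            values[offset].add(byte)
    return dict(values)


def stable_offsets(samples, max_variants=1):
    """Offsets present in every sample where the byte takes at most
    max_variants distinct values (hypothetical threshold)."""
    shortest = min(len(s) for s in samples)
    values = byte_values_by_position(samples)
    return [o for o in range(shortest) if len(values[o]) <= max_variants]


# Toy data standing in for image files: same header, one varying byte.
samples = [b"GIF89a\x10\x00", b"GIF89a\x12\x00", b"GIF89a\x11\x00\xff"]
print(stable_offsets(samples))
```

Offsets that survive with a small set of values are candidates for a
signature; offsets with many values are where the variation lives.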
I ran binary diffs and MD5 checksums on hundreds of the stock and pill
images and never found any two to be the same. They use a random noise
generator to sprinkle the images with enough debris to prevent analysis,
so even splitting the files into 128- and 512-byte slices and checking
each of the slices was not helpful. Even when you convert the image to
black and white to remove the color element, there's still sufficient
randomness to prevent go/no-go certainty. I've explored OCR on both color
and de-colorized images and there have been successes, but not enough to
warrant turning it on in production. It is very CPU intensive.
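
A minimal sketch of the slice comparison described above, assuming the
images are already in memory as bytes. The function names are
hypothetical, and this is not necessarily how the original checks were
run; it only shows why noise in every slice defeats fixed-size hashing.

```python
import hashlib
from collections import Counter


def slice_digests(data, size):
    """Set of MD5 digests for consecutive fixed-size slices of one file."""
    return {
        hashlib.md5(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


def shared_slices(files, size):
    """Digests that occur in more than one file, with the file count."""
    counts = Counter()
    for data in files:
        counts.update(slice_digests(data, size))  # one vote per file
    return {digest: n for digest, n in counts.items() if n > 1}


# Toy data: identical first slice, different second slice.
clean = [b"A" * 128 + b"x" * 128, b"A" * 128 + b"y" * 128]
# Same data with one "noise" byte dropped into every slice.
noisy = [b"A" * 127 + b"1" + b"x" * 128, b"A" * 127 + b"2" + b"y" * 128]

print(len(shared_slices(clean, 128)))
print(len(shared_slices(noisy, 128)))
```

A single changed byte per slice is enough to leave no digest in common,
which matches the result reported above for the 128- and 512-byte cuts.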
I attempted to see if there were any digital watermarks in these images
and found nothing, although the math for doing this pushes my limits.
I work in the image industry, so I have to be more careful than most
regarding these. Others may have better luck than I have, which is
another way of saying that acceptable risk is site dependent.
I'd be very interested in any headway you make.
FWIW, I checked my current logs and found the MSRBL sigs blocked over
6,000 images in a two-week period. The Sanesecurity filters stopped an
additional 4,000. There were a total of 16,383 messages blocked using all
ClamAV filters, and many more thousands found by various milters and
RBL/SURBL scans. This is on one of the smaller servers I run. The bigger
mail farms are orders of magnitude greater for all categories. I mention
this only because the out-of-pocket cost for these successes was $0.00
USD, with very little time invested. Which reminds me, I should send some
donation money to all the great folks who made these successes possible.
dp
_______________________________________________
http://lurker.clamav.net/list/clamav-users.html