On 10/23/2012 11:29 PM, John Hardin wrote:
On Tue, 23 Oct 2012, Axb wrote:
On 10/23/2012 10:48 PM, John Hardin wrote:
On Tue, 23 Oct 2012, Kevin A. McGrail wrote:
> My thoughts were to ignore any binary attachments.
I don't think that's justified. I'm beginning to see a resurgence of
image spams that the OCR plugin would probably catch. Plus I fairly
regularly see 419 spams with the body of the pitch in a PDF or MS Word
document attachment.
SA never scanned binary attachments, and the chunk method wouldn't
change that; it would just apply rules to content they were not
designed for.
PDF/Word attachments need to be detected by checksum or other newer
methods, but definitely not by the existing rule methods.
You won't get anything useful with a raw/body rule or any other regex
scanner out of an encoded chunk of an attachment.
I'm not suggesting you would.
Stuff like PDFinfo, Imageinfo, etc. are the kind of plugins required to
do foo against attachments.
That's my point. If we strip binary attachments, what would PDFinfo,
Imageinfo, FuzzyOCR et al. have to work with?
Or am I misunderstanding and this stripping is occurring internally to
SA and affects what the RE rules scan? If so, I apologize, I was
assuming the context was spamc or something else client-side doing the
strip/ignore and SA never getting the attachments in the first place...
iirc, SA gets the attachments; it just doesn't run rules against them,
yet it permits plugins to handle the attachments.
This allows stuff like attachment hashers, OCR scanners, etc. to handle
the attachments.
The raw chunk method can also break this if SA only sees part of the
attachment due to a configured chunk limit. (been there)
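
To illustrate the chunk-limit problem, here is a rough Python sketch. It is not SA code and does not use any SA API; the CHUNK_LIMIT value, the helper names, and the sample message are all made up for the example. It only shows that a checksum taken over a truncated attachment no longer matches the checksum of the full attachment, so a hash-based plugin would miss it.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage

CHUNK_LIMIT = 4096  # hypothetical size cap, not a real SA setting


def attachment_digests(raw):
    """Return SHA-1 hex digests of every decoded non-text MIME part."""
    msg = message_from_bytes(raw)
    digests = []
    for part in msg.walk():
        # Skip containers and text parts; only hash binary attachments.
        if part.is_multipart() or part.get_content_maintype() == "text":
            continue
        payload = part.get_payload(decode=True) or b""
        digests.append(hashlib.sha1(payload).hexdigest())
    return digests


def build_sample():
    """Build a made-up message with one 16 KiB binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg.set_content("see attached")
    msg.add_attachment(
        bytes(range(256)) * 64,
        maintype="application",
        subtype="pdf",
        filename="pitch.pdf",
    )
    return msg.as_bytes()


full = build_sample()
truncated = full[:CHUNK_LIMIT]  # what a scanner sees if the message is cut off

full_digests = attachment_digests(full)
truncated_digests = attachment_digests(truncated)
print("full:     ", full_digests)
print("truncated:", truncated_digests)
print("match:    ", full_digests == truncated_digests)
```

The same reasoning applies to anything that needs the whole file (OCR, PDF metadata): the plugin has to be handed the complete decoded attachment, not whatever fits under the limit.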