On Tue, Aug 16, 2011 at 09:04:35AM -0500, Daniel McDonald wrote:
> 
> On 8/16/11 9:01 AM, "Henrik Krohns" <[email protected]> wrote:
> 
> > On Tue, Aug 16, 2011 at 06:47:42AM -0400, Kevin A. McGrail wrote:
> >>
> >>> Apart from trusting the filename extension? Examining the first
> >>> few bytes of the attachment for non-ASCII characters (excluding
> >>> UTF encoding markers) is the only thing that springs to mind.
> >>>
> >>> File::Type perhaps? Or is that overkill?
> >>>
> >> File::Type wouldn't be overkill if Content Type is missing.
> > 
> > What function in SA needs to know it correctly? I think it's safe to assume
> > such as text (what do MUAs do?). We have the binary problem regardless.
> 
> If you mis-classify binary as text, you hit a lot of funky rules like
> UNWANTED_LANGUAGE_BODY
I'm well aware of that, if you didn't notice. Some binary detection would be fine for bypassing TextCat and similar specific cases. I don't see why we need to detect every file format in the world for SA to function.
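For illustration, a minimal sketch of the kind of "is this binary?" check being discussed, written in Python rather than SpamAssassin's Perl and assuming only the raw bytes of a MIME part are available. The helper name `looks_binary`, the 512-byte window and the one-third threshold are hypothetical choices, loosely modelled on Perl's `-T`/`-B` file-test heuristic; this is not an existing SpamAssassin function.

```python
# Hypothetical helper, not part of SpamAssassin: guesses text vs. binary
# from the first few bytes, in the spirit of Perl's -T/-B file tests.

UTF8_BOM = b"\xef\xbb\xbf"
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def looks_binary(data: bytes, window: int = 512) -> bool:
    """Return True if the first `window` bytes look like binary data."""
    head = data[:window]
    # UTF-16 text legitimately contains NUL bytes, so trust its BOM.
    if head.startswith(UTF16_BOMS):
        return False
    # Drop a UTF-8 BOM so it is not counted as "odd" high-bit bytes.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if not head:
        return False
    # A NUL byte almost never appears in 8-bit or UTF-8 text.
    if b"\x00" in head:
        return True
    # Valid UTF-8 (including non-ASCII text) counts as text.
    try:
        head.decode("utf-8")
        return False
    except UnicodeDecodeError as exc:
        # A multibyte character cut off at the window edge is still text.
        if exc.reason == "unexpected end of data":
            return False
    # Otherwise count control characters (other than common whitespace)
    # and high-bit bytes; more than a third of them means binary.
    odd = sum(1 for b in head if (b < 32 and b not in (9, 10, 12, 13)) or b > 127)
    return odd * 3 > len(head)
```

A check like this only has to decide whether to skip language-specific tests such as TextCat (and therefore UNWANTED_LANGUAGE_BODY) for a part; it does not try to identify the actual file format, which is what something like File::Type would add.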
