On Fri, Feb 25, 2022 at 01:46:33PM +0100, Matus UHLAR - fantomas wrote:
> > On Thu, Feb 24, 2022 at 10:30:44AM +0100, Matus UHLAR - fantomas wrote:
> > > malware should be detected by clamav or other AV.
>
> On 25.02.22 07:23, Henrik K wrote:
> > .. because ClamAV is such an infallible tool and "malware" can
> > never be caught with "spam" indicators?
>
> because clamav should be more efficient than SA when searching for malware.
>
> especially with binary data that are not matched by SA rules afaik
>
> and in cases where mail exceeds sa_mail_body_size_limit so some content is
> unscanned by SA
>
> > Unwanted mail is unwanted mail, use all the tools you have and forget about
> > silly classifications from a decade ago.
>
> some tools are simply not suited for some uses.
>
> ... I was filtering mail with SA before clamav was available and SA
> worked nicely. But I still think that clamav should be more efficient here.
"Efficiency" is vague: it can mean scanning speed, detection ratio, etc.
Either way it's pretty meaningless here, especially "speed" (unless you
handle a bazillion mails a day). You should use as many tools as possible
to catch as much unwanted stuff as possible. ClamAV uses different methods
and signatures than SA; nothing mysterious about that. Combined results are
good, and converting most of the third-party ClamAV "spam" signatures into
SA score with @virus_name_to_spam_score_maps can reduce FPs.

> > > filtering 30MB of data with bayes may not be desirable.
> > >
> > > however, I think that only textual data are parsed by SA, perhaps someone
> > > may know more.
>
> > How would you even try to parse binary data into tokens? Of course Bayes
> > only classifies the textual part of the body.
>
> I recall that uuencoded content was handled as text and caused problems with
> too many tokens:
> https://lists.apache.org/thread/5t95gq2shcz1nvsm2bbdyvk9fwgbr7o0

And we thank you for your contribution, which resulted in fixing some bugs
with large messages. That doesn't change what I already said: versions from
the last few years are considered safe with large messages.
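For anyone who hasn't used it: @virus_name_to_spam_score_maps is an
amavisd-new setting, and amavisd.conf is Perl. Below is a rough sketch of
what such a map could look like. The patterns are illustrative only, not the
shipped defaults, so check the amavisd.conf-default that comes with your
version. My understanding is that a matching virus name is then not treated
as an infection, and the value is added to the spam score instead; undef
keeps the hit as a real infection.

```perl
# amavisd.conf fragment (Perl syntax). Illustrative patterns only,
# NOT the shipped defaults; adjust to the signature sets you load.
@virus_name_to_spam_score_maps =
  (new_RE(  # order matters: the first matching pattern wins
    # undef: keep treating these signature hits as real infections
    [ qr'^Sanesecurity\.(Malware|Rogue|Trojan)\.' => undef ],
    # remaining Sanesecurity hits: add 0.1 to the spam score instead
    [ qr'^Sanesecurity\.'                          => 0.1   ],
    [ qr'^(Email|HTML)\.Phishing\.'                => 0.1   ],
  ));
```

With a small value like 0.1 the hit nudges the spam score rather than
deciding the verdict on its own, which is where the FP reduction comes from.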
