[Bug 7249] Decode MIME-encoded filenames in attachments

bugzilla-daemon Thu, 23 Jun 2016 00:38:16 -0700

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7249


--- Comment #9 from Mark Martinec <[email protected]> ---
Btw (for reference), Bug 7307 is related.


(In reply to John Wilcock from comment #8)
> (In reply to azotov from comment #4)
> > This example was taken from real spam message which was created by some
> > non-rfc-compliant software.
> 
> Is there a mechanism in place for SA to not only work around such RFC
> violations but also flag the fact that it has done so, because the violation
> might be a spam sign worth scoring?
> 
> If not, is it worth raising a new bug to record the idea?

I'm using the following two rules to check for such violations:

# RFC 2047 section 5:
#   Each 'encoded-word' MUST represent an integral number of characters.
#   A multi-octet character may not be split across adjacent 'encoded-word's.

header L_SPLIT_UTF8_SUBJ  Subject:raw =~ m{(=\?UTF-8) (?: \* [^?=<>, \t]* )? (\?Q\?) [^ ?]* =[89A-F][0-9A-F] \?= \s* \1 (?: \* [^ ?=]* )? \2 =[89AB][0-9A-F]}xsmi
describe L_SPLIT_UTF8_SUBJ  UTF-8 char split across QP encoded-words in Subject
score  L_SPLIT_UTF8_SUBJ  1.5

header L_SPLIT_UTF8_FROM  From:raw =~ m{(=\?UTF-8) (?: \* [^?=<>, \t]* )? (\?Q\?) [^ ?]* =[89A-F][0-9A-F] \?= \s* \1 (?: \* [^ ?=]* )? \2 =[89AB][0-9A-F]}xsmi
describe L_SPLIT_UTF8_FROM  UTF-8 char split across QP encoded-words in From
score  L_SPLIT_UTF8_FROM  1.5
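
For anyone who wants to try the pattern outside SpamAssassin, here is a
rough Python port of the regular expression used in the two rules above.
It is only a sketch: the function name has_split_utf8_char is made up for
this example, the input is assumed to be the raw (undecoded) header value,
and, like the rules, it only looks at Q-encoded words, not B-encoded ones.

```python
import re

# Same pattern as in the rules above, with the same x/s/m/i flags.
SPLIT_UTF8_QP = re.compile(
    r"""
    (=\?UTF-8) (?: \* [^?=<>, \t]* )?   # charset, optional language suffix
    (\?Q\?) [^ ?]*                      # Q encoding marker and encoded text
    =[89A-F][0-9A-F] \?=                # word ends with a byte >= 0x80
    \s* \1 (?: \* [^ ?=]* )? \2         # next encoded-word, same charset/encoding
    =[89AB][0-9A-F]                     # starts with a continuation byte (0x80-0xBF)
    """,
    re.VERBOSE | re.DOTALL | re.MULTILINE | re.IGNORECASE,
)


def has_split_utf8_char(raw_header: str) -> bool:
    """Return True if a UTF-8 character appears to be split across two
    adjacent Q-encoded words in the raw header value."""
    return SPLIT_UTF8_QP.search(raw_header) is not None


if __name__ == "__main__":
    # "é" is C3 A9 in UTF-8; here the two bytes land in different words.
    print(has_split_utf8_char("=?UTF-8?Q?caf=C3?= =?UTF-8?Q?=A9?="))
    # Here the character is kept whole inside the first word.
    print(has_split_utf8_char("=?UTF-8?Q?caf=C3=A9?= =?UTF-8?Q?_au_lait?="))
```

The check is purely syntactic: it fires when one encoded-word ends with a
non-ASCII byte and the next one begins with a UTF-8 continuation byte.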



L_SPLIT_UTF8_FROM hit only 4 times in the last three weeks
(out of 5 million messages processed at my site during that time),
all in spam that already scored high on other rules.

L_SPLIT_UTF8_SUBJ hit 62 times, almost all of them spam.

In our case the score of 1.5 seems to work fine. The hit rate might
be higher in countries using multibyte character sets, depending
on how poorly mail clients there (and bulk mail generating software)
implement RFC 2047.

-- 
You are receiving this mail because:
You are the assignee for the bug.
