Re: [Mailman-Users] spam filtering messages containing certain 8 bit characters

Mark Sapiro Thu, 13 Oct 2011 00:31:15 -0700

On 10/12/2011 6:58 PM, William Yardley wrote:
> Does Mailman base64 decode the subject before applying a regex, and if
> so, can I use UTF-8 character names in the regex to match various
> types of 8-bit characters?



No. The header_filter_rules regexps are matched against the raw headers.
If a header is RFC 2047 encoded, it is not decoded before matching.
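
To make the raw-versus-decoded distinction concrete, here is a minimal standalone sketch in Python 3. It is not Mailman's code. The names raw_subject, decoded_subject and pattern are only illustrative, and the sample header is built on the spot rather than taken from a real message.

```python
import base64
import re
from email.header import decode_header, make_header

# Build a sample RFC 2047 encoded-word for the subject text (illustrative data).
text = "\u7535\u8bdd\u5361"  # the three characters from the question
payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
raw_subject = "=?utf-8?B?" + payload + "?="

# The pattern from the question, written with Unicode escapes.
pattern = re.compile(r"\u7535\u8bdd\u5361")

# Matching against the raw header: only ASCII base64 is present, so no match.
print(pattern.search(raw_subject))

# Matching after decoding the header: the characters are visible again.
decoded_subject = str(make_header(decode_header(raw_subject)))
print(pattern.search(decoded_subject) is not None)
```

The first search finds nothing because the raw header contains only the ASCII encoded-word. The second succeeds, but only because this sketch decodes the header first, which the header filter does not do.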


> Say, for example, that I want to block messages with "电话卡" somewhere
> in the subject line.
> 
> Obviously, the actual raw Subject header will be more like:
> 
>  Subject: =?GB2312?B?[encoded stuff here]?=
>  Subject: =?utf-8?B?[encoded stuff here]?=
> 
> I tried putting in a regex to hold messages matching:
>  Subject: .*\u7535\u8bdd\u5361
> 
> And that didn't seem to work. As far as I can tell, there is no way to
> find a substring that will always match when the Subject header is
> base64 encoded.


I think this is correct. Base64 encodes each group of 3 bytes as a
4-character substring. If the characters you are looking for encode to a
multiple of 3 bytes and begin on a 3-byte boundary, they will always
produce the same base64 string, but if they don't begin and end on a
3-byte boundary, the base64 substring also depends on what comes before
and/or after. Thus, I don't think you can reliably match, even if you
are only dealing with a single character set.
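
The boundary effect can be seen in a small Python 3 sketch. The helper b64 and the prefixes b"abc" and b"a" are made up for illustration. The only assumption is that the subject text is UTF-8, where the three characters happen to occupy 9 bytes, a multiple of 3.

```python
import base64

def b64(data: bytes) -> str:
    """Return the base64 encoding of data as an ASCII string."""
    return base64.b64encode(data).decode("ascii")

target = "\u7535\u8bdd\u5361".encode("utf-8")  # 9 bytes in UTF-8
needle = b64(target)                           # 12 base64 characters

# Prefix of 3 bytes: the target starts on a 3-byte boundary.
aligned = b64(b"abc" + target)
# Prefix of 1 byte: the target straddles the 3-byte groups.
shifted = b64(b"a" + target)

print(needle in aligned)  # True
print(needle in shifted)  # False

# A different charset yields different bytes, hence a different base64 string.
gb_needle = b64("\u7535\u8bdd\u5361".encode("gb2312"))
print(gb_needle == needle)  # False
```

With the one-byte prefix, every 4-character group mixes bits from neighbouring characters, so the fixed substring never appears.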

-- 
Mark Sapiro <[email protected]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

