Re: [Mailman-Users] Chinese characters spam filter?

Greg Lindsay via Mailman-Users Wed, 06 Jul 2016 12:27:33 -0700

Thanks for the reply. I was going off some examples I found, but should have 
known better than to use a \* in the regexp. This is likely what is causing the 
filters to fail. The encoding example was something I tried based on another 
thread I found, but I've since deleted that rule.

I assume the "Spam Filter Regexp" text box attempts to match against all of 
the text in the headers. Every message has a "Subject:" header, and that is 
the header I want to filter on, which is why "^Subject:" is specified. If I 
drop the backslash so that the asterisk is no longer literal, i.e. 
"^Subject:*", that should take care of the space, right? Sometimes the mails 
come in with mixed Chinese and English characters, so if an English character 
comes first in the subject and my filter requires a space followed by a 
Chinese character, the filter would fail to catch it. I think what is needed 
is this:

^Subject:*[list of all Chinese characters here]

I don't understand the use of an equals sign in the regexp. Isn't this implied?
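
A small sketch may help with the "^Subject:*" idea above. It only exercises Python's re module on made-up header lines; it is written for Python 3 for convenience and does not reproduce how Mailman itself applies header_filter_rules. The sample lines, the variable names, and the \u4e00-\u9fff range (the basic CJK Unified Ideographs block, used here as a stand-in for "list of all Chinese characters") are assumptions for illustration.

```python
import re

# Hypothetical sample header lines; the second is one of the spam subjects.
mixed = "Subject: Hello 如何提升业绩"
chinese = "Subject: 为什么总让客户不满意，如何才能提升业绩"
english = "Subject: Meeting notes"

# Illustrative character class standing in for the list of Chinese characters.
cjk = "[\u4e00-\u9fff]"

# ":*" means "zero or more colons", so the character class still has to
# follow the colon(s) immediately. The space after "Subject:" breaks the match.
colon_star = re.compile("^Subject:*" + cjk)

# ".*" means "zero or more of any character", which allows the space and any
# leading English text before the first Chinese character.
dot_star = re.compile("^Subject:.*" + cjk)

for line in (mixed, chinese, english):
    print(bool(colon_star.search(line)), bool(dot_star.search(line)), line)
```

In this sketch the ":*" form matches none of the three lines, while the ".*" form matches the two lines that contain Chinese characters and leaves the English-only subject alone.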

Thanks,
-Greg

-----Original Message-----
From: Mailman-Users 
[mailto:mailman-users-bounces+greg.lindsay=microsoft....@python.org] On Behalf 
Of Mark Sapiro
Sent: Wednesday, July 6, 2016 8:56 AM
To: mailman-users@python.org
Subject: Re: [Mailman-Users] Chinese characters spam filter?

On 7/5/16 8:19 PM, Greg Lindsay via Mailman-Users wrote:
> Hi,
> 
> I am running Mailman v 2.1.20 & have been trying to filter out Chinese spam 
> messages with no luck. A few typical subjects are below:
> 
> Subject: 为什么总让客户不满意，如何才能提升业绩
> Subject: 没1有1业1绩，怎1么1办？
> Subject: 带问题来，带方案走；如何缩短生产周期
> 
> Under Privacy options/Spam filters I have created header filters such as the 
> three below. These aren't working. I was under the impression that [abcd] 
> will discard all mail with a, b, c, or d in the subject line. I've tried 
> including a hundred characters and a single character, but neither works.
> 
> ^Subject:\*[如何解决企业关务管理风险跨境电商国家政策综解读及创新模式非财务人员如何进行财务管理掌握最规范的薪酬设计方法企业相关法律风险控制及用工管理如何加强企业反舞弊及内审如何把准经销商的赢利模式？如何评估供应商及优化采购运作流程如何掌握车间管理的精髓]
> 
> ^Subject:\?utf-8\?B\?[56]
> 
> ^Subject:\*[发杜营全及正了先]
> 
> What am I doing wrong here? Is there something about the character encoding 
> that prevents this filter from working?

There are a couple of things here. Your 3 regexps above have no space after 
Subject:. That notwithstanding, none of them will match what you're trying to 
match. The second appears to be an attempt to match an RFC2047 encoded word, 
but the encoded word would begin '=?utf-8?B?...' and your regexp is missing 
the '='. I'm not sure what the first and third are doing with the literal 
asterisk.
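
To illustrate the syntax points above, here is a minimal sketch using Python's re module on a hand-built raw header. It is written for Python 3 for convenience, and the raw_subject value and variable names are made up for the example. It only shows why the patterns as written cannot match an RFC2047 encoded word; it does not show what Mailman actually matches against, which is the subject of the next paragraph.

```python
import base64
import re

# Hypothetical raw (still RFC 2047 encoded) Subject line, built from two of
# the characters that appear in the spam subjects.
encoded = base64.b64encode("如何".encode("utf-8")).decode("ascii")
raw_subject = "Subject: =?utf-8?B?" + encoded + "?="

# The second rule from the original post: no space after the colon and no
# '=' at the start of the encoded word.
original_rule = re.compile(r"^Subject:\?utf-8\?B\?[56]")

# The same rule with the space and the leading '=' added.
with_equals = re.compile(r"^Subject: =\?utf-8\?B\?[56]")

# "\*" is an escaped asterisk, so it matches only a literal '*' character.
literal_star = re.compile(r"^Subject:\*")

print(bool(original_rule.search(raw_subject)))
print(bool(with_equals.search(raw_subject)))
print(bool(literal_star.search(raw_subject)))
print(bool(literal_star.search("Subject:*sale*")))
```

Only the second and fourth searches succeed: the '=' is part of the encoded-word syntax itself, not something a regexp implies, and the escaped asterisk only matches a subject that literally begins with '*'.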

However, this is not the real problem. The real issue is that the headers 
matched by the header_filter_rules regexps have been RFC2047 decoded and then 
encoded in Mailman's character set for the list's preferred language.

If the list's preferred language is not one whose character set is utf-8 or 
some Chinese character set, this probably results in

Subject: ??????...
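
The effect can be imitated outside Mailman with the standard email package. This is only a sketch of the decode-then-recode behaviour described above, written for Python 3; decode_then_recode is a hypothetical helper, not a Mailman function, and the choice of us-ascii as the list's character set is an assumption for the example.

```python
import base64
import re
from email.header import decode_header

# Hypothetical RFC 2047 encoded subject value built from two characters.
encoded = base64.b64encode("如何".encode("utf-8")).decode("ascii")
raw_value = "=?utf-8?B?" + encoded + "?="


def decode_then_recode(value, charset):
    """RFC 2047 decode a header value, then force it into `charset`.

    Characters that `charset` cannot represent become '?'.
    """
    parts = []
    for chunk, enc in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(enc or "ascii", errors="replace")
        parts.append(chunk)
    text = "".join(parts)
    return text.encode(charset, errors="replace").decode(charset)


rule = re.compile("^Subject: .*[如何]")

for charset in ("us-ascii", "utf-8"):
    header = "Subject: " + decode_then_recode(raw_value, charset)
    print(charset, repr(header), bool(rule.search(header)))
```

With us-ascii the subject collapses to question marks and the character class has nothing left to match; with utf-8 the characters survive and the same rule matches.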

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-users/greg.lindsay%40microsoft.com
------------------------------------------------------