Package: listarchives
Severity: wishlist

(Note: CC'ing listmasters, since it might make sense to apply this as a
global rule for the mailing lists from now on as well.)

For a while now I have been reporting e-mail in the mailing list archives
that is spam written in a language foreign to the list (that is, Russian,
Korean or Chinese messages sent to the Spanish i18n/l10n list). I think it
would be good if some mailing lists automatically filtered out this spam,
ideally when the mail is received but at least in the list archives, since
a rule applied on reception will not clean up old messages.

Reviewing the lists at http://lists.debian.org/i18n.html, I see that most
of them should only accept charsets that belong to their national
encodings. For European languages, those encodings do *not* include any of
these (charset names are case-insensitive, so BIG5, GB2312 and so on are
covered too):
big5|iso-2022-jp|iso-2022-kr|euc-kr|gb2312|ks_c_5601-1987|koi8-r|windows-1251

The European-language mailing lists are:
- debian-l10n-catalan   - debian-l10n-czech
- debian-l10n-danish    - debian-l10n-dutch
- debian-l10n-english   - debian-l10n-esperanto
- debian-l10n-finnish   - debian-l10n-french
- debian-l10n-german    - debian-l10n-greek
- debian-l10n-hungarian - debian-l10n-italian 
- debian-l10n-polish    - debian-l10n-portuguese
- debian-l10n-romanian  - debian-l10n-spanish
- debian-laespiral      - debian-user-catalan
- debian-user-danish    - debian-user-de
- debian-user-french    - debian-user-polish
- debian-user-portuguese - debian-user-spanish
- debian-user-swedish   - debian-user-german

This rule, reversed, could also be applied to other lists (Japanese,
Chinese) in order to remove e-mails that are *not* encoded in the list
language's encodings. That would need to be done on a case-by-case basis,
though, since those lists might contain legitimate mail in other encodings.
I have not investigated this, but it might also be useful to remove
Korean-encoded mail from the Russian mailing lists and vice versa.
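As a rough sketch of the reversed rule, a Japanese-language list could junk mail whose top-level Content-Type: declares a charset outside an allowed set. This is only an illustration under assumptions: the ALLOWED list is a guess that would need reviewing list by list, and $JUNKFOLDER is assumed to be set elsewhere, as in the attached rule.

```procmail
# Hypothetical "reversed" rule for a Japanese-language list: junk mail whose
# top-level Content-Type: declares a charset that is not in ALLOWED.
# ALLOWED is only an example and would need reviewing list by list.
ALLOWED='(iso-2022-jp|euc-jp|shift_jis|utf-8|us-ascii)'

:0
# Only consider messages that declare a charset on the Content-Type: line
* ^Content-Type:.*charset=
# ...and junk them if that charset is not one of the allowed ones
* $ ! ^Content-Type:.*charset="?$ALLOWED
$JUNKFOLDER
```

Multipart messages usually carry no charset on their top-level Content-Type: line, so this sketch lets them through; catching those would need a body check like the second recipe of the attached rule.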

Attached is the procmail rule that I use to filter out messages sent in
encodings I can't read (and which are thus junk to me) from the mailing lists
I'm subscribed to. Please apply it to the lists above (and consider defining
new procmail rules for the non-European mailing lists).
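On the subscriber side, one possible way to limit the check to the lists above is to wrap it in a procmail nesting block that first matches the list's List-Id: header. This sketch assumes the lists send a List-Id: of the form <debian-l10n-spanish.lists.debian.org>; EUROLISTS is a hypothetical, shortened selection of the lists above, and $JUNKFOLDER must be set elsewhere.

```procmail
# Hypothetical subset of the European-language lists (extend as needed)
EUROLISTS='debian-(l10n-(catalan|french|spanish)|user-(french|spanish))'
UNREADABLE='[^?"]*(big5|iso-2022-jp|iso-2022-kr|euc-kr|gb2312|ks_c_5601-1987|koi8-r|windows-1251)'

# Assumption: list mail carries "List-Id: <listname.lists.debian.org>"
:0
* $ ^List-Id:.*<$EUROLISTS\.lists\.debian\.org>
{
  # Only mail from the selected lists reaches this inner recipe
  :0
  * 1^0 $ ^Subject:.*=\?($UNREADABLE)
  * 1^0 $ ^Content-Type:.*charset="?$UNREADABLE
  $JUNKFOLDER
}
```

Keeping the charset test inside the block means mail from other lists, where these charsets may be perfectly legitimate, is never touched.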

Thanks

Javier
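# Unreadable charsets (procmail matches case-insensitively by default,
# so e.g. "big5" also catches "BIG5")
UNREADABLE='[^?"]*(big5|iso-2022-jp|iso-2022-kr|euc-kr|gb2312|ks_c_5601-1987|koi8-r|windows-1251)'

# Junk the message if the Subject: contains an RFC 2047 encoded word in one
# of these charsets or the top-level Content-Type: declares one of them
# (each condition adds 1 when it matches; a positive total means a match)
:0
* 1^0 $ ^Subject:.*=\?($UNREADABLE)
* 1^0 $ ^Content-Type:.*charset="?$UNREADABLE
$JUNKFOLDER

# For multipart messages, look for the charset in the body part headers
# ("^?" is meant to let the match continue onto a folded line)
:0
* ^Content-Type:.*multipart
* B ?? $ ^Content-Type:.*^?.*charset="?$UNREADABLE
$JUNKFOLDER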
