Package: listarchives
Severity: wishlist

(Note: CC'ing listmasters as this might make sense to be applied as a global rule for mailing lists from now on too)

For a while I have been reporting spam in the mailing list archives that is written in a language foreign to the list it was sent to (for example, Russian, Korean or Chinese messages sent to the Spanish i18n/l10n list). I think it would be good if this spam were filtered automatically from some mailing lists: if possible when the mail is received but, at the least, in the list archives, since a rule applied at delivery time will not affect old messages.

Reviewing the lists at http://lists.debian.org/i18n.html I see that most of them should only accept charsets that belong to their national encodings. In the case of European languages, those encodings do *not* include any of these:

big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|iso-2022-jp|KS_C_5601-1987|BIG5|koi8-r|GB2312|windows-1251

The European language mailing lists are:

- debian-l10n-catalan
- debian-l10n-czech
- debian-l10n-danish
- debian-l10n-dutch
- debian-l10n-english
- debian-l10n-esperanto
- debian-l10n-finnish
- debian-l10n-french
- debian-l10n-german
- debian-l10n-greek
- debian-l10n-hungarian
- debian-l10n-italian
- debian-l10n-polish
- debian-l10n-portuguese
- debian-l10n-romanian
- debian-l10n-spanish
- debian-laespiral
- debian-user-catalan
- debian-user-danish
- debian-user-de
- debian-user-french
- debian-user-polish
- debian-user-portuguese
- debian-user-spanish
- debian-user-swedish
- debian-user-german

This rule, reversed, could also be applied to other lists (Japanese, Chinese) in order to remove e-mails that are *not* encoded in their language's encoding. That would need to be done on a case-by-case basis, though, since those lists might contain legitimate mails in different encodings. I have not investigated this, but it might also be useful to remove Korean-encoded mail from the Russian mailing lists and vice versa.

Attached is the procmail rule that I use to filter out messages sent in encodings I can't read (and which are thus junk to me) from the mailing lists I'm subscribed to.

Please apply this to the lists above (and consider defining new procmail rules for the non-European mailing lists).

Thanks

Javier

# Unreadable charsets
UNREADABLE='[^?"]*(big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|iso-2022-jp|KS_C_5601-1987|BIG5|koi8-r|GB2312|indows-1251)'

:0
* 1^0 $ ^Subject:.*=\?($UNREADABLE)
* 1^0 $ ^Content-Type:.*charset="?$UNREADABLE
$JUNKFOLDER

:0
* ^Content-Type:.*multipart
* B ?? $ ^Content-Type:.*^?.*charset="?$UNREADABLE
$JUNKFOLDER
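
For an archive-side filter, where procmail may not be in the delivery path, the same check could be approximated offline. The following is only a minimal sketch in Python, not part of the attached rule: the function name is_unreadable and the UNREADABLE set are illustrative assumptions, and it compares charset names exactly (case-insensitively) instead of using the looser substring match of the procmail pattern.

```python
import re
from email import message_from_string

# Charsets from the UNREADABLE alternation in the procmail rule,
# deduplicated and lowercased (hypothetical constant for this sketch).
UNREADABLE = {
    "big5", "iso-2022-jp", "iso-2022-kr", "euc-kr", "gb2312",
    "ks_c_5601-1987", "koi8-r", "windows-1251",
}

# RFC 2047 encoded words start with "=?charset?".
ENCODED_WORD = re.compile(r"=\?([^?]+)\?")


def is_unreadable(raw_message: str) -> bool:
    """Return True if the Subject or any MIME part declares a listed charset."""
    msg = message_from_string(raw_message)

    # First recipe, first condition: charset of encoded words in the Subject.
    subject = str(msg.get("Subject", ""))
    for match in ENCODED_WORD.finditer(subject):
        if match.group(1).lower() in UNREADABLE:
            return True

    # First recipe, second condition, and second recipe: the charset
    # parameter of the top-level Content-Type and of every nested part.
    for part in msg.walk():
        charset = part.get_param("charset")
        if isinstance(charset, str) and charset.lower() in UNREADABLE:
            return True

    return False
```

A message for which is_unreadable() returns True would correspond to one that the procmail recipes deliver to $JUNKFOLDER; how the archive software would then drop or hide it is left open here.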