Re: charset=utf-16 tricks out SA

2015-10-12 Thread RW
On Sat, 10 Oct 2015 10:56:14 +0200 Mark Martinec wrote:

> > BTW with normalize_charset 0 it looks like a spammer can effectively
> > turn-off body tokenization by using UTF-16 (with correct
> > endianness).
>
> Yes. There are also other tricks that a spammer can't play.
> It's not possible to
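Why an undecoded UTF-16 body can hide its words from tokenization is easiest to see at the byte level. The Python sketch below assumes, for illustration only, that with normalize_charset 0 the part is examined as raw octets; it is a hypothetical naive tokenizer, not SpamAssassin's. Each ASCII letter is followed by a NUL byte, so no contiguous word survives.

```python
# Illustration only: models an undecoded UTF-16 body as raw octets.
# The regex "tokenizer" is hypothetical, NOT SpamAssassin's tokenizer.
import re

text = "dear potencial partner"
raw = text.encode("utf-16-le")  # little-endian, no BOM, as in the sample

# Every ASCII letter is now followed by 0x00, so words are not contiguous.
print(raw[:8])                  # b'd\x00e\x00a\x00r\x00'
print(b"dear" in raw)           # False

# Looking for runs of two or more letters finds nothing at all.
tokens = re.findall(rb"[A-Za-z]{2,}", raw)
print(tokens)                   # []
```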

Re: charset=utf-16 tricks out SA

2015-10-11 Thread @lbutlr
On Oct 10, 2015, at 3:59 AM, Linda A. Walsh wrote:

[bollocks and tripe snipped]

> But the big-iron struck back by pushing through an unrealistic default
> for non-BOM UTF16 files... and yeah, it's in the standard, but
> in the real world, it's not the default.

Only if you

Re: charset=utf-16 tricks out SA

2015-10-11 Thread Reindl Harald
On 11.10.2015 at 22:46, @lbutlr wrote: On Oct 10, 2015, at 3:59 AM, Linda A. Walsh wrote: [bollocks and tripe snipped] But the big-iron struck back by pushing through an unrealistic default for non-BOM UTF16 files... and yeah, it's in the standard, but in the real

Re: charset=utf-16 tricks out SA

2015-10-10 Thread Reindl Harald
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7252 with the sample and a link to this list thread - major, because the sample is just an English mail tricking out SA, and if spammers find that information I expect a flood sooner or later - not disclose the problem and so get it fixed won't

Re: charset=utf-16 tricks out SA

2015-10-10 Thread Mark Martinec
2015-10-10 03:03, RW wrote: I'm not seeing any body tokens, even after training. I was expecting that the text would be tokenized as individual UTF-8 sequences. ASCII characters encoded as UTF-16 and decoded with the wrong endianness are still valid UTF-16. Normalizing them into UTF-8 should
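The claim that wrong-endian ASCII is still valid UTF-16 can be checked with a few lines of Python. This is a sketch using Python's standard codecs, not SpamAssassin's Perl decoding path; the sample string is taken from the thread.

```python
# Sketch: ASCII text stored as UTF-16LE but decoded as UTF-16BE.
text = "dear potencial partner"
le_bytes = text.encode("utf-16-le")

# Each ASCII char 0xNN becomes the 16-bit unit 0xNN00. For printable ASCII
# that lands in U+2000..U+7E00, outside the surrogate range U+D800..U+DFFF,
# so the wrong-endian decode raises no error.
misread = le_bytes.decode("utf-16-be")

print([hex(ord(c)) for c in misread[:4]])  # ['0x6400', '0x6500', '0x6100', '0x7200']

# The result is valid Unicode and converts to UTF-8 without complaint,
# but none of the original ASCII words survive.
utf8 = misread.encode("utf-8")
print("dear" in misread)                   # False
```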

Re: charset=utf-16 tricks out SA

2015-10-10 Thread Linda A. Walsh
Mark Martinec wrote: Reindl Harald wrote: no custom body rules hit like they do for ISO/UTF8 :-( What is your normalize_charsets setting? The problem with this message is that it declares encoding as UTF-16, i.e. not explicitly stating endianness like UTF-16BE or UTF-16LE, and there is no
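The decoding rule under discussion (honour a BOM if present, otherwise assume big-endian, as RFC 2781 recommends for a bare "UTF-16" label) can be sketched as below. This is a hypothetical helper, not SpamAssassin's code; it uses Python's explicit -be/-le codecs rather than relying on the generic "utf-16" codec, so the no-BOM default is spelled out.

```python
from typing import Tuple


def decode_utf16_rfc2781(data: bytes) -> Tuple[str, str]:
    """Decode a part labelled only 'utf-16' (hypothetical helper):
    honour a BOM if present, otherwise assume big-endian."""
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be"), "BE (BOM)"
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le"), "LE (BOM)"
    # No BOM: RFC 2781 says the text SHOULD be read as big-endian.
    return data.decode("utf-16-be"), "BE (default)"


body = "dear potencial partner".encode("utf-16-le")  # LE, no BOM

print(decode_utf16_rfc2781(b"\xff\xfe" + body))  # ('dear potencial partner', 'LE (BOM)')
print(decode_utf16_rfc2781(body)[1])             # BE (default), and the text is garbled
```

With the BOM prepended the text comes back intact; without it, the big-endian default is the wrong guess for this sample.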

Re: charset=utf-16 tricks out SA

2015-10-09 Thread Reindl Harald
On 09.10.2015 at 08:10, John Wilcock wrote: On 08/10/2015 17:34, Reindl Harald wrote: Content-Type: text/plain; charset=utf-16 Content-Transfer-Encoding: base64 no custom body rules hit like they do for ISO/UTF8 :-( What is your normalize_charsets setting? enabled, that's what I

Re: charset=utf-16 tricks out SA

2015-10-09 Thread RW
On Fri, 09 Oct 2015 14:22:18 +0200 Mark Martinec wrote:

> The problem with this message is that it declares encoding
> as UTF-16, i.e. not explicitly stating endianness like
> UTF-16BE or UTF-16LE, and there is no BOM mark at the
> beginning of each textual part, so endianness cannot be

Re: charset=utf-16 tricks out SA

2015-10-09 Thread John Wilcock
On 08/10/2015 17:34, Reindl Harald wrote:
> Content-Type: text/plain; charset=utf-16
> Content-Transfer-Encoding: base64
>
> no custom body rules hit like they do for ISO/UTF8 :-(

What is your normalize_charsets setting?

-- 
John

Re: charset=utf-16 tricks out SA

2015-10-09 Thread Reindl Harald
On 09.10.2015 at 14:22, Mark Martinec wrote: Reindl Harald wrote: no custom body rules hit like they do for ISO/UTF8 :-( What is your normalize_charsets setting? enabled, that's what I meant with "like they do for ISO/UTF8" and adding "dear potencial partner" to CUST_BODY_17 did not

Re: charset=utf-16 tricks out SA

2015-10-09 Thread Mark Martinec
Reindl Harald wrote: no custom body rules hit like they do for ISO/UTF8 :-( What is your normalize_charsets setting? enabled, that's what I meant with "like they do for ISO/UTF8" and adding "dear potencial partner" to CUST_BODY_17 did not change the score; see attached sample and rule below

Re: charset=utf-16 tricks out SA

2015-10-09 Thread RW
On Fri, 9 Oct 2015 14:47:53 +0200 Reindl Harald wrote:

> > In the provided message the actual endianness is LE, and
> > BOM is missing, so decoding as UTF-16BE fails and the
> > rule does not hit. Garbage-in, garbage-out.
> >
> > If you manually edit the sample and replace UTF-16
> > with
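The "garbage-in, garbage-out" effect on a body rule can be sketched in Python. The regex below is a hypothetical stand-in for a custom rule such as CUST_BODY_17 (the real rule text is not in this excerpt), and the sketch simply compares a big-endian decode with a little-endian one.

```python
import re

# Hypothetical stand-in for a custom body rule such as CUST_BODY_17;
# the actual rule from the thread is not reproduced here.
RULE = re.compile(r"dear potencial partner", re.IGNORECASE)


def rule_hits(raw: bytes, codec: str) -> bool:
    """Decode the part with the given codec, then test the body rule."""
    return bool(RULE.search(raw.decode(codec)))


raw = "Dear potencial partner, ...".encode("utf-16-le")  # LE, no BOM

print(rule_hits(raw, "utf-16-be"))  # False: big-endian guess, garbage text
print(rule_hits(raw, "utf-16-le"))  # True: correct byte order
```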

charset=utf-16 tricks out SA

2015-10-08 Thread Reindl Harald
Content-Type: text/plain; charset=utf-16
Content-Transfer-Encoding: base64

no custom body rules hit like they do for ISO/UTF8 :-(
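The first step any filter takes with such a part can be sketched with Python's standard email package: strip the base64 transfer encoding, then check what the declared charset and the leading bytes say about byte order. The message below is a hypothetical reconstruction built for illustration, not the actual sample from the bug report.

```python
import base64
from email import message_from_bytes

# Hypothetical test part: text stored as UTF-16LE without a BOM, labelled
# only "charset=utf-16" and wrapped in base64, like the reported sample.
body = "dear potencial partner".encode("utf-16-le")
raw = (
    b"Content-Type: text/plain; charset=utf-16\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(body) + b"\r\n"
)

msg = message_from_bytes(raw)
payload = msg.get_payload(decode=True)   # base64 removed -> raw UTF-16 bytes
charset = msg.get_content_charset()      # "utf-16": no byte order given
has_bom = payload[:2] in (b"\xfe\xff", b"\xff\xfe")

print(charset, has_bom)                  # utf-16 False
```

Neither the label nor the data carries byte-order information, so the decoder is left to guess.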

Re: charset=utf-16 tricks out SA

2015-10-08 Thread Kevin A. McGrail
Please open a bug especially if this is seen in the wild!

On 10/8/2015 11:34 AM, Reindl Harald wrote:
> Content-Type: text/plain; charset=utf-16
> Content-Transfer-Encoding: base64
>
> no custom body rules hit like they do for ISO/UTF8 :-(