https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8272
          Bug ID: 8272
         Summary: A HREF with UTF-8 host name invisible to SA
         Product: Spamassassin
         Version: 4.0.2
        Hardware: PC
              OS: Mac OS X
          Status: NEW
        Severity: normal
        Priority: P2
       Component: spamassassin
        Assignee: dev@spamassassin.apache.org
        Reporter: joew...@surbl.org
Target Milestone: Undefined

Created attachment 5961
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5961&action=edit
Sample email not tagged by SA as having a URIBL match

We are seeing many emails that are not tagged for DNSBL hits even though the domains are listed.

The text/html section contains an "<A HREF=" element whose URI contains a Basic Authentication string (the userinfo part of the URI) ending in '@', followed by a host name in which the base domain is written in non-ASCII UTF-8 letters. These are Unicode variants of the letters A-Z, outside the ASCII ranges 0x41-0x5a and 0x61-0x7a.

The Basic Authentication string up to the '@' may also contain UTF-8 characters such as E2 88 95 (U+2215 DIVISION SLASH, a type of slash), so that it looks like the beginning of the host name.

Example:

00000000  68 72 65 66 3d 22 68 74 74 70 73 3a 2f 2f 73 62  |href="https://sb|
00000010  78 70 70 71 6d 67 6b 64 72 73 69 6d 67 6b 74 6d  |xppqmgkdrsimgktm|
00000020  66 7a 6d 2e 63 6f 6d e2 88 95 73 62 78 70 70 71  |fzm.com...sbxppq|
00000030  6d 67 6b 64 72 73 69 6d 67 6b 74 6d 66 7a 6d e2  |mgkdrsimgktmfzm.|
00000040  88 95 73 62 78 70 70 71 6d 67 6b 64 72 73 69 6d  |..sbxppqmgkdrsim|
00000050  67 6b 74 6d 66 7a 6d e2 88 95 73 62 78 70 70 71  |gktmfzm...sbxppq|
00000060  6d 67 6b 64 72 73 69 6d 67 6b 74 6d 66 7a 6d 40  |mgkdrsimgktmfzm@|
00000070  73 62 78 70 70 71 6d 67 6b 64 72 73 69 6d 67 6b  |sbxppqmgkdrsimgk|
00000080  74 6d 66 7a 6d 2e f0 9d 95 9c f0 9d 95 96 f0 9d  |tmfzm...........|
00000090  95 9b f0 9d 95 9a f0 9d 95 92 f0 9d 95 95 f0 9d  |................|
000000a0  95 9e f0 9d 95 9a f0 9d 95 9f 2e 63 6e 2f 63 61  |...........cn/ca|
000000b0  6f 6e 69 6d 61 3d 73 62 78 70 70 71 6d 67 6b 64  |onima=sbxppqmgkd|
000000c0  72 73 69 6d 67 6b 74 6d 66 7a 6d 2e 63 6f 2e 6a  |rsimgktmfzm.co.j|
000000d0  70 2f 22 0a                                      |p/".|

The listed domain name in this example is kejiadmin[.]cn, written as "𝕜𝕖𝕛𝕚𝕒𝕕𝕞𝕚𝕟[.]cn", and this is the actual payload opened by a browser.

-- 
You are receiving this mail because:
You are the assignee for the bug.
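
To illustrate the URI structure described in this report, here is a minimal Python sketch that rebuilds the sample URI from the bytes in the hex dump, discards everything up to the last '@' in the authority, and folds the remaining host name to ASCII. The function name `browser_host` and the constant `SAMPLE_URI` are hypothetical. NFKC normalization is used only as an approximation of the mapping a browser applies to host names; that is an assumption of this sketch, and it says nothing about how SpamAssassin itself parses URIs.

```python
import unicodedata

# The repeated ASCII label from the hex dump.
LABEL = "sbxppqmgkdrsimgktmfzm"

# Bytes f0 9d 95 xx from the hex dump: double-struck letters spelling "kejiadmin".
FANCY = (
    "\U0001D55C\U0001D556\U0001D55B\U0001D55A\U0001D552"
    "\U0001D555\U0001D55E\U0001D55A\U0001D55F"
)

# Bytes e2 88 95 from the hex dump: U+2215 DIVISION SLASH.
SLASH = "\u2215"

# Hypothetical constant: the href value from the report, rebuilt piece by piece.
SAMPLE_URI = (
    "https://" + LABEL + ".com" + SLASH + LABEL + SLASH + LABEL + SLASH + LABEL
    + "@" + LABEL + "." + FANCY + ".cn/caonima=" + LABEL + ".co.jp/"
)


def browser_host(uri: str) -> str:
    """Return an approximation of the host a browser would contact.

    Simplified on purpose: no IPv6 literals, no percent-decoding.
    """
    # Drop the scheme, then keep only the authority (up to the first real '/').
    _, _, rest = uri.partition("://")
    authority = rest.split("/", 1)[0]
    # The authority also ends at '?' or '#' if one appears before any '/'.
    for sep in ("?", "#"):
        authority = authority.split(sep, 1)[0]
    # Everything before the last '@' is userinfo, not the host.
    _, _, hostport = authority.rpartition("@")
    # Strip an optional port.
    host = hostport.split(":", 1)[0]
    # Assumption: NFKC folding stands in for the browser's host name mapping.
    return unicodedata.normalize("NFKC", host).lower()


print(browser_host(SAMPLE_URI))  # sbxppqmgkdrsimgktmfzm.kejiadmin.cn
```

The division slash has no effect on where the authority ends, so the look-alike ".com" prefix stays inside the userinfo and is thrown away, while the double-struck letters fold to the listed domain.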