https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6959
Bug ID: 6959
Summary: Malformed UTF-8 character - in transliteration at DnsResolver.pm line 627
Product: Spamassassin
Version: 3.4 SVN branch
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P2
Component: Libraries
Assignee: [email protected]
Reporter: [email protected]
This was reported by Axb on the mailing list:
| Malformed UTF-8 character (unexpected non-continuation byte 0x6e,
| immediately after start byte 0xf6) in transliteration (tr///)
| at /data/masscheckwork/weekly_mass_check/masses/../lib/Mail/SpamAssassin/DnsResolver.pm line 627.
I was able to reproduce this from a command-line spamassassin in
debug mode, after disabling the config option 'dns_options dns0x20'.
The setting of normalize_charset had no effect on the outcome.
Here is a pair of indicative debug entries:
[52977] warn: Malformed UTF-8 character (unexpected non-continuation byte 0xd6, immediately after start byte 0xf8) in transliteration (tr///) at /usr/local/lib/perl5/site_perl/5.18.0/Mail/SpamAssassin/DnsResolver.pm line 627, <GEN0> line 9.
[52977] dbg: dns: dns reply to 3798/IN/A/www.moe.gov.cn<A3>
<AC><D1><A7><D0><C5><CD><F8><CF><B5><BD><CC><D3><FD><B2><BF><CE><A8><D2><BB><D6>
<B8><B6><A8><D1><A7><C0><FA><C8><CF><D6><A4><B2><E9><D1><AF><CD><F8><A3><AC><CD>
<F8><D6><B7>www.chsi.com.cn: NXDOMAIN
The message is in Chinese, with host names immediately preceded and
followed by text (no whitespace, as is normal with such text).
Looks like the URL parser should be less eager to include Chinese text
in URLs.
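
To illustrate what "less eager" could mean, here is a minimal sketch in Python (not the actual SpamAssassin URL parser, which is written in Perl). It assumes host names consist only of ASCII letters, digits, hyphens and dots, so any high-bit byte ends the host name. The function name extract_hosts and the shortened sample bytes are made up for this example.

```python
import re

# Assumption: a host name is two or more dot-separated labels made of
# ASCII letters, digits and hyphens. Any other byte ends the match.
HOST_RE = re.compile(rb"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_hosts(raw: bytes) -> list:
    """Return host-name-like ASCII runs found in raw message bytes."""
    return HOST_RE.findall(raw)

# Shortened version of the text in the debug entry: two host names
# with non-ASCII bytes between them and no whitespace.
sample = b"www.moe.gov.cn\xa3\xac\xd1\xa7www.chsi.com.cn"
print(extract_hosts(sample))
```

A rule this strict would also cut off host names written with raw non-ASCII characters, and it could still be fooled by multi-byte encodings whose trailing bytes fall in the ASCII range, so it is only a starting point.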
Luckily these are just warnings, but the actual question here is
what flagged the decoded text with the UTF-8 flag without checking
that it really is valid UTF-8. If the decoding cannot be done properly,
the result should be just plain bytes, not characters.
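
The "plain bytes unless the decode really succeeds" behaviour can be sketched as follows. This is a Python illustration of the idea only, not the code in DnsResolver.pm, and the helper name decode_or_bytes is hypothetical.

```python
def decode_or_bytes(raw: bytes):
    """Return a str only if raw is well-formed UTF-8, else the bytes unchanged."""
    try:
        # Strict decoding raises instead of silently accepting bad sequences.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the data as plain bytes, never mark it as text.
        return raw

# First bytes of the query name from the debug entry: 0xa3 cannot start
# a UTF-8 sequence, so the value stays as bytes.
mixed = b"www.moe.gov.cn\xa3\xac\xd1\xa7"
print(type(decode_or_bytes(mixed)).__name__)

# A pure ASCII host name is valid UTF-8 and decodes to text.
print(type(decode_or_bytes(b"www.chsi.com.cn")).__name__)
```

The point of the design is that the caller can tell from the type alone whether validation succeeded, so later string operations never see text that is only claimed to be UTF-8.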