https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6959

            Bug ID: 6959
           Summary: Malformed UTF-8 character - in transliteration at
                    DnsResolver.pm line  627
           Product: Spamassassin
           Version: 3.4 SVN branch
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: [email protected]
          Reporter: [email protected]

This was reported by Axb on the mailing list:

| Malformed UTF-8 character (unexpected non-continuation byte 0x6e,
| immediately after start byte 0xf6) in transliteration (tr///) at
| /data/masscheckwork/weekly_mass_check/masses/../lib/Mail/SpamAssassin/DnsResolver.pm
| line 627.
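The diagnostic means Perl was walking a string flagged as UTF-8 and found a lead byte (0xf6, which is "ö" in Latin-1) that was not followed by the 10xxxxxx continuation byte it announces; 0x6e is a plain ASCII "n". The Python sketch below loosely mimics that check. It follows Perl's lax reading, in which 0xF5-0xFD still count as lead bytes of 4- to 6-byte sequences, whereas strict UTF-8 (RFC 3629) rejects them outright. The names `sequence_length` and `first_malformation` are hypothetical.

```python
from typing import Optional, Tuple

def sequence_length(lead: int) -> int:
    """Bytes announced by a lead byte, using the lax pre-RFC 3629 scheme
    (up to 6 bytes). Returns 0 for bytes that cannot start a sequence."""
    if lead < 0x80:
        return 1
    if lead < 0xC0:
        return 0          # 10xxxxxx: a continuation byte, not a lead
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead < 0xF8:
        return 4
    if lead < 0xFC:
        return 5
    if lead < 0xFE:
        return 6
    return 0              # 0xFE/0xFF are never lead bytes in this sketch

def is_continuation(b: int) -> bool:
    return 0x80 <= b <= 0xBF

def first_malformation(data: bytes) -> Optional[Tuple[str, int, Optional[int]]]:
    """Return (problem, offending_byte, lead_byte) for the first malformed
    sequence in data, or None if data is well formed under the lax scheme."""
    i = 0
    while i < len(data):
        lead = data[i]
        n = sequence_length(lead)
        if n == 0:
            return ("unexpected continuation or invalid byte", lead, None)
        for j in range(1, n):
            if i + j >= len(data):
                return ("too short", lead, None)
            b = data[i + j]
            if not is_continuation(b):
                return ("unexpected non-continuation byte", b, lead)
        i += n
    return None

print(first_malformation(b"\xf6n"))                # Latin-1 "ön"
print(first_malformation(b"\xf8\xd6"))             # bytes from the second warning
print(first_malformation("ön".encode("utf-8")))    # genuine UTF-8
```

The first two calls report 0x6e after 0xf6 and 0xd6 after 0xf8, matching the two warnings in this report; the correctly encoded "ön" passes.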

I was able to reproduce this with the command-line spamassassin in
debug mode after disabling the config option 'dns_options dns0x20'.
The normalize_charset setting had no effect on the outcome.
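For context, `dns0x20` refers to the "0x20 bit" technique (draft-vixie-dnsext-dns0x20): the resolver randomizes the case of ASCII letters in the query name and expects the reply to echo the name with exactly that case, which makes spoofed replies harder to forge. Whether DnsResolver.pm line 627 lies on this code path is not established here. The Python sketch below is a generic illustration, not SpamAssassin's implementation; `randomize_case`, `reply_matches` and `cache_key` are hypothetical names. It works on raw bytes, so only A-Z and a-z are touched and 8-bit bytes, such as those in the Chinese host name below, pass through unchanged.

```python
import random

def randomize_case(qname: bytes, rng: random.Random) -> bytes:
    """Flip the case of each ASCII letter with probability 1/2. Digits,
    dots, hyphens and any 8-bit bytes are copied unchanged."""
    out = bytearray()
    for b in qname:
        if (0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A) and rng.random() < 0.5:
            b ^= 0x20   # bit 0x20 is the only difference between 'A' and 'a'
        out.append(b)
    return bytes(out)

def reply_matches(sent: bytes, received: bytes) -> bool:
    """With 0x20 encoding the echoed question name must match byte for byte."""
    return sent == received

def cache_key(qname: bytes) -> bytes:
    """Case-insensitive key; bytes.lower() folds ASCII letters only."""
    return qname.lower()

rng = random.Random(6959)
sent = randomize_case(b"www.chsi.com.cn", rng)
print(sent, reply_matches(sent, sent), cache_key(sent))
```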

Here are two indicative debug entries:

[52977] warn: Malformed UTF-8 character (unexpected non-continuation byte 0xd6,
immediately after start byte 0xf8) in transliteration (tr///) at
/usr/local/lib/perl5/site_perl/5.18.0/Mail/SpamAssassin/DnsResolver.pm
line 627, <GEN0> line 9.

[52977] dbg: dns: dns reply to 3798/IN/A/www.moe.gov.cn<A3>
<AC><D1><A7><D0><C5><CD><F8><CF><B5><BD><CC><D3><FD><B2><BF><CE><A8><D2><BB><D6>
<B8><B6><A8><D1><A7><C0><FA><C8><CF><D6><A4><B2><E9><D1><AF><CD><F8><A3><AC><CD>
<F8><D6><B7>www.chsi.com.cn: NXDOMAIN

The message is in Chinese, with the host names immediately preceded and
followed by Chinese text (no whitespace in between, as is normal in Chinese).
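The escaped bytes between the two host names look like GB2312 (or its superset GBK): in every pair of `<XX>` escapes the first byte lies in 0xA1-0xF7 and the second in 0xA1-0xFE, and the run opens with 0xA3 0xAC, the full-width comma in that encoding. This Python sketch, assuming GB2312, checks that the run is valid in that charset but is not valid UTF-8, which is why treating it as UTF-8 characters yields the warnings above. `REPLY_NAME` is the byte string copied from the debug line; `try_decode` is a hypothetical helper.

```python
from typing import Optional

# Bytes from the "dns reply to 3798/IN/A/..." debug line;
# each <XX> escape in the log becomes the byte 0xXX here.
REPLY_NAME = (
    b"www.moe.gov.cn"
    b"\xa3\xac\xd1\xa7\xd0\xc5\xcd\xf8\xcf\xb5\xbd\xcc\xd3\xfd\xb2\xbf"
    b"\xce\xa8\xd2\xbb\xd6\xb8\xb6\xa8\xd1\xa7\xc0\xfa\xc8\xcf\xd6\xa4"
    b"\xb2\xe9\xd1\xaf\xcd\xf8\xa3\xac\xcd\xf8\xd6\xb7"
    b"www.chsi.com.cn"
)

def try_decode(data: bytes, charset: str) -> Optional[str]:
    """Return the decoded text, or None if data is not valid in charset."""
    try:
        return data.decode(charset)   # strict: any invalid byte raises
    except UnicodeDecodeError:
        return None

as_utf8 = try_decode(REPLY_NAME, "utf-8")   # fails: 0xA3 cannot start a UTF-8 sequence
as_gb = try_decode(REPLY_NAME, "gb2312")    # 44 bytes become 22 characters
print(as_utf8 is None, as_gb is not None)
```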

Looks like the URL parser should be less eager to include Chinese text
in URLs.
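One way to make the parser less eager, sketched here under the assumption that only ASCII host names (letters, digits, hyphens, dots) need to be found, is to match host names on the raw bytes with character classes that exclude every byte at or above 0x80. Adjacent Chinese text then ends the match instead of being absorbed into it. `HOST_RE` and `extract_hostnames` are hypothetical names, not SpamAssassin's URI parser, and the sketch deliberately ignores internationalized (IDN) host names, which a real parser would have to handle.

```python
import re
from typing import List

# One DNS label: ASCII letters/digits, hyphens allowed inside but not at the ends.
_LABEL = rb"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
# Two or more labels joined by dots; no byte >= 0x80 can ever match.
HOST_RE = re.compile(_LABEL + rb"(?:\." + _LABEL + rb")+")

def extract_hostnames(data: bytes) -> List[str]:
    """Return ASCII host names found in data, in order of appearance."""
    return [m.decode("ascii") for m in HOST_RE.findall(data)]

# The reply name from the debug log, with Chinese (GB2312) bytes on both sides.
reply_name = (
    b"www.moe.gov.cn"
    b"\xa3\xac\xd1\xa7\xd0\xc5\xcd\xf8\xcf\xb5\xbd\xcc\xd3\xfd\xb2\xbf"
    b"\xce\xa8\xd2\xbb\xd6\xb8\xb6\xa8\xd1\xa7\xc0\xfa\xc8\xcf\xd6\xa4"
    b"\xb2\xe9\xd1\xaf\xcd\xf8\xa3\xac\xcd\xf8\xd6\xb7"
    b"www.chsi.com.cn"
)
print(extract_hostnames(reply_name))
```

Run on the logged reply name, this yields the two separate host names www.moe.gov.cn and www.chsi.com.cn rather than one long name with the Chinese text glued in.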

Luckily these are just warnings, but the real question here is
what flagged the decoded text as UTF-8 without checking that it
actually is valid UTF-8. If the decoding cannot be done properly, the
result should be plain bytes, not characters.
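The behaviour asked for above can be sketched as a decoder that returns characters only when the declared charset actually fits the bytes, and otherwise hands back the original bytes untouched. This is Python rather than Perl; in Perl the analogous pattern would presumably be calling `Encode::decode` with `Encode::FB_CROAK | Encode::LEAVE_SRC` inside an `eval` and keeping the octets if it dies. `decode_or_bytes` is a hypothetical helper name.

```python
from typing import Union

def decode_or_bytes(data: bytes, charset: str) -> Union[str, bytes]:
    """Decode data as charset only if every byte is valid in it; otherwise
    return the bytes unchanged so later code never sees bogus characters."""
    try:
        return data.decode(charset)     # strict mode: a bad byte raises
    except (LookupError, UnicodeDecodeError):
        return data                     # unknown charset or invalid data

print(decode_or_bytes(b"\xf6n", "utf-8"))        # stays bytes
print(decode_or_bytes(b"\xf6n", "iso-8859-1"))   # decodes to text
```

Note that single-byte charsets such as ISO-8859-1 accept any byte sequence, so a check like this only catches labels that are demonstrably wrong, such as UTF-8 applied to Latin-1 or GB2312 data.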
