Re: Non-Roman characters in TLDs and domain names

Warren Togami Tue, 03 Nov 2009 22:17:41 -0800

On 11/04/2009 12:21 AM, Sidney Markowitz wrote:

The following examples are not correct, but they demonstrate the problem:

ASCII without decoding the domain sent as UTF-8
http://æ—¥æœ¬èªž.ãƒ†ã‚¹ãƒˆ/

ASCII without decoding the domain sent as ISO-2022-JP
http://$BF|K\8l(B.$B%F%9%H(B/


My Thunderbird only interprets the first one completely as a URL string,
the second one it ends at the pipe character, making it useless for a
spammer. The first one is clickable, but I don't see that Firefox, at

My point was lost here. I pasted these URLs as an example of what the
spamassassin URI parser might see without decoding. The above two
examples are http://日本語.テスト/ in two common encodings of Japanese
e-mail. Since they are not decoded by spamassassin, they might become
two different punycode strings and two different URIBL lookups. This is
why we may need to always decode before punycode encoding.
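To illustrate the decode-first idea, here is a minimal Python sketch. It is not spamassassin's code: the helper name host_to_punycode is hypothetical, the charset is assumed to be known from the message headers, and Python's built-in "idna" codec (IDNA 2003) stands in for whatever punycode step a real parser would use.

```python
# Illustrative sketch only; host_to_punycode is a hypothetical helper,
# not part of spamassassin.

def host_to_punycode(raw: bytes, charset: str) -> bytes:
    """Decode raw hostname bytes with the declared charset, then IDNA-encode."""
    # Step 1: bytes -> Unicode, using the charset the message declared.
    # Step 2: Unicode -> ASCII-compatible form via Python's built-in idna codec.
    return raw.decode(charset).encode("idna")

HOST = "日本語.テスト"

# The same hostname as it would arrive in two common Japanese e-mail encodings.
utf8_bytes = HOST.encode("utf-8")
jis_bytes = HOST.encode("iso2022_jp")

# The raw bytes differ, so hashing or looking them up undecoded would diverge.
assert utf8_bytes != jis_bytes

from_utf8 = host_to_punycode(utf8_bytes, "utf-8")
from_jis = host_to_punycode(jis_bytes, "iso2022_jp")

# After decoding, both inputs collapse to one punycode name and one lookup.
print(from_utf8)
print(from_jis)
```

Both calls should yield the same ASCII name, so a URIBL would be queried with a single key regardless of how the message was encoded.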


Can you show me the equivalent for the following URL, which is a real
site? That way we can easily answer the question "If the MUA makes it a
hot link, is it a link that works?"

Whether a link is clickable today is not relevant. MUAs and browsers in
the future will adapt to support these international TLDs. Prominent
clients like Thunderbird and Gmail already make them clickable today. I
suspect the other clients don't make them clickable only because they
are unknown TLDs or because they don't recognize non-ASCII domains as
valid URIs. Yet.


Warren Togami
