On 11/04/2009 12:21 AM, Sidney Markowitz wrote:

The following examples are not correct, but they demonstrate the problem:

ASCII without decoding the domain sent as UTF-8
http://日本語.テスト/

ASCII without decoding the domain sent as ISO-2022-JP
http://$BF|K\8l(B.$B%F%9%H(B/

My Thunderbird interprets only the first one completely as a URL string;
for the second one it stops at the pipe character, making it useless for a
spammer. The first one is clickable, but I don't see that Firefox, at

My point was lost here. I pasted these URLs as an example of what the SpamAssassin URI parser might see without decoding. The two examples above are both http://日本語.テスト/, in two encodings commonly used for Japanese e-mail. If SpamAssassin does not decode them, they might become two different Punycode strings and therefore two different URIBL lookups. This is why we may need to always decode before Punycode encoding.
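To make the point concrete, here is a rough sketch in Python (not SpamAssassin's actual code, which is Perl). It assumes the message's declared charset is known, and it uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules. The helper name domain_to_ascii is made up for illustration.

```python
def domain_to_ascii(raw: bytes, charset: str) -> str:
    """Decode raw message bytes with the declared charset, then IDNA-encode.

    Hypothetical helper: decoding first means every charset that spells the
    same domain ends up as the same ASCII (Punycode) form.
    """
    text = raw.decode(charset)
    return text.encode("idna").decode("ascii")


domain = "日本語.テスト"

# The same domain as it would appear on the wire in two common encodings.
utf8_bytes = domain.encode("utf-8")
jis_bytes = domain.encode("iso2022_jp")

# Without decoding, a parser sees two unrelated byte strings.
print(utf8_bytes != jis_bytes)

# With decoding first, both collapse to a single ASCII form,
# so there is only one string to look up.
from_utf8 = domain_to_ascii(utf8_bytes, "utf-8")
from_jis = domain_to_ascii(jis_bytes, "iso2022_jp")
print(from_utf8 == from_jis)
```

Note that the ISO-2022-JP bytes contain a literal "|", which matches the pipe character visible in the raw example above.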


Can you show me the equivalent for the following URL, which is a real
site? That way we can easily answer the question "If the MUA makes it a
hot link, is it a link that works?"

Whether a link is clickable today is not relevant. MUAs and browsers will adapt in the future to support these international TLDs. Prominent clients like Thunderbird and Gmail already make them clickable today. I suspect the other clients don't make them clickable only because the TLDs are unknown to them or because they don't recognize non-ASCII domains as valid URIs. Yet.

Warren Togami