http://bugzilla.spamassassin.org/show_bug.cgi?id=4522

------- Additional Comments From [EMAIL PROTECTED]  2005-09-13 14:23 -------
I asked on the dev mailing list for feedback about how different MUAs parse a
URI containing a JIS escape sequence in the host/domain name. The example I
gave used a host name of the form spammer.comJISSTUFF.com, which would produce
two different results depending on how JISSTUFF is parsed.

If common MUAs make a hot link out of spammer.com, then we have to extract
"spammer.com" for URIRBL checking, even if the RFCs say that the URI is
supposed to be interpreted as spammer.comJISSTUFF.com. On the other hand, given
that people who use character codes such as JIS make use of domain names with
extended characters, and their MUAs have to recognize JIS, we also have to find
the correct host and domain name in that URI.

I think we will eventually have to recognize the character encoding of the
email and use it when parsing the URIs, while at the same time emulating the
behavior of the MUAs that don't handle the extended character sets. This is
more than just a parsing issue: it gets into the handling of IDN characters in
domain names, which are carried in DNS by encoding the extended-character names
into a canonical 7-bit ASCII form. To do this right with URIRBLs we will have
to do IDN canonicalization.

I therefore propose that we use Justin's patch for 3.1 and retarget the bug
for a later release in which we do it right. That proposal is based on the
results of my survey, which show that the most common MUAs do break the URI
when they see the JIS escape sequence. Note that the patch will break the
ability to have JIS-encoded domain names in URIRBLs. But it will allow us to do
the right thing for the most common case, which is what the spam is targeting,
and it will not break anything that we can handle now, given that we aren't
doing IDN canonicalization of the IDN characters that we do see.
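For comparison, the behavior most of the surveyed MUAs show, ending the URI at
the JIS escape, can be sketched as a scanner over the raw message bytes. This
is not Justin's patch (its contents are not reproduced here); the regular
expression and the extract_uris name are hypothetical and only approximate
real URI detection.

```python
import re

# Hypothetical MUA-style rule: a URI ends at whitespace, <, >, a double
# quote, or the ESC byte (0x1B) that starts a JIS escape sequence.
MUA_STYLE_URI = re.compile(rb'https?://[^\s<>"\x1b]+')


def extract_uris(body: bytes) -> list[str]:
    """Return URIs from a raw body, cut off the way most surveyed MUAs do."""
    # latin-1 maps every byte to one character, so decoding never fails.
    return [m.decode("latin-1") for m in MUA_STYLE_URI.findall(body)]


body = b"Visit http://spammer.com\x1b$BF|K\\\x1b(B.com/ today"
print(extract_uris(body))  # ['http://spammer.com']
```

Stopping at ESC yields only spammer.com, which matches what Outlook and the
other MUAs in the second list below would hot link.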

Here are my results so far:

MUAs that hot link the complete URI, with the JIS characters:

Thunderbird
Mutt (contradictory reports)
vim (mail syntax)
Earthlink WebMail in text only mode

MUAs that hot link only up to the JIS escape:

Outlook
Outlook Express
Outlook web client (Exchange)
(Apple) Mail 1.3.11 (v622) (Mac OS X)
Mulberry 3.1.6
Mulberry 4.0.3
Mutt 1.5.9i (2005-03-13) (contradictory reports)
sylpheed-claws 1.0.4
Evolution
