Validation of IDN (Internationalized Domain Names) was added in
[12474], but I noticed that the verify_exists option doesn't work when
you use an IDN. This is caused by urllib2 not supporting IDN and the
validation code using the original unicode version of the URL when
testing for existence.

The problem is within URLValidator and can be fixed relatively easily
by using the IDNA-encoded version of the domain when testing that the
URL exists. I have a patch worked up for this and will raise a ticket
shortly.
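
As a rough sketch of the idea (not the patch itself), the host could be
IDNA-encoded before the existence check. This is written against Python 3's
urllib.parse rather than urllib2/urlparse, and idna_encode_host is a
hypothetical helper name, not anything that exists in Django:

```python
from urllib.parse import urlsplit, urlunsplit


def idna_encode_host(url):
    """Return url with only its host IDNA-encoded (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep any "user:pass@" prefix and ":port" suffix out of the encoding.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, colon, port = hostport.partition(":")
    # Python's built-in "idna" codec converts each label to its ASCII form.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = userinfo + at + ascii_host + colon + port
    return urlunsplit(parts._replace(netloc=netloc))


print(idna_encode_host("http://bücher.de/path"))
```

The existence check would then request the returned ASCII URL, while the
validator could still hand back whatever value is decided on below.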

However, I would like to open a discussion on Django's handling of
non-ASCII URLs...

Should the clean method of forms.URLField return the unicode value as
entered, or an IDNA-encoded URL?

What happens if a non-ASCII character is used in another part of the
URL? For example, opening http://en.wikipedia.org/wiki/Café in a
browser will work, but Django will not validate this as a legal URL
(because it isn't). In reality, the browser is converting the URL to
http://en.wikipedia.org/wiki/Caf%C3%A9 and requesting that.
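
To illustrate the conversion the browser performs on the path, here is a
small sketch using Python 3's urllib.parse.quote (an assumption on my part;
the code under discussion uses Python 2). quote_path is a hypothetical name,
and keeping "%" in the safe set is my own choice so that an already-encoded
path is not escaped a second time:

```python
from urllib.parse import quote


def quote_path(path):
    """Percent-encode non-ASCII characters in a URL path as UTF-8."""
    # "/" stays as the segment separator; "%" is left alone so that
    # existing escapes such as %C3%A9 are not double-encoded.
    return quote(path, safe="/%")


print(quote_path("/wiki/Café"))
```

The trade-off of treating "%" as safe is that a literal percent sign in the
input is not escaped, so this is only reasonable if input may already be
partly encoded.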

Perhaps Django should permit such URLs and perform the same encoding
as a browser. In that case, should the clean method of forms.URLField
return the unicode value as entered, or the urlencoded UTF-8 version
of the URL?

I guess an approach to these URL "complexities" would be to introduce
a utility function within django.utils.http such as:

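import urlparse

# urlquote is already defined in django.utils.http

def safe_url(url):
    # Split the URL so each component can be encoded by its own rules.
    scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
    netloc = netloc.encode('idna')  # IDNA-encode the domain
    path = urlquote(path)  # encode as UTF-8, then percent-escape
    # TODO -- should query and fragment be escaped?
    return urlparse.urlunsplit((scheme, netloc, path, query, fragment))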

This could then be used by URLValidator, but also anyone who needs to
deal with non-ASCII URLs.

It is probably overkill and overcomplicating things, but I had also
thought about suggesting a "URL" object that would be returned by
URLFields (both forms and models). This could handle unicode URLs and
be responsible for encoding/decoding depending on where they are used.
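
Very roughly, such an object might look like the sketch below. Everything
here is hypothetical (the class name, the encoded/from_encoded methods) and
it is written for Python 3; it also assumes a netloc with no userinfo or
port, and leaves query and fragment untouched:

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


class URL:
    """Hypothetical value object holding a URL as the user entered it."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Human-readable form, e.g. for redisplaying in a form field.
        return self.text

    def encoded(self):
        """ASCII form suitable for making an HTTP request."""
        parts = urlsplit(self.text)
        host = parts.netloc.encode("idna").decode("ascii")
        path = quote(parts.path, safe="/%")
        return urlunsplit(parts._replace(netloc=host, path=path))

    @classmethod
    def from_encoded(cls, ascii_url):
        """Build a URL from its ASCII form, decoding host and path."""
        parts = urlsplit(ascii_url)
        host = parts.netloc.encode("ascii").decode("idna")
        path = unquote(parts.path)
        return cls(urlunsplit(parts._replace(netloc=host, path=path)))


url = URL("http://bücher.de/wiki/Café")
print(url.encoded())
print(URL.from_encoded(url.encoded()))
```

The point of the sketch is only that the unicode and ASCII forms live in one
place, so callers pick the form they need rather than re-encoding by hand.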

What do people think?

Fraser

P.S. I noticed that #12988 has just been opened, which also relates to
IDN validation.
