Validation of IDN (Internationalized Domain Names) was added in [12474], but I noticed that the verify_exists option doesn't work when you use an IDN. This happens because urllib2 does not support IDN, and the validation code passes it the original unicode version of the URL when testing for existence.
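To make the mismatch concrete, here is a minimal sketch of the encoding step that has to happen before the existence check. It is written against the Python 3 standard library (urllib.parse) rather than the urllib2/urlparse modules discussed in this thread, and the helper name to_ascii_host is hypothetical, not part of Django.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_host(url):
    """Return url with its hostname IDNA-encoded (hypothetical helper).

    Assumption: any user:password@ part is dropped for simplicity;
    only the host and an optional port are kept in the netloc.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The built-in "idna" codec turns each non-ASCII label into its
    # "xn--" (Punycode) form and leaves ASCII labels unchanged.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host
    if parts.port is not None:
        netloc = "%s:%d" % (ascii_host, parts.port)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(to_ascii_host("http://bücher.de/katalog"))
```

Only the hostname is touched here; non-ASCII characters in the path, query, or fragment are a separate question.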
The problem is within URLValidator and can be fixed relatively easily by using the IDNA-encoded version of the domain when testing that the URL exists. I have a patch worked up for this and will raise a ticket shortly.

However, I would like to open a discussion on Django's handling of non-ASCII URLs...

Should the clean method of forms.URLField return the unicode value as entered, or an IDNA-encoded URL?

What happens if a non-ASCII character is used in another part of the URL? For example, opening http://en.wikipedia.org/wiki/Café in a browser will work, but Django will not validate this as a legal URL (because it isn't). In reality, the browser is converting the URL to http://en.wikipedia.org/wiki/Caf%C3%A9 and requesting that. Perhaps Django should permit such URLs and perform the same encoding as a browser. In which case, should the clean method of forms.URLField return the unicode value as entered, or the urlencoded UTF-8 version of the URL?

I guess an approach to these URL "complexities" would be to introduce a utility function within django.utils.http such as:

    import urlparse  # Python 2 standard library

    # urlquote is already defined in django.utils.http
    def safe_url(url):
        scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
        netloc = netloc.encode('idna')
        path = urlquote(path)
        # TODO -- should query and fragment be escaped?
        return urlparse.urlunsplit((scheme, netloc, path, query, fragment))

This could then be used by URLValidator, but also by anyone who needs to deal with non-ASCII URLs.

It is probably overkill and overcomplicating things, but I had also thought about suggesting a "URL" object that would be returned by URLFields (both forms and models). This could handle unicode URLs and be responsible for encoding/decoding depending on where they were used.

What do people think?

Fraser

P.S. I noticed that #12988 has just been opened, which also relates to IDN validation.
