#9202: forms.field.URLField regexp for validating URL does not follow the RFC
------------------------+--------------------------------------------------
Reporter: niccl | Owner: nobody
Type: Bug | Status: closed
Component: Forms | Version: 1.0
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage: Design decision needed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------+--------------------------------------------------
Changes (by aaugustin):
* status: new => closed
* ui_ux: => 0
* resolution: => wontfix
* easy: => 0
Comment:
This regexp is found in Appendix B of RFC 3986, which is called "Parsing a
URI Reference with a Regular Expression". It's intended to _interpret_ any
text as a URI, like the `urlparse` module.
On the other hand, the goal of the URLField is to _validate_ that the
input has a decent chance of working if you stick it in a template like
this:
{{{ <a href="{{ obj.url }}">{{ obj }}</a> }}}
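To illustrate the "interpret" side of that distinction, here is a minimal sketch using `urllib.parse.urlsplit`, the Python 3 home of the old `urlparse` module. The helper name `interpret` is made up for this example. It shows only that a parser assigns components to whatever text it is given, without judging whether the result is a usable link.

```python
from urllib.parse import urlsplit


def interpret(text):
    """Split text into URI components; no judgement on whether it is a usable URL."""
    parts = urlsplit(text)
    return parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment


# Arbitrary text is interpreted rather than rejected: it simply becomes a path.
for sample in ("abc", "A/B", "irc://irc.freenode.net/django-dev"):
    print(repr(sample), "->", interpret(sample))
```

A parser of this kind answers "which part is which?", not "is this worth putting in an `href`?".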
Here are some examples that are happily accepted by this regexp, but
arguably aren't URLs:
{{{
>>> import re
>>> url_re = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')  # NB: I added the trailing $
>>> url_re.match('abc')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('A/B')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('?#')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('irc://irc.freenode.net/django-dev')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match(r'\\server\share')
<_sre.SRE_Match object at 0x1030329e0>
>>> url_re.match('this has nothing to do with an URL')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('')
<_sre.SRE_Match object at 0x103032ad8>
>>> url_re.match('#@!?')
<_sre.SRE_Match object at 0x1030329e0>
}}}
The regexp can be decomposed as `<optional stuff>([^?#]*)<more optional
stuff>`, which makes it extremely lax.
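The decomposition can be made concrete by looking at what each group captures. The sketch below reuses the regexp from the examples above (with the added trailing `$`). The names `URI_RE` and `split_uri` are made up for illustration, and the group numbers are the ones the RFC itself assigns to the five components.

```python
import re

# Same regexp as in the examples above, including the added trailing $.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')


def split_uri(text):
    """Return the five components captured by the regexp, or None if there is no match."""
    m = URI_RE.match(text)
    if m is None:
        return None
    # RFC numbering: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# With every optional part skipped, the whole input lands in the path group.
print(split_uri("abc"))
print(split_uri("this has nothing to do with an URL"))
# A well-formed input fills the optional groups instead.
print(split_uri("irc://irc.freenode.net/django-dev"))
```

Only the path group `([^?#]*)` is mandatory, and it accepts the empty string, so the optional groups never force a rejection.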
To be honest, I can't find a single example that won't match the regexp. I
thought the last one would fail because of the `#` before the `?`, but
somehow it's accepted.
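A short sketch of why `'#@!?'` is accepted: the path group matches the empty string, the query group is skipped, and the fragment group `(#(.*))?` takes everything after the first `#`, including the `?`. The names `URI_RE` and `fragment_of` are made up for this example. The only rejection shown here relies on `.` not matching a newline, which holds for Python's `re` without the `DOTALL` flag.

```python
import re

# Same regexp as in the examples above, including the added trailing $.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')


def fragment_of(text):
    """Return what the fragment group (group 9) captured, or None if there is no match."""
    m = URI_RE.match(text)
    return None if m is None else m.group(9)


# Everything after the first '#' is swallowed by the fragment group.
print(repr(fragment_of('#@!?')))

# A newline after the first '#' stops '.*', and '$' cannot match mid-string,
# so this input is rejected.
print(URI_RE.match('#a\nb'))
```

So the order of `#` and `?` does not matter once a `#` has been seen. In this sketch, a newline in the fragment position is what makes the match fail.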
Finally, `verify_exists` is deprecated, so comment 2 no longer applies.
So this regexp, while technically correct, isn't appropriate to validate
the contents of URLField; it's too permissive.
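For contrast, here is a rough sketch of what "validate" could mean in the sense described above. It is not Django's actual validator. The function name `looks_like_link`, the set of allowed schemes and the no-whitespace rule are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes count as "likely to work in an <a href>".
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def looks_like_link(text):
    """Heuristic check: allowed scheme, non-empty host part, no whitespace."""
    if any(ch.isspace() for ch in text):
        return False
    try:
        parts = urlsplit(text)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


for sample in ("https://code.djangoproject.com/", "abc", "?#", ""):
    print(repr(sample), "->", looks_like_link(sample))
```

Unlike the RFC regexp, a check of this kind rejects most of the examples listed earlier, which is the behaviour a form field needs.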
--
Ticket URL: <https://code.djangoproject.com/ticket/9202#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.