#9202: forms.field.URLField regexp for validating URL does not follow the RFC
------------------------+--------------------------------------------------
     Reporter:  niccl   |                    Owner:  nobody
         Type:  Bug     |                   Status:  closed
    Component:  Forms   |                  Version:  1.0
     Severity:  Normal  |               Resolution:  wontfix
     Keywords:          |             Triage Stage:  Design decision needed
    Has patch:  1       |      Needs documentation:  0
  Needs tests:  0       |  Patch needs improvement:  0
Easy pickings:  0       |                    UI/UX:  0
------------------------+--------------------------------------------------
Changes (by aaugustin):

 * status:  new => closed
 * ui_ux:   => 0
 * resolution:   => wontfix
 * easy:   => 0


Comment:

 This regexp is found in Appendix B of RFC 3986, which is called "Parsing a
 URI Reference with a Regular Expression". It's intended to _interpret_ any
 text as a URI, like the `urlparse` module.

 On the other hand, the goal of the URLField is to _validate_ that the
 input has a decent chance of working if you stick it in a template like
 this:

 {{{ <a href="{{ obj.url }}">{{ obj }}</a> }}}
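
 To illustrate the difference between parsing and validating, here is a
 minimal sketch in Python 3, where `urllib.parse` replaces the `urlparse`
 module. The names `looks_like_web_url` and `ALLOWED_SCHEMES` are
 hypothetical, and the check is an assumption about what "a decent chance
 of working" could mean; it is not the validation URLField actually
 performs.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist: schemes assumed to be usable in an <a href="...">.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def looks_like_web_url(text):
    """Return True if `text` has an allowed scheme and a non-empty host part.

    Parsing alone accepts almost any string; validating means rejecting
    the ones that are unlikely to work as a link target.
    """
    try:
        parts = urlsplit(text)
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs.
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ["http://example.com/", "abc", "?#", ""]:
        print(repr(candidate), looks_like_web_url(candidate))
```

 A parser will happily split every one of these strings into components;
 only the first one passes this stricter check.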

 Here are some examples that are happily accepted by this regexp, but
 arguably aren't URLs:
 {{{
 >>> import re
 >>> # NB: I added the trailing $
 >>> url_re = re.compile(
 ...     r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')
 >>> url_re.match('abc')
 <_sre.SRE_Match object at 0x1030329e0>
 >>> url_re.match('A/B')
 <_sre.SRE_Match object at 0x103032ad8>
 >>> url_re.match('?#')
 <_sre.SRE_Match object at 0x1030329e0>
 >>> url_re.match('irc://irc.freenode.net/django-dev')
 <_sre.SRE_Match object at 0x103032ad8>
 >>> url_re.match(r'\\server\share')
 <_sre.SRE_Match object at 0x1030329e0>
 >>> url_re.match('this has nothing to do with an URL')
 <_sre.SRE_Match object at 0x103032ad8>
 >>> url_re.match('')
 <_sre.SRE_Match object at 0x103032ad8>
 >>> url_re.match('#@!?')
 <_sre.SRE_Match object at 0x1030329e0>
 }}}

 The regexp can be decomposed as
 `<optional stuff>([^?#]*)<more optional stuff>`, which makes it extremely
 lax: every group is either optional or allowed to match the empty string.

 To be honest, I can't find a single example that won't match the regexp. I
 thought the last one would fail because of the `#` before the `?`, but
 it's accepted: everything after the first `#` is taken as the fragment,
 and `(.*)` allows a `?` there.
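
 Looking at the captured groups shows where each input ends up. This is a
 small sketch using the same regexp as above; `split_uri` is a hypothetical
 helper name, and the group numbers follow the component mapping given in
 Appendix B of the RFC.

```python
import re

# Appendix B regexp from RFC 3986, with the trailing $ added as above.
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$')


def split_uri(text):
    """Return the five components the regexp captures, or None if no match."""
    match = URI_RE.match(text)
    if match is None:
        return None
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


if __name__ == "__main__":
    for sample in ["#@!?", "this has nothing to do with an URL", ""]:
        print(repr(sample), split_uri(sample))
```

 For `'#@!?'` the path is empty and the fragment is `'@!?'`; for the plain
 sentence the whole string lands in the path. In both cases the scheme and
 authority are `None`, which is exactly what a validator would want to
 reject.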

 Finally, `verify_exists` is deprecated, so comment 2 no longer applies.

 So this regexp, while technically correct, isn't appropriate to validate
 the contents of URLField; it's too permissive.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/9202#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
