#11198: Forms URLfield regex takes infinite to validate a long field
---------------------------------------------+------------------------------
          Reporter:  marcob                  |         Owner:  nobody
            Status:  reopened                |     Milestone:        
         Component:  Forms                   |       Version:  SVN   
        Resolution:                          |      Keywords:        
             Stage:  Design decision needed  |     Has_patch:  1     
        Needs_docs:  0                       |   Needs_tests:  0     
Needs_better_patch:  0                       |  
---------------------------------------------+------------------------------
Changes (by wam):

  * component:  Uncategorized => Forms
  * needs_tests:  1 => 0

Comment:

 I believe the updated regex submitted by dc also has the same kind of
 backtracking problem as the current version (although less severe). I
 wrote a simple test case (which hangs the test runner rather than
 erroring, but it at least demonstrates the problem), and every submission
 so far still "fails" (hangs on) this test:
 {{{
 diff --git a/tests/regressiontests/forms/fields.py b/tests/regressiontests/forms/fields.py
 index 9d9d722..a4980b0 100644
 --- a/tests/regressiontests/forms/fields.py
 +++ b/tests/regressiontests/forms/fields.py
 @@ -971,8 +971,12 @@ ValidationError: [u'Enter a valid URL.']
  >>> f.clean('http://example.')
  Traceback (most recent call last):
  ...
  ValidationError: [u'Enter a valid URL.']
 +>>> f.clean('http://%s' % ("X"*200,))
 +Traceback (most recent call last):
 +...
 +ValidationError: [u'Enter a valid URL.']
  >>> f.clean('http://.com')
  Traceback (most recent call last):
  ...
  ValidationError: [u'Enter a valid URL.']
 }}}
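 Because a catastrophic backtrack hangs the runner instead of failing, one
 option is to run the match in a child interpreter with a time limit. This
 is only a Python 3.7+ sketch: `match_with_timeout` is a hypothetical
 helper (not part of Django or its regression suite), and the 5-second
 budget is an arbitrary assumption.

{{{
#!python
import re
import subprocess
import sys


def match_with_timeout(pattern, text, flags=0, timeout=5.0):
    """Return whether re.match(pattern, text, flags) succeeds.

    Hypothetical helper: the match runs in a separate Python process, so a
    pathological regex raises subprocess.TimeoutExpired (subprocess.run
    kills the child first) instead of hanging the calling test run.
    """
    # Build a tiny script; %r produces valid Python literals for both strings.
    code = "import re\nprint(re.match(%r, %r, %d) is not None)\n" % (
        pattern, text, int(flags))
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout.strip() == "True"
}}}

 To guard the doctest above, the compiled URL regex could be passed as
 `match_with_timeout(url_re.pattern, text, url_re.flags)`; a hang then
 surfaces as `TimeoutExpired` rather than blocking the whole run. Starting
 a process per match is slow, so this suits a few pathological inputs
 rather than every test case.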

 I've got an alternative regex for matching domains which seems to be much
 better behaved with long invalid URLs, and which passes all the existing
 test cases for django.forms.URLField.clean() as well as the pathological
 one listed above.

 {{{
 import re

 url_re = re.compile(
     r'^https?://' # http:// or https://
     r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}|' #domain...
     r'localhost|' #localhost...
     r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
     r'(?::\d+)?' # optional port
     r'(?:/?|/\S+)$', re.IGNORECASE)
 }}}
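 A quick way to exercise the proposed regex against inputs from this
 ticket is sketched below for Python 3; it is not the Django test suite,
 and the `samples` list is illustrative. The comments follow from reading
 the pattern: every domain label must be followed by a dot, and the TLD
 must be 2 to 6 letters.

{{{
#!python
import re

# The regex proposed in this comment, unchanged.
url_re = re.compile(
    r'^https?://'  # http:// or https://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
    r'(?::\d+)?'  # optional port
    r'(?:/?|/\S+)$', re.IGNORECASE)

samples = [
    'http://www.djangoproject.com/',  # accepted: labels plus a TLD
    'http://localhost:8000/',         # accepted: localhost with a port
    'http://200.8.9.10',              # accepted: dotted IPv4 form
    'http://example.',                # rejected: no TLD after the last dot
    'http://.com',                    # rejected: empty first label
    'http://%s' % ('X' * 200,),       # rejected: no dot, so no label can end
]

for url in samples:
    print(url[:40], bool(url_re.match(url)))
}}}

 The 200-character input fails quickly because each label attempt can only
 try at most 63 characters before it needs a literal dot.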

 This version has two notable changes from the current regex and the prior
 submissions to this ticket:
   * The number of places where the regex can backtrack is cut from 4 in
     the original (and in dc's proposal) down to 2, one of which is length
     limited.
   * It enforces the RFC 1035 (page 7, sect. 2.3.1) and RFC 2181 (page 12,
     sect. 11) restrictions on domain label size (between 1 and 63
     octets/characters), for even tighter validation.
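 The label-size restriction comes entirely from the sub-pattern
 `[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?`: one leading alphanumeric, then
 optionally up to 61 more characters and a trailing alphanumeric, giving 1
 to 63 characters that never start or end with a hyphen. The sketch below
 isolates that sub-pattern; `LABEL_RE` and `is_valid_label` are
 illustrative names, and `re.fullmatch` needs Python 3.4+.

{{{
#!python
import re

# Label sub-pattern taken from the proposed url_re (names are hypothetical).
LABEL_RE = re.compile(r'[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?', re.IGNORECASE)


def is_valid_label(label):
    """True if the whole label is 1-63 letters/digits with only inner hyphens."""
    return LABEL_RE.fullmatch(label) is not None


for label in ['a', 'a-b', 'a' * 63, 'a' * 64, '', '-ab', 'ab-']:
    print(len(label), repr(label[:10]), is_valid_label(label))
}}}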

-- 
Ticket URL: <http://code.djangoproject.com/ticket/11198#comment:8>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
-~----------~----~----~----~------~----~------~--~---
