#28628: Audit for and abolish all use of '\d' in regexes
-------------------------------------+-------------------------------------
     Reporter:  James Bennett        |                    Owner:  Ad Timmering
         Type:  Cleanup/optimization |                   Status:  assigned
    Component:  Core (Other)         |                  Version:  dev
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  1                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Ad Timmering):

 I went through each use of `\d` in the Django code and found little
 reason or benefit to change most of them.

 An important piece of background: for every Unicode character that `\d`
 matches, `int(x)` correctly converts it to a normal `int`. In most cases
 where a number is expected and extracted, it is cast with `int(x)`, so the
 problem goes away -- or the behaviour might actually be beneficial to
 users (eg. I live in Japan, where people frequently type full-width
 digits, such as ０１２３４５ instead of 012345). Eg.
 {{{
 >>> int('\uABF9')  # MEETEI MAYEK DIGIT NINE
 9
 }}}
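
 The same holds for the full-width case mentioned above. A minimal sketch,
 assuming Python 3 (where `str` patterns are Unicode-aware by default); the
 sample string is my own illustration, not taken from Django code:

```python
import re

# Full-width digits 0-5 (U+FF10..U+FF15), as typed with a Japanese IME.
# Illustrative input only, not taken from Django.
fullwidth = "\uff10\uff11\uff12\uff13\uff14\uff15"

# In Python 3, \d in a str pattern matches any Unicode decimal digit.
match = re.fullmatch(r"\d+", fullwidth)
assert match is not None

# int() accepts those digits and returns an ordinary int.
value = int(match.group())
print(value)  # 12345

# With re.ASCII, \d is limited to [0-9], so the same input is rejected.
ascii_match = re.fullmatch(r"\d+", fullwidth, flags=re.ASCII)
print(ascii_match)  # None
```

 So a `\d` match followed by `int()` behaves the same for ASCII and
 non-ASCII digits; only a pattern restricted to ASCII would reject the
 full-width input.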


 Most use cases to me seem to fall in one of the below:

 a) We're processing user input which ''might'' actually be (inadvertently)
 entered in non-ASCII digits, so the result is likely desired, and at the
 very least changing it could be a breaking change for users. ==> DON'T
 CHANGE

 b) Changing to a more restrictive regex seems harmless enough, but also
 doesn't add much value. Eg. when parsing a version number like "1.2.3"
 with something like `(\d)\.(\d)\.(\d)`.

 c) To me there was only one case of Django code where it might be
 beneficial to change, which is in `django.utils.http` processing of
 dates/times in HTTP headers - and the spec clearly requires ASCII digits.
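
 For a case like (c), restricting the match to ASCII digits could be done
 with either an explicit `[0-9]` class or the `re.ASCII` flag. A rough
 sketch follows; the pattern names, the `accepts` helper and the simplified
 "HH:MM:SS" pattern are hypothetical and are not the actual code in
 `django.utils.http`:

```python
import re

# Hypothetical, simplified pattern for an "HH:MM:SS" time field.
# This is NOT the real regex used in django.utils.http.
UNICODE_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2})")
ASCII_CLASS_TIME = re.compile(r"([0-9]{2}):([0-9]{2}):([0-9]{2})")
ASCII_FLAG_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2})", re.ASCII)

ascii_input = "08:49:37"
# The same time written with full-width digits and ASCII colons.
fullwidth_input = "\uff10\uff18:\uff14\uff19:\uff13\uff17"


def accepts(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


for name, pattern in [
    ("unicode \\d", UNICODE_TIME),
    ("[0-9]", ASCII_CLASS_TIME),
    ("\\d + re.ASCII", ASCII_FLAG_TIME),
]:
    print(name, accepts(pattern, ascii_input), accepts(pattern, fullwidth_input))
```

 Both ASCII-only variants accept the ASCII input and reject the full-width
 one, while the plain `\d` pattern accepts both. Which spelling to prefer
 (`[0-9]` or `re.ASCII`) is a style choice; note that `re.ASCII` also
 changes `\w`, `\s` and `\b`.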

 Inventory of use cases with thoughts [https://docs.google.com/document/d
 /1nc1uwTIghm-eIhiIlssNAH72KNFHoHAsL0gGlr9dRlg/edit# in this Google doc].
 I'm curious to hear what others think.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/28628#comment:15>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
