#18004: Django should not use `force_unicode(..., errors='replace')` when parsing POST data.
-------------------------------------+-------------------------------------
     Reporter:  mrmachine            |                    Owner:  aaugustin
         Type:  Bug                  |                   Status:  closed
    Component:  HTTP handling        |                  Version:  master
     Severity:  Normal               |               Resolution:  needsinfo
     Keywords:  post data unicode    |             Triage Stage:
  utf8 encode decode transaction     |  Unreviewed
  aborted                            |      Needs documentation:  0
    Has patch:  1                    |  Patch needs improvement:  0
  Needs tests:  0                    |                    UI/UX:  0
Easy pickings:  0                    |
-------------------------------------+-------------------------------------

Comment (by kmtracey):

 In fact one way to trigger this error is to pass the invalid bytestring in
 as a raw query, for example:

 {{{
 >>> badq = 'SELECT * from "auth_user" WHERE "auth_user"."username" = \'\xea\x20\x20\''
 >>> from django.db import connection
 >>> cursor = connection.cursor()
 >>> cursor.execute(badq)
 Traceback (most recent call last):
   File "<console>", line 1, in <module>
   File "/home/kmtracey/.virtualenvs/abc/local/lib/python2.7/site-packages/django/db/backends/util.py", line 34, in execute
     return self.cursor.execute(sql, params)
   File "/home/kmtracey/.virtualenvs/abc/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
     return self.cursor.execute(query, args)
 DatabaseError: invalid byte sequence for encoding "UTF8": 0xea2020
 }}}
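
 As a rough illustration of why the database rejects 0xea2020, the sketch
 below checks a bytestring against strict UTF-8 decoding. It is written for
 Python 3 rather than the Python 2.7 console used above, and the helper name
 `is_valid_utf8` is made up for this example; it is not part of Django or
 psycopg2.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # "strict" is the default error handler
    except UnicodeDecodeError:
        return False
    return True


bad = b"\xea\x20\x20"
# 0xEA is 0b11101010: the lead byte of a three-byte sequence, so the next
# two bytes must have the form 0b10xxxxxx. 0x20 is 0b00100000, so the
# sequence is malformed.
print(is_valid_utf8(bad))

# For comparison, U+AC00 encodes to a well-formed sequence starting with 0xEA.
good = "\uac00".encode("utf-8")
print(is_valid_utf8(good))
```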

 If you instead decode the bytestring with errors='replace' and re-encode
 it as UTF-8, the DB is fine with it (once you clear the error state on the
 transaction):

 {{{
 >>> cursor.execute(badq.decode('utf-8', errors='replace').encode('utf-8'))
 Traceback (most recent call last):
   File "<console>", line 1, in <module>
   File "/home/kmtracey/.virtualenvs/abc/local/lib/python2.7/site-packages/django/db/backends/util.py", line 34, in execute
     return self.cursor.execute(sql, params)
   File "/home/kmtracey/.virtualenvs/abc/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
     return self.cursor.execute(query, args)
 DatabaseError: current transaction is aborted, commands ignored until end of transaction block

 >>> connection._rollback()
 >>> cursor.execute(badq.decode('utf-8', errors='replace').encode('utf-8'))
 >>>
 }}}

 That's essentially what errors='replace' does when decoding POST data: it
 ensures that internally we work with unicode that can be encoded as UTF-8
 when it needs to be, rather than carrying around who-knows-what-encoded
 bytestrings that may or may not be safe to pass on to other subsystems.
 How the bytestring 0xea2020 is getting stuffed into the PostgreSQL
 connection is the question that needs to be answered first for this
 ticket. I'm not at all convinced that it is being caused by
 errors='replace' on decode of POST data.
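
 The round trip described above can be sketched without a database. This is
 a Python 3 illustration only, not Django's actual code path, and
 `sanitize_utf8` is a hypothetical name chosen for the example.

```python
def sanitize_utf8(raw: bytes) -> bytes:
    """Decode with errors='replace', then re-encode as UTF-8.

    Hypothetical helper: malformed input bytes become U+FFFD
    (REPLACEMENT CHARACTER), so the result is always valid UTF-8.
    """
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


bad = b"\xea\x20\x20"
clean = sanitize_utf8(bad)

# The stray lead byte 0xEA is replaced by U+FFFD; the two spaces survive.
print(repr(clean.decode("utf-8")))

# Strict decoding of the sanitized bytes no longer raises.
clean.decode("utf-8")
```

 The point of the sketch is that data decoded this way cannot be the source
 of an invalid byte sequence later on: whatever went in, what comes out
 re-encodes cleanly.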

-- 
Ticket URL: <https://code.djangoproject.com/ticket/18004#comment:17>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
