#18004: Django should not use `force_unicode(..., errors='replace')` when
parsing POST data.
-------------------------------------+-------------------------------------
Reporter: mrmachine                  | Owner: aaugustin
Type: Bug                            | Status: closed
Component: HTTP handling             | Version: master
Severity: Normal                     | Resolution: needsinfo
Keywords: post data unicode utf8     | Triage Stage: Unreviewed
  encode decode transaction aborted  | Needs documentation: 0
Has patch: 1                         | Patch needs improvement: 0
Needs tests: 0                       | UI/UX: 0
Easy pickings: 0                     |
-------------------------------------+-------------------------------------
Comment (by kmtracey):
In fact one way to trigger this error is to pass the invalid bytestring in
as a raw query, for example:
{{{
>>> badq = 'SELECT * from "auth_user" WHERE "auth_user"."username" = \'\xea\x20\x20\''
>>> from django.db import connection
>>> cursor = connection.cursor()
>>> cursor.execute(badq)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/kmtracey/.virtualenvs/abc/local/lib/python2.7/site-packages/django/db/backends/util.py", line 34, in execute
    return self.cursor.execute(sql, params)
  File "/home/kmtracey/.virtualenvs/abc/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
DatabaseError: invalid byte sequence for encoding "UTF8": 0xea2020
}}}
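The database's complaint can be reproduced without a database. Below is a minimal sketch, written for Python 3 (the session above is Python 2.7); `is_valid_utf8` is a hypothetical helper, not part of Django. The byte 0xEA announces a three-byte UTF-8 sequence, so it must be followed by two continuation bytes in the range 0x80-0xBF. 0x20 is not one, which is why 0xea2020 is rejected.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# The same three bytes that appear in the raw query above.
bad = b"\xea\x20\x20"
print(is_valid_utf8(bad))

# For contrast: the character U+00EA, properly encoded as UTF-8, is accepted.
good = "\u00ea  ".encode("utf-8")
print(is_valid_utf8(good))
```

The first call prints `False` and the second prints `True`.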
If you instead decode the bytestring with errors='replace' and re-encode
it as UTF-8, the DB accepts it (once you clear the error state on the
transaction):
{{{
>>> cursor.execute(badq.decode('utf-8', errors='replace').encode('utf-8'))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/kmtracey/.virtualenvs/abc/local/lib/python2.7/site-packages/django/db/backends/util.py", line 34, in execute
    return self.cursor.execute(sql, params)
  File "/home/kmtracey/.virtualenvs/abc/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
DatabaseError: current transaction is aborted, commands ignored until end of transaction block
>>> connection._rollback()
>>> cursor.execute(badq.decode('utf-8', errors='replace').encode('utf-8'))
>>>
}}}
That's essentially what errors='replace' does when decoding POST data: it
ensures that internally we work with unicode that can always be encoded as
UTF-8 when needed, rather than carrying around bytestrings in
who-knows-what encoding that may or may not be safe to pass on to other
subsystems. How the bytestring 0xea2020 is getting into the postgres
connection is the question that needs to be answered first for this
ticket. I'm not at all convinced that it is caused by errors='replace'
when decoding POST data.
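The decode-and-re-encode round trip can be sketched in isolation. This is a minimal Python 3 illustration of the idea, not Django's actual code path; `replace_invalid_utf8` is a hypothetical name. Each invalid byte becomes U+FFFD (the replacement character), so the output is always well-formed UTF-8, at the cost of losing the original byte.

```python
def replace_invalid_utf8(raw: bytes) -> bytes:
    """Decode with errors='replace', then re-encode as UTF-8.

    Hypothetical helper for illustration only.
    """
    text = raw.decode("utf-8", errors="replace")  # invalid bytes -> U+FFFD
    return text.encode("utf-8")


bad = b"\xea\x20\x20"
cleaned = replace_invalid_utf8(bad)

# The lone 0xEA is replaced by U+FFFD (EF BF BD in UTF-8); the spaces survive.
print(cleaned)

# A strict decode of the cleaned bytes no longer raises.
print(repr(cleaned.decode("utf-8")))
```

Already valid input passes through unchanged, so the round trip only alters bytestrings that were malformed in the first place.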
--
Ticket URL: <https://code.djangoproject.com/ticket/18004#comment:17>