#18004: Django should not use `force_unicode(..., errors='replace')` when
parsing POST data.
-------------------------------------+-------------------------------------
     Reporter:  mrmachine            |                    Owner:  aaugustin
         Type:  Bug                  |                   Status:  assigned
    Component:  HTTP handling        |                  Version:  master
     Severity:  Normal               |               Resolution:
     Keywords:  post data unicode    |             Triage Stage:
  utf8 encode decode transaction     |  Unreviewed
  aborted                            |      Needs documentation:  0
    Has patch:  1                    |  Patch needs improvement:  0
  Needs tests:  0                    |                    UI/UX:  0
Easy pickings:  0                    |
-------------------------------------+-------------------------------------
Changes (by aaugustin):

 * stage:  Design decision needed => Unreviewed


Comment:

 Yes, I have strong objections to your proposal: I'm not going to add a
 workaround for a problem that we haven't identified yet.

 All of the above is vague, and we still don't know how to trigger this
 error.

 ----

 I did the research, and RFC 1867 says that file names must be encoded:

 > The client application should make best
 > effort to supply the file name; if the file name of the client's
 > operating system is not in US-ASCII, the file name might be
 > approximated or encoded using the method of RFC 1522.


 This is repeated in section 5.11 - Non-ASCII field names:

 > Note that mime headers are generally required to consist only of 7-
 > bit data in the US-ASCII character set. Hence field names should be
 > encoded according to the prescriptions of RFC 1522 if they contain
 > characters outside of that set. In HTML 2.0, the default character
 > set is ISO-8859-1, but non-ASCII characters in field names should be
 > encoded.


 RFC 1522 describes MIME encoded-words, and this encoding explicitly
 includes the charset.
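
 To illustrate the point about the charset, here is a minimal sketch using
 Python's standard `email.header` module. The file name `caf=E9.txt` is a
 made-up example, not data from this ticket, and the snippet only shows the
 encoded-word format (also described by RFC 2047, which obsoletes RFC 1522);
 it is not how Django parses multipart headers.

```python
from email.header import decode_header, make_header

# A made-up encoded-word in the form =?charset?encoding?encoded-text?=
encoded = "=?iso-8859-1?q?caf=E9.txt?="

# decode_header() returns (bytes, charset) pairs, so the charset travels
# with the data instead of having to be guessed by the server.
parts = decode_header(encoded)
raw, charset = parts[0]

# make_header() applies the declared charset to produce text.
filename = str(make_header(parts))
```

 Because the charset is declared inside the encoded-word itself, a
 conforming client never leaves the server guessing how to decode the name.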

 ----

 Note that you're the only person to have ever hit this problem; for all I
 know this could be a bug in your code. The only way to be sure is to log a
 request, and figure out why Django can't parse it.

 Here's what I would suggest: in the problematic view, catch the
 `DatabaseError`, and when it occurs, dump `request.body` to a file in
 binary mode. Once we have this file, we can figure out why Django ends up
 with invalid UTF-8 data.
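
 A rough sketch of that suggestion follows. The helper names
 `dump_request_body` and `first_invalid_utf8_offset`, the file naming
 scheme, and the commented-out view (`my_view`, `do_work`) are all
 hypothetical; only `request.body` and `DatabaseError` come from the
 suggestion above.

```python
import os
import tempfile
import time


def dump_request_body(body, directory=None):
    """Write the raw request bytes to a new file and return its path."""
    directory = directory or tempfile.gettempdir()
    prefix = "request-body-%d-" % int(time.time())
    # mkstemp() creates a uniquely named file, so dumps never overwrite
    # each other.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".bin", dir=directory)
    with os.fdopen(fd, "wb") as f:  # binary mode: no decoding is applied
        f.write(body)
    return path


def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Hypothetical use inside the problematic view (requires Django):
#
# from django.db import DatabaseError
#
# def my_view(request):
#     body = request.body  # read first, before request.POST is touched
#     try:
#         return do_work(request)
#     except DatabaseError:
#         dump_request_body(body)
#         raise
```

 Reading `request.body` at the top of the view is a precaution: for
 multipart requests, the body may no longer be accessible once
 `request.POST` has consumed the input stream.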

-- 
Ticket URL: <https://code.djangoproject.com/ticket/18004#comment:14>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
