Re: [Python-Dev] bytes / unicode

Terry Reedy Mon, 21 Jun 2010 10:40:23 -0700

On 6/20/2010 11:56 PM, Terry Reedy wrote:

The specific example is


 >>> urllib.parse.parse_qsl('a=b%e0')
[('a', 'b�')]

where the character after 'b' is a white ? in a dark diamond (U+FFFD, the
replacement character), indicating a decoding error.
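
To see where that character comes from, here is a minimal sketch that decodes the raw byte by hand. It assumes only that %e0 denotes the single byte 0xE0; the variable names are purely illustrative.

```python
raw = b'b\xe0'  # the bytes that 'b%e0' denotes once the escape is undone

# 0xE0 is the lead byte of a three-byte UTF-8 sequence, so on its own it
# is invalid; errors='replace' substitutes U+FFFD instead of raising.
as_utf8 = raw.decode('utf-8', errors='replace')

# In Latin-1 every byte maps to a code point, so 0xE0 decodes to 'à'.
as_latin1 = raw.decode('latin-1')

print(repr(as_utf8), repr(as_latin1))
```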

parse_qsl() splits that input on '=' and sends each piece to
urllib.parse.unquote(). unquote() attempts to "Replace %xx escapes by
their single-character equivalent." unquote() has an encoding parameter
that defaults to 'utf-8' in *its* call to .decode(). parse_qsl() does not
have an encoding parameter. If it did, and it passed that to unquote(),
then the above example would become (simulated interaction)

 >>> urllib.parse.parse_qsl('a=b%e0', encoding='latin-1')
[('a', 'bà')]

I got that output by copying the file and adding "encoding='latin-1'" to
the unquote call.
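
The same effect can be sketched without editing the library file. The helper below, parse_qsl_with_encoding, is a hypothetical name and a deliberately simplified stand-in for parse_qsl(): it only splits on '&' and '=', skips empty fields, and forwards encoding and errors to unquote(). It is not the patch under review, just an illustration of the idea.

```python
from urllib.parse import unquote

def parse_qsl_with_encoding(qs, encoding='utf-8', errors='replace'):
    """Hypothetical helper: split a query string and decode each piece
    with a caller-chosen encoding (simplified, not the stdlib code)."""
    pairs = []
    for field in qs.split('&'):
        if not field:
            continue  # ignore empty fields such as in 'a=1&&b=2'
        name, _, value = field.partition('=')
        # '+' stands for a space in form-encoded data
        name = unquote(name.replace('+', ' '), encoding=encoding, errors=errors)
        value = unquote(value.replace('+', ' '), encoding=encoding, errors=errors)
        pairs.append((name, value))
    return pairs

default_pairs = parse_qsl_with_encoding('a=b%e0')
latin1_pairs = parse_qsl_with_encoding('a=b%e0', encoding='latin-1')
print(default_pairs, latin1_pairs)
```

With the default encoding the value still ends in U+FFFD; with encoding='latin-1' it comes back as 'bà', matching the simulated interaction.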

Does this solve this problem?
Has anything like this been added for 3.2?
Should it be?


With a little searching, I found
http://bugs.python.org/issue5468

with Miles Kaufmann's year-old comment "parse_qs and parse_qsl should
also grow encoding and errors parameters to pass to the underlying
unquote()". Patch review is needed.


Terry Jan Reedy


