On 07/08/2014, at 5:03 PM, Dmitriy Chugunov <[email protected]> wrote:

> 
> 
> On Thursday, 7 August 2014 at 10:11:17 UTC+4, Graham Dumpleton wrote:
> I realised later what your problem is.
> 
> Technically the wsgiref server is likely broken when it comes to generating 
> and passing through CGI param values.
> 
> For QUERY_STRING, when sent by a browser, it is supposed to be sent % encoded 
> as the value is meant to be ASCII only.
> 
> QUERY_STRING contains only ASCII symbols: name=%D0%98%D0%B2%D0%B0%D0%BD
>  
> 
> That ASCII percent encoded value, because the encoding can only be known by 
> the application, is supposed to get all the way through to the application. 
> You are then meant to deal with it like:
> 
> >>> dir(urllib)
> ['ContentTooShortError', 'FancyURLopener', 'MAXFTPCACHE', 'URLopener', 
> '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 
> '__version__', '_ftperrors', '_get_proxies', '_get_proxy_settings', 
> '_have_ssl', '_hexdig', '_hextochr', '_hostprog', '_is_unicode', 
> '_localhost', '_noheaders', '_nportprog', '_passwdprog', '_portprog', 
> '_queryprog', '_safe_map', '_safe_quoters', '_tagprog', '_thishost', 
> '_typeprog', '_urlopener', '_userprog', '_valueprog', 'addbase', 
> 'addclosehook', 'addinfo', 'addinfourl', 'always_safe', 'basejoin', 'c', 
> 'ftpcache', 'ftperrors', 'ftpwrapper', 'getproxies', 
> 'getproxies_environment', 'getproxies_macosx_sysconf', 'i', 'localhost', 
> 'main', 'noheaders', 'os', 'pathname2url', 'proxy_bypass', 
> 'proxy_bypass_environment', 'proxy_bypass_macosx_sysconf', 'quote', 
> 'quote_plus', 'reporthook', 'socket', 'splitattr', 'splithost', 'splitnport', 
> 'splitpasswd', 'splitport', 'splitquery', 'splittag', 'splittype', 
> 'splituser', 'splitvalue', 'ssl', 'string', 'sys', 'test', 'test1', 
> 'thishost', 'time', 'toBytes', 'unquote', 'unquote_plus', 'unwrap', 
> 'url2pathname', 'urlcleanup', 'urlencode', 'urlopen', 'urlretrieve']
> 
> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD')
> '\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
> 
> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
> u'\u0418\u0432\u0430\u043d'
> 
> >>> print urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
> Иван
> 
> I know, but Python 3.2 doesn't have urllib.unquote. Instead I should use 
> urllib.parse.unquote. This function has an encoding parameter whose default 
> value is UTF-8, and a Python 3 string doesn't have a decode method. 
> Therefore I can only use this encoding parameter, but it already has the 
> proper value. So I wrote the following (in my application function): 
>         print("name=%D0%98%D0%B2%D0%B0%D0%BD")
>         print(urllib.parse.unquote("name=%D0%98%D0%B2%D0%B0%D0%BD"))
> The output when started from wsgiref:
>         name=%D0%98%D0%B2%D0%B0%D0%BD
>         name=Иван
> And from mod_wsgi:
>         name=%D0%98%D0%B2%D0%B0%D0%BD
>         name=\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd
> It seems that some environment variable may affect this, but I haven't found 
> any mention of it in https://docs.python.org/3.2/library/urllib.parse.html

In Python 3 it is horrible, as you need to do a dance: although the value comes 
through as a Unicode string, it is supposed to hold the byte values decoded as 
Latin-1. This means you are supposed to convert it back to a byte string as 
Latin-1 and then decode that to Unicode as UTF-8.

>>> s = urllib.parse.unquote('%D0%98%D0%B2%D0%B0%D0%BD',
...     encoding='Latin-1').encode('Latin-1')
>>> s
b'\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
>>> s.decode('UTF-8')
'Иван'
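
If it helps, here is a minimal sketch of the same dance wrapped in a helper.
The name decode_wsgi_value is made up for illustration, and UTF-8 is only an
assumption about what the client sent. It uses urllib.parse.unquote_to_bytes
from the standard library, which returns bytes directly and so skips the
intermediate Latin-1 string.

```python
from urllib.parse import unquote_to_bytes


def decode_wsgi_value(value, charset="utf-8"):
    """Hypothetical helper: turn a WSGI environ value into real text.

    Under Python 3 a WSGI server hands over native strings whose code
    points stand for raw bytes (Latin-1), so go back to bytes first.
    """
    # Recover the bytes the server received (Latin-1 maps 1:1 to bytes).
    raw = value.encode("latin-1")
    # Resolve %XX escapes at the byte level, without guessing a charset.
    raw = unquote_to_bytes(raw)
    # Only the application knows the real charset; UTF-8 is assumed here.
    return raw.decode(charset)
```

Calling decode_wsgi_value('%D0%98%D0%B2%D0%B0%D0%BD') should give 'Иван', the
same as the session above.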

This is a particularly horrible area for WSGI under Python 3. It is why people 
always recommend you should not do this yourself and should use at least a 
micro framework such as Flask, as it hides all these crappy conversions.

What happens if you do that dance?

Graham


>  
> 
> So technically your WSGI application doesn't conform to the WSGI 
> specification. That it didn't was hidden by possible issues with the wsgiref 
> server implementation incorrectly passing through a byte string rather than 
> the % encoded string. That, or the HTTP client used was incorrectly passing 
> through UTF-8 rather than % encoding, as technically only your application can 
> know what encoding it should be.
> 
> Unfortunately, I have the same problem if I just hard-code this string in my 
> code (see above). So the problem isn't in passing the % encoded string.
> 
>  
> 
> The end result is that it is up to you to decode the value with the correct 
> encoding.
> 
> >>> dir(cgi)
> ['FieldStorage', 'FormContent', 'FormContentDict', 'InterpFormContentDict', 
> 'MiniFieldStorage', 'StringIO', 'SvFormContentDict', 'UserDict', '__all__', 
> '__builtins__', '__doc__', '__file__', '__name__', '__package__', 
> '__version__', '_parseparam', 'attrgetter', 'catch_warnings', 'dolog', 
> 'escape', 'filterwarnings', 'initlog', 'log', 'logfile', 'logfp', 'maxlen', 
> 'mimetools', 'nolog', 'os', 'parse', 'parse_header', 'parse_multipart', 
> 'parse_qs', 'parse_qsl', 'print_arguments', 'print_directory', 
> 'print_environ', 'print_environ_usage', 'print_exception', 'print_form', 
> 'rfc822', 'sys', 'test', 'urllib', 'urlparse', 'valid_boundary', 'warn']
> 
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')
> {'name': ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']}
> 
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name']
> ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']
> 
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
> u'\u0418\u0432\u0430\u043d'
> 
> >>> print cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
> Иван
> 
> https://docs.python.org/3.2/library/cgi.html tells me that I should use 
> urllib.parse.parse_qs, since cgi.parse_qs is deprecated in Python 3.2.
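
For reference, a rough Python 3 counterpart of the cgi.parse_qs session above,
as a sketch only. The name parse_query is hypothetical and UTF-8 is an
assumption about the client. The encoding and errors arguments of
urllib.parse.parse_qs are used to go through Latin-1 first, so the final
decode stays under the application's control.

```python
from urllib.parse import parse_qs


def parse_query(query_string, charset="utf-8"):
    """Hypothetical helper: parse QUERY_STRING and decode names and values."""

    def redecode(text):
        # Latin-1 text -> original bytes -> text in the assumed charset.
        return text.encode("latin-1").decode(charset)

    # Decode %XX escapes as Latin-1 so every byte survives as one code point.
    fields = parse_qs(query_string, keep_blank_values=True,
                      encoding="latin-1", errors="strict")
    return {redecode(name): [redecode(value) for value in values]
            for name, values in fields.items()}
```

With this, parse_query('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0] should be
'Иван'.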
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "modwsgi" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/modwsgi.
> For more options, visit https://groups.google.com/d/optout.
