And I of course meant to say 'why you should not do this yourself'.

On 07/08/2014, at 5:16 PM, Graham Dumpleton <[email protected]> wrote:
>
> On 07/08/2014, at 5:03 PM, Dmitriy Chugunov <[email protected]> wrote:
>
>> On Thursday, 7 August 2014 at 10:11:17 UTC+4, Graham Dumpleton wrote:
>> I realised later what your problem is.
>>
>> Technically the wsgiref server is likely broken when it comes to generating
>> and passing through CGI param values.
>>
>> For QUERY_STRING, when sent by a browser, it is supposed to be sent %
>> encoded as the value is meant to be ASCII only.
>>
>> QUERY_STRING contains only ASCII symbols: name=%D0%98%D0%B2%D0%B0%D0%BD
>>
>> That ASCII percent encoded value, because the encoding can only be known by
>> the application, is supposed to get all the way through to the application.
>> You are then meant to deal with it like:
>>
>> >>> dir(urllib)
>> ['ContentTooShortError', 'FancyURLopener', 'MAXFTPCACHE', 'URLopener',
>> '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__',
>> '__version__', '_ftperrors', '_get_proxies', '_get_proxy_settings',
>> '_have_ssl', '_hexdig', '_hextochr', '_hostprog', '_is_unicode',
>> '_localhost', '_noheaders', '_nportprog', '_passwdprog', '_portprog',
>> '_queryprog', '_safe_map', '_safe_quoters', '_tagprog', '_thishost',
>> '_typeprog', '_urlopener', '_userprog', '_valueprog', 'addbase',
>> 'addclosehook', 'addinfo', 'addinfourl', 'always_safe', 'basejoin', 'c',
>> 'ftpcache', 'ftperrors', 'ftpwrapper', 'getproxies',
>> 'getproxies_environment', 'getproxies_macosx_sysconf', 'i', 'localhost',
>> 'main', 'noheaders', 'os', 'pathname2url', 'proxy_bypass',
>> 'proxy_bypass_environment', 'proxy_bypass_macosx_sysconf', 'quote',
>> 'quote_plus', 'reporthook', 'socket', 'splitattr', 'splithost',
>> 'splitnport', 'splitpasswd', 'splitport', 'splitquery', 'splittag',
>> 'splittype', 'splituser', 'splitvalue', 'ssl', 'string', 'sys', 'test',
>> 'test1', 'thishost', 'time', 'toBytes', 'unquote', 'unquote_plus', 'unwrap',
>> 'url2pathname', 'urlcleanup', 'urlencode', 'urlopen', 'urlretrieve']
>>
>> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD')
>> '\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
>>
>> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
>> u'\u0418\u0432\u0430\u043d'
>>
>> >>> print urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
>> Иван
>>
>> I know, but Python 3.2 doesn't have urllib.unquote. Instead I should use
>> urllib.parse.unquote. This function has an encoding parameter whose
>> default value is UTF-8. And a Python 3 string doesn't have a method called
>> decode. Therefore I can only use this encoding parameter, but it already
>> has the proper value. And I write the following (in my application function):
>> print("name=%D0%98%D0%B2%D0%B0%D0%BD")
>> print(urllib.parse.unquote("name=%D0%98%D0%B2%D0%B0%D0%BD"))
>> The output when started from wsgiref:
>> name=%D0%98%D0%B2%D0%B0%D0%BD
>> name=Иван
>> And from mod_wsgi:
>> name=%D0%98%D0%B2%D0%B0%D0%BD
>> name=\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd
>> It seems that some environment variables may affect it but I haven't found
>> any mention of it in https://docs.python.org/3.2/library/urllib.parse.html
>
> In Python 3 it is horrible as you need to do a dance: although it comes
> through as a Unicode string, it is supposed to be the byte string value
> decoded as Latin-1. This means you are supposed to convert it back to a byte
> string as Latin-1 and then decode that to Unicode as UTF-8.
>
> >>> s = urllib.parse.unquote('%D0%98%D0%B2%D0%B0%D0%BD', encoding='Latin-1').encode('Latin-1')
> >>> s
> b'\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
> >>> s.decode('UTF-8')
> 'Иван'
>
> This is a particularly horrible area for WSGI under Python 3. It is why
> people always recommend you should do this yourself and should use at least a
> micro framework such as Flask as it hides all these crappy conversions.
>
> What happens if you do that dance?
>
> Graham
>
>> So technically your WSGI application doesn't conform to the WSGI
>> specification. The fact that it didn't was hidden by possible issues with
>> the wsgiref server implementation incorrectly passing through a byte string
>> rather than the % encoded string. That or the HTTP client used was
>> incorrectly passing through UTF-8 rather than % encoding, as technically
>> only your application can know what encoding it should be.
>>
>> Unfortunately, I have the same problem if I just hard code this string in my
>> code (see above). So the problem isn't in passing the % encoded string.
>>
>> End result is that it is up to you to decode the value with the correct
>> encoding.
>>
>> >>> dir(cgi)
>> ['FieldStorage', 'FormContent', 'FormContentDict', 'InterpFormContentDict',
>> 'MiniFieldStorage', 'StringIO', 'SvFormContentDict', 'UserDict', '__all__',
>> '__builtins__', '__doc__', '__file__', '__name__', '__package__',
>> '__version__', '_parseparam', 'attrgetter', 'catch_warnings', 'dolog',
>> 'escape', 'filterwarnings', 'initlog', 'log', 'logfile', 'logfp', 'maxlen',
>> 'mimetools', 'nolog', 'os', 'parse', 'parse_header', 'parse_multipart',
>> 'parse_qs', 'parse_qsl', 'print_arguments', 'print_directory',
>> 'print_environ', 'print_environ_usage', 'print_exception', 'print_form',
>> 'rfc822', 'sys', 'test', 'urllib', 'urlparse', 'valid_boundary', 'warn']
>>
>> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')
>> {'name': ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']}
>>
>> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name']
>> ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']
>>
>> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
>> u'\u0418\u0432\u0430\u043d'
>>
>> >>> print cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
>> Иван
>>
>> https://docs.python.org/3.2/library/cgi.html tells me that I should use
>> urllib.parse.parse_qs, since cgi.parse_qs is deprecated in Python 3.2
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/modwsgi.
>> For more options, visit https://groups.google.com/d/optout.
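
For reference, the Latin-1 round trip described in the quoted reply can be combined with urllib.parse.parse_qs roughly as follows. This is only a sketch under stated assumptions: decode_query_string is a hypothetical helper name (it is not part of mod_wsgi, wsgiref or the standard library), it assumes the server hands over environ strings as bytes decoded with Latin-1, and it assumes the client encoded the value as UTF-8.

```python
from urllib.parse import parse_qs


def decode_query_string(environ, encoding="utf-8"):
    """Parse QUERY_STRING from a WSGI environ into Unicode keys and values.

    Hypothetical helper for illustration only.
    """
    raw = environ.get("QUERY_STRING", "")

    # Step 1: percent-decode as Latin-1, so that every character in the
    # result stands for exactly one byte of the original request.
    as_latin1 = parse_qs(raw, keep_blank_values=True, encoding="latin-1")

    def fix(text):
        # Step 2: turn those characters back into bytes, then decode the
        # bytes with the encoding the application expects (assumed UTF-8).
        # encode() raises UnicodeEncodeError if the server passed through
        # characters above U+00FF, which this sketch does not handle.
        return text.encode("latin-1").decode(encoding, errors="replace")

    return {fix(key): [fix(v) for v in values]
            for key, values in as_latin1.items()}


if __name__ == "__main__":
    environ = {"QUERY_STRING": "name=%D0%98%D0%B2%D0%B0%D0%BD"}
    print(decode_query_string(environ))
```

Doing both steps on the parsed result, rather than calling unquote with encoding='UTF-8' directly, should also cope with a server or client that lets raw UTF-8 bytes through instead of percent escapes, since those bytes arrive as Latin-1 characters too. A framework would normally do all of this for you.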
