On Thursday, 7 August 2014 at 10:11:17 UTC+4, Graham Dumpleton
wrote:
>
> I realised later what your problem is.
>
> Technically the wsgiref server is likely broken when it comes to
> generating and passing through CGI param values.
>
> For QUERY_STRING, when sent by a browser, it is supposed to be sent %
> encoded as the value is meant to be ASCII only.
>
QUERY_STRING contains only ASCII characters: name=%D0%98%D0%B2%D0%B0%D0%BD
>
> That ASCII percent encoded value, because the encoding can only be known
> by the application, is supposed to get all the way through to the
> application. You are then meant to deal with it like:
>
> >>> dir(urllib)
> ['ContentTooShortError', 'FancyURLopener', 'MAXFTPCACHE', 'URLopener',
> '__all__', '__builtins__', '__doc__', '__file__', '__name__',
> '__package__', '__version__', '_ftperrors', '_get_proxies',
> '_get_proxy_settings', '_have_ssl', '_hexdig', '_hextochr', '_hostprog',
> '_is_unicode', '_localhost', '_noheaders', '_nportprog', '_passwdprog',
> '_portprog', '_queryprog', '_safe_map', '_safe_quoters', '_tagprog',
> '_thishost', '_typeprog', '_urlopener', '_userprog', '_valueprog',
> 'addbase', 'addclosehook', 'addinfo', 'addinfourl', 'always_safe',
> 'basejoin', 'c', 'ftpcache', 'ftperrors', 'ftpwrapper', 'getproxies',
> 'getproxies_environment', 'getproxies_macosx_sysconf', 'i', 'localhost',
> 'main', 'noheaders', 'os', 'pathname2url', 'proxy_bypass',
> 'proxy_bypass_environment', 'proxy_bypass_macosx_sysconf', 'quote',
> 'quote_plus', 'reporthook', 'socket', 'splitattr', 'splithost',
> 'splitnport', 'splitpasswd', 'splitport', 'splitquery', 'splittag',
> 'splittype', 'splituser', 'splitvalue', 'ssl', 'string', 'sys', 'test',
> 'test1', 'thishost', 'time', 'toBytes', 'unquote', 'unquote_plus',
> 'unwrap', 'url2pathname', 'urlcleanup', 'urlencode', 'urlopen',
> 'urlretrieve']
>
> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD')
> '\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
>
> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
> u'\u0418\u0432\u0430\u043d'
>
> >>> print urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
> Иван
>
I know, but Python 3.2 doesn't have urllib.unquote. Instead I have to use
urllib.parse.unquote. This function has an encoding parameter whose
default value is UTF-8, and a Python 3 string doesn't have a decode
method. Therefore I can only use this encoding parameter, but it already
has the proper value. So I wrote the following (in my application function):
print("name=%D0%98%D0%B2%D0%B0%D0%BD")
print(urllib.parse.unquote("name=%D0%98%D0%B2%D0%B0%D0%BD"))
The output when started from wsgiref:
name=%D0%98%D0%B2%D0%B0%D0%BD
name=Иван
And from mod_wsgi:
name=%D0%98%D0%B2%D0%B0%D0%BD
name=\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd
It seems that some environment variables may affect it, but I haven't found
any mention of it in
https://docs.python.org/3.2/library/urllib.parse.html
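
One way to check whether the decoded value itself differs between the two
servers, or only the way print displays it, is to print an ASCII-only
representation of the string. A minimal sketch, assuming Python 3; the
variable names are only illustrative:

```python
import urllib.parse

raw = "name=%D0%98%D0%B2%D0%B0%D0%BD"

# encoding defaults to UTF-8, so the percent-escapes become a str
decoded = urllib.parse.unquote(raw)

# ascii() escapes every non-ASCII character, so this line looks the same
# no matter which encoding the output stream uses
print(ascii(decoded))  # 'name=\u0418\u0432\u0430\u043d'

# Code points of the value part, also independent of the output encoding
print([hex(ord(ch)) for ch in decoded[len("name="):]])
```

If both servers print the same escaped form here, the decoding step is
identical and only the output stream treats the text differently.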
>
> So technically your WSGI application doesn't conform to the WSGI
> specification. That it wasn't was hidden by possible issues with the
> wsgiref server implementation incorrectly passing through a byte string
> rather than the % encode string. That or the HTTP client used was
> incorrectly passing through UTF-8 rather than % encoding as technically
> only your application can know what encoding it should be.
>
Unfortunately, I have the same problem if I just hard-code this string in my
code (see above). So the problem isn't in how the %-encoded string is passed.
>
> End result, is that it is up to you to decode value with the correct
> encoding.
>
> >>> dir(cgi)
> ['FieldStorage', 'FormContent', 'FormContentDict',
> 'InterpFormContentDict', 'MiniFieldStorage', 'StringIO',
> 'SvFormContentDict', 'UserDict', '__all__', '__builtins__', '__doc__',
> '__file__', '__name__', '__package__', '__version__', '_parseparam',
> 'attrgetter', 'catch_warnings', 'dolog', 'escape', 'filterwarnings',
> 'initlog', 'log', 'logfile', 'logfp', 'maxlen', 'mimetools', 'nolog', 'os',
> 'parse', 'parse_header', 'parse_multipart', 'parse_qs', 'parse_qsl',
> 'print_arguments', 'print_directory', 'print_environ',
> 'print_environ_usage', 'print_exception', 'print_form', 'rfc822', 'sys',
> 'test', 'urllib', 'urlparse', 'valid_boundary', 'warn']
>
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')
> {'name': ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']}
>
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name']
> ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']
>
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
> u'\u0418\u0432\u0430\u043d'
>
> >>> print cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
> Иван
>
https://docs.python.org/3.2/library/cgi.html tells me that I should use
urllib.parse.parse_qs, since cgi.parse_qs is deprecated in Python 3.2.
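
For reference, the Python 3 equivalent of the cgi.parse_qs session above
could look roughly like this inside a WSGI application. This is only a
sketch: the "name" parameter comes from the example query string, and the
response is encoded explicitly as UTF-8 so the result does not depend on
how print or the server log displays text:

```python
from urllib.parse import parse_qs


def application(environ, start_response):
    # QUERY_STRING arrives percent-encoded and ASCII-only;
    # parse_qs decodes the escapes as UTF-8 (its default encoding)
    params = parse_qs(environ.get("QUERY_STRING", ""), encoding="utf-8")
    name = params.get("name", [""])[0]

    # Encode the body explicitly instead of relying on any default
    body = ("name=" + name).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Viewing the response in a browser then shows whether the value was decoded
correctly, independently of what ends up in the log.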