Re: [modwsgi] Why does parse_qs give different result depending on the way wsgi application has been started?

Dmitriy Chugunov Thu, 07 Aug 2014 00:04:28 -0700


On Thursday, 7 August 2014 at 10:11:17 UTC+4, Graham Dumpleton wrote:
>
> I realised later what your problem is.
>
> Technically the wsgiref server is likely broken when it comes to 
> generating and passing through CGI param values.
>
> For QUERY_STRING, when sent by a browser, it is supposed to be sent % 
> encoded as the value is meant to be ASCII only.
>


QUERY_STRING contains only ASCII symbols: name=%D0%98%D0%B2%D0%B0%D0%BD
 

>
> That ASCII percent encoded value, because the encoding can only be known 
> by the application, is supposed to get all the way through to the 
> application. You are then meant to deal with it like:
>
> >>> dir(urllib)
> ['ContentTooShortError', 'FancyURLopener', 'MAXFTPCACHE', 'URLopener', 
> '__all__', '__builtins__', '__doc__', '__file__', '__name__', 
> '__package__', '__version__', '_ftperrors', '_get_proxies', 
> '_get_proxy_settings', '_have_ssl', '_hexdig', '_hextochr', '_hostprog', 
> '_is_unicode', '_localhost', '_noheaders', '_nportprog', '_passwdprog', 
> '_portprog', '_queryprog', '_safe_map', '_safe_quoters', '_tagprog', 
> '_thishost', '_typeprog', '_urlopener', '_userprog', '_valueprog', 
> 'addbase', 'addclosehook', 'addinfo', 'addinfourl', 'always_safe', 
> 'basejoin', 'c', 'ftpcache', 'ftperrors', 'ftpwrapper', 'getproxies', 
> 'getproxies_environment', 'getproxies_macosx_sysconf', 'i', 'localhost', 
> 'main', 'noheaders', 'os', 'pathname2url', 'proxy_bypass', 
> 'proxy_bypass_environment', 'proxy_bypass_macosx_sysconf', 'quote', 
> 'quote_plus', 'reporthook', 'socket', 'splitattr', 'splithost', 
> 'splitnport', 'splitpasswd', 'splitport', 'splitquery', 'splittag', 
> 'splittype', 'splituser', 'splitvalue', 'ssl', 'string', 'sys', 'test', 
> 'test1', 'thishost', 'time', 'toBytes', 'unquote', 'unquote_plus', 
> 'unwrap', 'url2pathname', 'urlcleanup', 'urlencode', 'urlopen', 
> 'urlretrieve']
>
> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD')
> '\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
>
> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
> u'\u0418\u0432\u0430\u043d'
>
> >>> print urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
> Иван
>

I know, but Python 3.2 doesn't have urllib.unquote; I have to use 
urllib.parse.unquote instead. That function has an encoding parameter whose 
default value is UTF-8, and a Python 3 string has no decode method, so the 
encoding parameter is the only thing I can set, and it already has the right 
value. I wrote the following in my application function: 
        print("name=%D0%98%D0%B2%D0%B0%D0%BD")
        print(urllib.parse.unquote("name=%D0%98%D0%B2%D0%B0%D0%BD"))
The output when started from wsgiref:
        name=%D0%98%D0%B2%D0%B0%D0%BD
        name=Иван
And from mod_wsgi:
        name=%D0%98%D0%B2%D0%B0%D0%BD
        name=\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd
It seems that some environment variable may affect this, but I haven't found 
any mention of it in 
https://docs.python.org/3.2/library/urllib.parse.html
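
One way to narrow this down is to compare the decoded value by code points 
instead of by what print() displays, because the printed form depends on the 
encoding of the stream it is written to. The following is only a sketch, 
assuming Python 3 and the standard library; the variable names are 
illustrative and not taken from my application.

```python
from urllib.parse import unquote, unquote_to_bytes

raw = "name=%D0%98%D0%B2%D0%B0%D0%BD"

# unquote() decodes the percent-escapes as UTF-8 by default and returns str.
text = unquote(raw)

# unquote_to_bytes() returns the undecoded bytes, which is the closest
# Python 3 equivalent of what Python 2's urllib.unquote() produced.
as_bytes = unquote_to_bytes(raw)

# ascii() escapes every non-ASCII character, so this line looks the same
# no matter which encoding the output stream uses.
print(ascii(text))
print(as_bytes)
print(text == "name=\u0418\u0432\u0430\u043d")
```

If the last line prints True under both servers, the decoding itself is 
identical and only the way the output is displayed differs.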

 

>
> So technically your WSGI application doesn't conform to the WSGI 
> specification. That it wasn't was hidden by possible issues with the 
> wsgiref server implementation incorrectly passing through a byte string 
> rather than the % encode string. That or the HTTP client used was 
> incorrectly passing through UTF-8 rather than % encoding as technically 
> only your application can know what encoding it should be.
>

Unfortunately, I have the same problem if I just hard-code this string in my 
code (see above). So the problem isn't in how the %-encoded string is passed.

 

>
> End result, is that it is up to you to decode value with the correct 
> encoding.
>
> >>> dir(cgi)
> ['FieldStorage', 'FormContent', 'FormContentDict', 
> 'InterpFormContentDict', 'MiniFieldStorage', 'StringIO', 
> 'SvFormContentDict', 'UserDict', '__all__', '__builtins__', '__doc__', 
> '__file__', '__name__', '__package__', '__version__', '_parseparam', 
> 'attrgetter', 'catch_warnings', 'dolog', 'escape', 'filterwarnings', 
> 'initlog', 'log', 'logfile', 'logfp', 'maxlen', 'mimetools', 'nolog', 'os', 
> 'parse', 'parse_header', 'parse_multipart', 'parse_qs', 'parse_qsl', 
> 'print_arguments', 'print_directory', 'print_environ', 
> 'print_environ_usage', 'print_exception', 'print_form', 'rfc822', 'sys', 
> 'test', 'urllib', 'urlparse', 'valid_boundary', 'warn']
>
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')
> {'name': ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']}
>
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name']
> ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']
>
> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
> u'\u0418\u0432\u0430\u043d'
>
> >>> print cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
> Иван
>

https://docs.python.org/3.2/library/cgi.html tells me that I should use 
urllib.parse.parse_qs, since cgi.parse_qs is deprecated in Python 3.2.
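
For reference, a minimal sketch of the Python 3 equivalent inside a WSGI 
application, assuming the query string arrives percent-encoded as UTF-8. The 
response format and the empty-string default are assumptions made for 
illustration; only urllib.parse.parse_qs and its encoding parameter come from 
the standard library.

```python
from urllib.parse import parse_qs


def application(environ, start_response):
    # QUERY_STRING is expected to be percent-encoded ASCII;
    # parse_qs() decodes the escapes using the given encoding.
    params = parse_qs(environ.get("QUERY_STRING", ""), encoding="utf-8")

    # parse_qs() maps each key to a list of values; take the first one.
    name = params.get("name", [""])[0]

    # A WSGI response body must be bytes, so encode explicitly.
    body = ("name=" + name).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Returning the value in the response body with an explicit charset avoids 
relying on print() and on whatever encoding the server's output stream uses.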

