I realised later what your problem is.
Technically the wsgiref server is likely broken when it comes to generating and
passing through CGI variable values.
When a browser sends QUERY_STRING, it is supposed to be percent encoded, as the
value is meant to be ASCII only.
Because only the application can know the encoding, that ASCII percent encoded
value is supposed to get all the way through to the application unchanged. You
are then meant to deal with it like:
>>> dir(urllib)
['ContentTooShortError', 'FancyURLopener', 'MAXFTPCACHE', 'URLopener',
'__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__',
'__version__', '_ftperrors', '_get_proxies', '_get_proxy_settings',
'_have_ssl', '_hexdig', '_hextochr', '_hostprog', '_is_unicode', '_localhost',
'_noheaders', '_nportprog', '_passwdprog', '_portprog', '_queryprog',
'_safe_map', '_safe_quoters', '_tagprog', '_thishost', '_typeprog',
'_urlopener', '_userprog', '_valueprog', 'addbase', 'addclosehook', 'addinfo',
'addinfourl', 'always_safe', 'basejoin', 'c', 'ftpcache', 'ftperrors',
'ftpwrapper', 'getproxies', 'getproxies_environment',
'getproxies_macosx_sysconf', 'i', 'localhost', 'main', 'noheaders', 'os',
'pathname2url', 'proxy_bypass', 'proxy_bypass_environment',
'proxy_bypass_macosx_sysconf', 'quote', 'quote_plus', 'reporthook', 'socket',
'splitattr', 'splithost', 'splitnport', 'splitpasswd', 'splitport',
'splitquery', 'splittag', 'splittype', 'splituser', 'splitvalue', 'ssl',
'string', 'sys', 'test', 'test1', 'thishost', 'time', 'toBytes', 'unquote',
'unquote_plus', 'unwrap', 'url2pathname', 'urlcleanup', 'urlencode', 'urlopen',
'urlretrieve']
>>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD')
'\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
>>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
u'\u0418\u0432\u0430\u043d'
>>> print urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
Иван
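The session above is Python 2. Since you are on Python 3, the same steps would
look roughly like the following. This is only a sketch: in Python 3 these
functions live in urllib.parse rather than urllib, and the variable names are
just for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes

quoted = "%D0%98%D0%B2%D0%B0%D0%BD"

# Step 1: undo the percent encoding, which yields raw bytes.
raw = unquote_to_bytes(quoted)

# Step 2: decode the bytes with the encoding the application expects.
name = raw.decode("utf-8")

# unquote() can do both steps at once when given the encoding explicitly.
same = unquote(quoted, encoding="utf-8")

print(name)
```

The point is the same as in the Python 2 case: the encoding is a decision the
application makes, so it is stated explicitly rather than left to a default.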
So technically your WSGI application doesn't conform to the WSGI specification.
That it didn't was hidden by possible issues with the wsgiref server
implementation, which may be incorrectly passing through a byte string rather
than the percent encoded string. That, or the HTTP client used was incorrectly
sending raw UTF-8 rather than percent encoding, as technically only your
application can know what encoding it should be.
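If you want to check what a particular server is actually handing your
application, a small test along these lines may help. It is a sketch that
assumes Python 3, where QUERY_STRING arrives as a str, and the helper name
is_ascii_only is made up for this example.

```python
def is_ascii_only(query_string):
    """Return True if the value contains only ASCII characters.

    A correctly percent encoded QUERY_STRING should always pass this check.
    """
    try:
        query_string.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Percent encoded value, as it is supposed to reach the application.
print(is_ascii_only("name=%D0%98%D0%B2%D0%B0%D0%BD"))

# Raw bytes that were passed through undecoded show up as non-ASCII characters.
print(is_ascii_only("name=\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd"))
```

Logging the result of such a check under both wsgiref and mod_wsgi would show
whether the two servers really pass the same value through.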
The end result is that it is up to you to decode the value with the correct
encoding.
>>> dir(cgi)
['FieldStorage', 'FormContent', 'FormContentDict', 'InterpFormContentDict',
'MiniFieldStorage', 'StringIO', 'SvFormContentDict', 'UserDict', '__all__',
'__builtins__', '__doc__', '__file__', '__name__', '__package__',
'__version__', '_parseparam', 'attrgetter', 'catch_warnings', 'dolog',
'escape', 'filterwarnings', 'initlog', 'log', 'logfile', 'logfp', 'maxlen',
'mimetools', 'nolog', 'os', 'parse', 'parse_header', 'parse_multipart',
'parse_qs', 'parse_qsl', 'print_arguments', 'print_directory', 'print_environ',
'print_environ_usage', 'print_exception', 'print_form', 'rfc822', 'sys',
'test', 'urllib', 'urlparse', 'valid_boundary', 'warn']
>>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')
{'name': ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']}
>>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name']
['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']
>>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
u'\u0418\u0432\u0430\u043d'
>>> print cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
Иван
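In a Python 3 WSGI application, the equivalent might look something like this.
It is a minimal sketch, not a definitive implementation: the helper name
get_query_params and the choice of UTF-8 are assumptions for illustration, and
the encoding should be whatever your application actually expects.

```python
from urllib.parse import parse_qs


def get_query_params(environ, encoding="utf-8"):
    """Parse QUERY_STRING, decoding percent escapes with an explicit encoding."""
    query_string = environ.get("QUERY_STRING", "")
    # Pass the encoding explicitly so the choice is visible in the application.
    return parse_qs(query_string, keep_blank_values=True, encoding=encoding)


def application(environ, start_response):
    params = get_query_params(environ)
    name = params.get("name", [""])[0]
    body = name.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Quick check outside of any server, using a hand-built environ dictionary.
params = get_query_params({"QUERY_STRING": "name=%D0%98%D0%B2%D0%B0%D0%BD"})
print(params)
```

Returning the value as UTF-8 bytes in the response, rather than printing it to
the log, avoids depending on whatever encoding the server uses for its output
streams.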
Graham
On 07/08/2014, at 3:15 PM, Dmitriy Chugunov <[email protected]> wrote:
> I use Python 3.2.5 and Oracle Linux 6.4. I have written my WSGI application
> but I have some trouble: the function urllib.parse.parse_qs behaves
> differently depending on the way I start my application (Apache with
> mod_wsgi 4.2.6 or wsgiref.simple_server). In my application function I have
> the following code:
> def application(environ, start_response):
>     print(environ["QUERY_STRING"])
>     requestParams = parse_qs(environ["QUERY_STRING"])
>     print(requestParams)
>     .......
> When I start my program using wsgiref.simple_server and make a query
> /query?name=Иван (it's Russian name) I get the following output:
>> name=%D0%98%D0%B2%D0%B0%D0%BD
>> {'name':['Иван']}
> But my application with Apache + mod_wsgi 4.2.6 gives me the following:
>> name=%D0%98%D0%B2%D0%B0%D0%BD
>> {'name':['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']}
> As you can see, the latter doesn't give me the correct Russian word encoded
> in UTF-8 although the input to the function is the same. According to
> https://docs.python.org/3.2/library/urllib.parse.html function parse_qs has
> default parameter encoding='utf-8'. As a result I have other problems during
> further work. I can't understand why this function works differently. But it
> seems that mod_wsgi affects this behavior (when I run sudo -u apache
> python3.2 my_project.py everything works well).
>
> I have the following Apache virtual host:
>> <VirtualHost *:80>
>> DocumentRoot /var/www/my_project
>> <Directory />
>> Options FollowSymLinks
>> AllowOverride None
>> </Directory>
>> <Directory /var/www/my_project/>
>> Options Indexes FollowSymLinks MultiViews
>> AllowOverride None
>> Order allow,deny
>> allow from all
>> </Directory>
>> WSGIDaemonProcess my_project processes=8 threads=1 python-path=/var/www/my_project display-name=%{GROUP}
>> WSGIProcessGroup my_project
>> WSGIScriptAlias /my_project /var/www/my_project/my_project.py
>> </VirtualHost>
>>
> My apache uses prefork MPM.
>
> I also tried to use lang=en_US.UTF-8 locale=en_US.UTF-8 in WSGIDaemonProcess
> directive, but it didn't help.
>
> --
> You received this message because you are subscribed to the Google Groups
> "modwsgi" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/modwsgi.
> For more options, visit https://groups.google.com/d/optout.