And I of course meant to say 'why you should not do this yourself'.

On 07/08/2014, at 5:16 PM, Graham Dumpleton <[email protected]> wrote:

> 
> On 07/08/2014, at 5:03 PM, Dmitriy Chugunov <[email protected]> wrote:
> 
>> 
>> 
>> On Thursday, 7 August 2014 at 10:11:17 UTC+4, Graham Dumpleton wrote:
>> I realised later what your problem is.
>> 
>> Technically the wsgiref server is likely broken when it comes to generating 
>> and passing through CGI param values.
>> 
>> For QUERY_STRING, when sent by a browser, it is supposed to be sent % 
>> encoded as the value is meant to be ASCII only.
>> 
>> QUERY_STRING contains only ASCII symbols: name=%D0%98%D0%B2%D0%B0%D0%BD
>>  
>> 
>> That ASCII percent encoded value, because the encoding can only be known by 
>> the application, is supposed to get all the way through to the application. 
>> You are then meant to deal with it like:
>> 
>> >>> dir(urllib)
>> ['ContentTooShortError', 'FancyURLopener', 'MAXFTPCACHE', 'URLopener', 
>> '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 
>> '__version__', '_ftperrors', '_get_proxies', '_get_proxy_settings', 
>> '_have_ssl', '_hexdig', '_hextochr', '_hostprog', '_is_unicode', 
>> '_localhost', '_noheaders', '_nportprog', '_passwdprog', '_portprog', 
>> '_queryprog', '_safe_map', '_safe_quoters', '_tagprog', '_thishost', 
>> '_typeprog', '_urlopener', '_userprog', '_valueprog', 'addbase', 
>> 'addclosehook', 'addinfo', 'addinfourl', 'always_safe', 'basejoin', 'c', 
>> 'ftpcache', 'ftperrors', 'ftpwrapper', 'getproxies', 
>> 'getproxies_environment', 'getproxies_macosx_sysconf', 'i', 'localhost', 
>> 'main', 'noheaders', 'os', 'pathname2url', 'proxy_bypass', 
>> 'proxy_bypass_environment', 'proxy_bypass_macosx_sysconf', 'quote', 
>> 'quote_plus', 'reporthook', 'socket', 'splitattr', 'splithost', 
>> 'splitnport', 'splitpasswd', 'splitport', 'splitquery', 'splittag', 
>> 'splittype', 'splituser', 'splitvalue', 'ssl', 'string', 'sys', 'test', 
>> 'test1', 'thishost', 'time', 'toBytes', 'unquote', 'unquote_plus', 'unwrap', 
>> 'url2pathname', 'urlcleanup', 'urlencode', 'urlopen', 'urlretrieve']
>> 
>> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD')
>> '\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
>> 
>> >>> urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
>> u'\u0418\u0432\u0430\u043d'
>> 
>> >>> print urllib.unquote('%D0%98%D0%B2%D0%B0%D0%BD').decode('UTF-8')
>> Иван
>> 
>> I know, but Python 3.2 doesn't have urllib.unquote. Instead I should use 
>> urllib.parse.unquote. This function has an encoding parameter whose 
>> default value is UTF-8, and a Python 3 string doesn't have a decode 
>> method. Therefore I can only use this encoding parameter, but it already 
>> has the proper value. So I wrote the following (in my application function): 
>>         print("name=%D0%98%D0%B2%D0%B0%D0%BD")
>>         print(urllib.parse.unquote("name=%D0%98%D0%B2%D0%B0%D0%BD"))
>> The output when started from wsgiref:
>>         name=%D0%98%D0%B2%D0%B0%D0%BD
>>         name=Иван
>> And from mod_wsgi:
>>         name=%D0%98%D0%B2%D0%B0%D0%BD
>>         name=\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd
>> It seems that some environment variables may affect it, but I haven't found 
>> any mention of it in https://docs.python.org/3.2/library/urllib.parse.html
> 
> In Python 3 it is horrible, as you need to do a dance. Although the value 
> comes through as a Unicode string, it is supposed to hold the raw byte 
> values decoded as Latin-1. This means you are supposed to convert it back 
> to a byte string as Latin-1 and then decode it to Unicode as UTF-8.
> 
> >>> s = urllib.parse.unquote('%D0%98%D0%B2%D0%B0%D0%BD', 
> ...     encoding='Latin-1').encode('Latin-1')
> >>> s
> b'\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd'
> >>> s.decode('UTF-8')
> 'Иван'
> 
> This is a particularly horrible area for WSGI under Python 3. It is why 
> people always recommend you should do this yourself and should use at least a 
> micro framework such as Flask as it hides all these crappy conversions.
> 
> What happens if you do that dance?
> 
> Graham
> 
> 
>>  
>> 
>> So technically your WSGI application doesn't conform to the WSGI 
>> specification. That was hidden, possibly by the wsgiref server 
>> implementation incorrectly passing through a byte string rather than the 
>> % encoded string, or by the HTTP client incorrectly sending raw UTF-8 
>> rather than % encoding it, as technically only your application can know 
>> what encoding it should be.
>> 
>> Unfortunately, I have the same problem if I just hard code this string in 
>> my code (see above). So the problem isn't in passing the % encoded string.
>> 
>>  
>> 
>> The end result is that it is up to you to decode the value with the 
>> correct encoding.
>> 
>> >>> dir(cgi)
>> ['FieldStorage', 'FormContent', 'FormContentDict', 'InterpFormContentDict', 
>> 'MiniFieldStorage', 'StringIO', 'SvFormContentDict', 'UserDict', '__all__', 
>> '__builtins__', '__doc__', '__file__', '__name__', '__package__', 
>> '__version__', '_parseparam', 'attrgetter', 'catch_warnings', 'dolog', 
>> 'escape', 'filterwarnings', 'initlog', 'log', 'logfile', 'logfp', 'maxlen', 
>> 'mimetools', 'nolog', 'os', 'parse', 'parse_header', 'parse_multipart', 
>> 'parse_qs', 'parse_qsl', 'print_arguments', 'print_directory', 
>> 'print_environ', 'print_environ_usage', 'print_exception', 'print_form', 
>> 'rfc822', 'sys', 'test', 'urllib', 'urlparse', 'valid_boundary', 'warn']
>> 
>> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')
>> {'name': ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']}
>> 
>> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name']
>> ['\xd0\x98\xd0\xb2\xd0\xb0\xd0\xbd']
>> 
>> >>> cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
>> u'\u0418\u0432\u0430\u043d'
>> 
>> >>> print cgi.parse_qs('name=%D0%98%D0%B2%D0%B0%D0%BD')['name'][0].decode('UTF-8')
>> Иван
>> 
>> https://docs.python.org/3.2/library/cgi.html tells me that I should use 
>> urllib.parse.parse_qs, since cgi.parse_qs is deprecated in Python 3.2. 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/modwsgi.
>> For more options, visit https://groups.google.com/d/optout.
> 
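
For reference, the closest Python 3 analogue to the Python 2 
urllib.unquote(...).decode('UTF-8') steps quoted above is probably 
urllib.parse.unquote_to_bytes followed by an explicit decode. The sketch 
below assumes the value is a properly % encoded ASCII string; the helper 
name unquote_text is made up for illustration and is not part of urllib.

```python
from urllib.parse import unquote_to_bytes


def unquote_text(value, encoding="utf-8"):
    """Percent-decode an ASCII string to bytes, then decode the bytes.

    The encoding is chosen by the application, since only the
    application knows how the client encoded the original text.
    """
    raw = unquote_to_bytes(value)  # '%D0%98' becomes b'\xd0\x98'
    return raw.decode(encoding)
```

Keeping the byte step explicit means the same helper handles a client that 
sent the name as, say, cp1251 instead of UTF-8, simply by passing a 
different encoding.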

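The Latin-1 dance described in the thread can also be applied to a whole 
query string taken from the WSGI environ. This is only a rough sketch under 
the assumption that the server hands over QUERY_STRING as a native string 
whose characters each stand for one byte (Latin-1), which covers both a 
% encoded value and raw UTF-8 bytes passed through unchanged. The function 
name query_params is hypothetical.

```python
from urllib.parse import parse_qsl


def query_params(environ, encoding="utf-8"):
    """Return {name: [values]} decoded with the application's encoding."""
    raw = environ.get("QUERY_STRING", "")
    params = {}
    # Percent-decode as Latin-1 so every resulting character maps to one byte.
    pairs = parse_qsl(raw, keep_blank_values=True, encoding="latin-1")
    for name, value in pairs:
        # Back to bytes as Latin-1, then to text in the real encoding.
        name = name.encode("latin-1").decode(encoding)
        value = value.encode("latin-1").decode(encoding)
        params.setdefault(name, []).append(value)
    return params
```

Inside an application function this would be called as 
query_params(environ)['name'][0]. A framework does the equivalent work 
internally, which is the reason for recommending one.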
