Hi Graham:

Graham Dumpleton wrote on Thursday, 18 August 2022 at 10:58:23 UTC+2:

> [snip] 
>
> Now it wasn't practical in Python 3 to pass through variables as byte 
> strings as the range of operations you could do on byte strings was very 
> limited. Thus the rule for WSGI under Python 3 was that the WSGI server was 
> required to take the underlying byte stream and convert it to the unicode 
> capable default string as ISO-8859-1 (Latin-1). It was then up to the WSGI 
> application to convert that to another string with the correct encoding. 
> Since it was a unicode string at that point, to do that it would need to do:
>

Ah! That fully explains the effect I observe! As Apache passes the 
environment as UTF-8, actually *every* value in it is double-encoded, right?

>     value.encode('ISO-8859-1').decode('UTF-8')
>
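
To make the round trip described above concrete, here is a minimal sketch in Python 3. The sample text and the names `raw`, `environ_value` and `recovered` are made up for illustration. The Latin-1 decoding step only stands in for what the WSGI server does; it is not mod_wsgi's actual code.

```python
# Bytes as Apache would hand them over: UTF-8 encoded (assumed sample value).
raw = "Grüße".encode("utf-8")

# What a WSGI server does under Python 3: decode the bytes as ISO-8859-1.
# Every byte maps to exactly one code point below U+0100, so this never fails.
environ_value = raw.decode("iso-8859-1")

# What the application has to do to get the intended text back.
recovered = environ_value.encode("iso-8859-1").decode("utf-8")

print(repr(environ_value))  # mojibake: one character per original byte
print(repr(recovered))      # the intended text
```

The intermediate string has one character per byte, which is why non-ASCII values look garbled until the application re-encodes and decodes them.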

Not sure if it makes any difference in practice, but IMHO

    value.encode('raw_unicode_escape').decode('utf-8')

*might* be more appropriate. At least it works fine with mixed input from 
different code pages (I added Greek, Cyrillic and Hiragana characters to 
the Latin-1 ones).
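
A quick way to compare the two variants is the sketch below. The helper names `via_latin1` and `via_raw_unicode_escape` and the sample string are hypothetical, and the server side is only simulated. As far as I can tell, both codecs produce the same bytes for any value whose characters are all below U+0100, which is the only kind of value a WSGI server should hand over. They only diverge for strings that already contain higher code points.

```python
def via_latin1(value: str) -> str:
    # Re-encode with ISO-8859-1, then decode the bytes as UTF-8.
    return value.encode("iso-8859-1").decode("utf-8")


def via_raw_unicode_escape(value: str) -> str:
    # Same idea, but using the raw_unicode_escape codec for the first step.
    return value.encode("raw_unicode_escape").decode("utf-8")


# Hypothetical mixed-script sample; the server side is simulated by decoding
# the UTF-8 bytes as ISO-8859-1.
original = "äöü αβγ абв あいう"
environ_value = original.encode("utf-8").decode("iso-8859-1")

# Every character of environ_value is below U+0100, so both codecs produce
# identical bytes and both recover the original text.
print(via_latin1(environ_value) == original)
print(via_raw_unicode_escape(environ_value) == original)

# The codecs only differ for input that is already real text above U+00FF:
# ISO-8859-1 refuses it, raw_unicode_escape silently writes a \uXXXX escape.
try:
    via_latin1("α")
except UnicodeEncodeError as exc:
    print("iso-8859-1:", exc.reason)
print("raw_unicode_escape:", via_raw_unicode_escape("α"))
```

So for genuine environ values the choice should not matter. The ISO-8859-1 variant has the side benefit of failing loudly if a value was already decoded correctly somewhere upstream.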

Thanks again for your help!

Best, Albrecht.
