Graham Dumpleton wrote:
> 2009/4/2 Robert Brewer <fuman...@aminus.org>:
> > Alan Kennedy wrote:
> >> Hi Graham,
> >>
> >> I think yours is a good solution to the problem.
> >>
> >> [Graham]
> >> > In other words, leave all the existing CGI variables to come through
> >> > as latin-1 decode
> >>
> >> As latin-1 or rfc-2047 decoded, to unicode.
> >>
> >> > and do anything new in 'wsgi' variable namespace,
> >>
> >> So the server provides
> >>
> >> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
> >> "wsgi.server_decoded_PATH_INFO" == u"whatever"
> >> "wsgi.server_decode_charset" == u"utf-8"
> >
> > I think everyone at the sprint today acquiesced to having
> > SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode. The
> > server can decide (probably subject to configuration). I've implemented
> > this in the python3 branch of CherryPy and it seems to work brilliantly.
> > Assuming the server *is* configurable, deployers should be able to
> > choose Latin-1 if they need to recover the original bytes, without
> > having to support a separate set of encoded-byte entries.
>
> Seems to me that you can't have it be configurable and it must always
> be latin-1 interpretation. The problem is where you are composing
> multiple WSGI applications. If they each have different expectations
> or requirements as to how it is handled, aren't you going to have a
> problem. Or am I missing something in the way you are explaining it?
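
To make the "choose Latin-1 to recover the original bytes" point concrete, here is a minimal sketch. It assumes the server has already percent-decoded the path and decoded the resulting bytes as Latin-1; the helper names `recover_bytes` and `redecode` are hypothetical and belong to no server or spec.

```python
def recover_bytes(environ_value):
    """Return the raw bytes behind a str that was decoded as Latin-1.

    Latin-1 maps each byte 0x00-0xFF to the code point with the same
    number, so encoding back to Latin-1 is lossless.
    """
    return environ_value.encode("latin-1")


def redecode(environ_value, charset="utf-8"):
    """Reinterpret a Latin-1-decoded value under the charset the app expects."""
    return recover_bytes(environ_value).decode(charset)


# Simulate a server that received UTF-8 bytes and decoded them as Latin-1.
raw = "/caf\u00e9".encode("utf-8")      # bytes as they arrived on the wire
path_info = raw.decode("latin-1")       # what would be placed in environ

assert recover_bytes(path_info) == raw
assert redecode(path_info) == "/caf\u00e9"
```

The round trip only works because Latin-1 never fails and never loses a byte; a server that decoded with some other charset could not guarantee the same.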

I would not expect multiple middlewares to want to decode the same URI
differently. But I would assume you'd run into problems when multiple
URIs in the same site had different encodings. Mark Ramm gave the use
case of exposing Unix filenames-as-bytes in URLs: the encoding is
unknown, but a human may know better. Allowing/forcing the human to
stick that information in the app or in the server is the same work,
IMO. A server could be configurable to the point of using different
encodings for different URIs via regex matching or <Location> sections
or some other means.

I'd be happy with a spec that said, "servers MUST always decode these 3
entries, but SHOULD allow the encoding used to be configurable." I'd be
equally happy with a spec that said, "servers MUST always decode these 3
as Latin-1" and explained why. Both have their manageable pros and cons.
But delaying the decoding to the app by setting those 3 entries as bytes
has more cons than pros.

Robert Brewer
fuman...@aminus.org
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
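
As a footnote to the per-URI configuration idea above, here is a rough sketch of how a server might pick an encoding by regex matching. Everything in it is an assumption for illustration: the rule table, the example prefixes, the UTF-8 default, and the Latin-1 fallback are hypothetical and not taken from CherryPy or any other server.

```python
import re

# Hypothetical deployer-supplied rules: first matching pattern wins.
ENCODING_RULES = [
    (re.compile(rb"^/files/"), "latin-1"),   # filenames-as-bytes, keep lossless
    (re.compile(rb"^/jp/"), "shift_jis"),    # a legacy section of the site
]
DEFAULT_ENCODING = "utf-8"                   # assumed default, not mandated


def choose_encoding(raw_path):
    """Return the configured encoding for a percent-decoded path (bytes)."""
    for pattern, encoding in ENCODING_RULES:
        if pattern.match(raw_path):
            return encoding
    return DEFAULT_ENCODING


def decode_path(raw_path):
    """Decode raw_path, falling back to Latin-1 so decoding never fails.

    Returns (text, encoding_used) so the app can recover the bytes if needed.
    """
    encoding = choose_encoding(raw_path)
    try:
        return raw_path.decode(encoding), encoding
    except UnicodeDecodeError:
        return raw_path.decode("latin-1"), "latin-1"
```

Returning the encoding alongside the text plays the same role as the "wsgi.server_decode_charset" entry quoted earlier in the thread: the application can always re-encode and get the original bytes back.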