Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-10 Thread Ian Bicking
On Sun, Jan 9, 2011 at 1:47 AM, Stephen J. Turnbull step...@xemacs.org wrote:

 Robert Brewer writes:

   Python 3.1 was released June 27th, 2009. We're coming up faster on the
   two-year period than we seem to be on a revised WSGI spec. Maybe we
   should shoot for a bytes of a known encoding type first.

 You have one.  It's called ISO 2022: Information processing -- ISO
 7-bit and 8-bit coded character sets -- Code extension techniques.
 The popularity of that standard speaks for itself.


The kind of object PJE was referring to is more like Ruby's strings, which
do not embed the encoding inside the bytes themselves but have the encoding
as a kind of annotation on the bytes, and do lazy transcoding when combining
strings of different encodings.  The goal with respect to WSGI is that you
could annotate bytes with an encoding but also change or fix that encoding
if other out-of-band information implied that you got the encoding wrong
(e.g., some data is submitted with the encoding of the page the browser was
on, and so nothing inside the request itself will indicate the encoding of
the data).  Latin1 is kind of the poor man's version of this -- it's a good
guess at an encoding, that at worst requires transcoding that can be done in
a predictable way.  (Personally I think Latin1 gets us 99% of the way there,
and so bytes-of-a-known-encoding are not really that important to the WSGI
case.)
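
A minimal sketch of the "annotation" idea, assuming a hypothetical
TaggedBytes type (neither the name nor its methods exist in Python or
in WSGI). The bytes stay untouched, the encoding label sits beside
them, and the label can be corrected once out-of-band information
arrives:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedBytes:
    """Hypothetical type: raw bytes plus an encoding label kept beside them."""
    data: bytes
    encoding: str = "latin-1"  # the "good guess" default

    def retag(self, encoding: str) -> "TaggedBytes":
        # Fix a wrong guess: the bytes are untouched, only the label changes.
        return TaggedBytes(self.data, encoding)

    def text(self) -> str:
        # Decode lazily, only when text is actually needed.
        return self.data.decode(self.encoding)


field = TaggedBytes("café".encode("utf-8"))  # arrives with no charset info
guessed = field.text()                       # mojibake under the latin-1 guess
fixed = field.retag("utf-8").text()          # out-of-band info says UTF-8
```

The Latin1 "poor man's version" gets the same effect without a new
type: guessed.encode("latin-1") recovers the original bytes, which can
then be decoded with the right encoding.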

  Ian
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-10 Thread Michael Foord

On 10/01/2011 17:24, Ian Bicking wrote:
On Sun, Jan 9, 2011 at 1:47 AM, Stephen J. Turnbull step...@xemacs.org wrote:


Robert Brewer writes:

 Python 3.1 was released June 27th, 2009. We're coming up faster on the
 two-year period than we seem to be on a revised WSGI spec. Maybe we
 should shoot for a bytes of a known encoding type first.

You have one.  It's called ISO 2022: Information processing -- ISO
7-bit and 8-bit coded character sets -- Code extension techniques.
The popularity of that standard speaks for itself.


The kind of object PJE was referring to is more like Ruby's strings, 
which do not embed the encoding inside the bytes themselves but have 
the encoding as a kind of annotation on the bytes, and do lazy 
transcoding when combining strings of different encodings.  The goal 
with respect to WSGI is that you could annotate bytes with an encoding 
but also change or fix that encoding if other out-of-band information 
implied that you got the encoding wrong (e.g., some data is submitted 
with the encoding of the page the browser was on, and so nothing 
inside the request itself will indicate the encoding of the data).  
Latin1 is kind of the poor man's version of this -- it's a good guess 
at an encoding, that at worst requires transcoding that can be done in 
a predictable way.  (Personally I think Latin1 gets us 99% of the way 
there, and so bytes-of-a-known-encoding are not really that important 
to the WSGI case.)




I think the language moratorium was not the only objection to the 
inclusion of a third string type in Python (the screwed string - safe 
to treat neither as bytes nor as text). I recall objections in principle 
too from core developers during the EuroPython language summit.


Michael


  Ian





--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html



Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-10 Thread Nick Coghlan
On Tue, Jan 11, 2011 at 3:24 AM, Ian Bicking i...@colorstudy.com wrote:

 The kind of object PJE was referring to is more like Ruby's strings, which
 do not embed the encoding inside the bytes themselves but have the encoding
 as a kind of annotation on the bytes, and do lazy transcoding when combining
 strings of different encodings.  The goal with respect to WSGI is that you
 could annotate bytes with an encoding but also change or fix that encoding
 if other out-of-band information implied that you got the encoding wrong
 (e.g., some data is submitted with the encoding of the page the browser was
 on, and so nothing inside the request itself will indicate the encoding of
 the data).  Latin1 is kind of the poor man's version of this -- it's a good
 guess at an encoding, that at worst requires transcoding that can be done in
 a predictable way.  (Personally I think Latin1 gets us 99% of the way there,
 and so bytes-of-a-known-encoding are not really that important to the WSGI
 case.)

Having done the upgrade to urllib to support direct manipulation of
byte sequences, I don't think such a type would help as much as people
hoped anyway. Converting to Unicode, manipulating as text and
converting back really *is* the right way to do text manipulation
(however, providing bytes-in-bytes-out APIs that do the conversions
for you can also be quite convenient).
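
As an illustration of a bytes-in-bytes-out API, here is a short sketch
using urllib.parse as it behaves in Python 3.2 and later. The URL and
the choice of UTF-8 for the path are assumptions made for the example:

```python
from urllib.parse import urlsplit, unquote_to_bytes

# str in, str out
text_parts = urlsplit("http://example.com/caf%C3%A9?x=1")

# bytes in, bytes out: the library decodes and re-encodes internally
byte_parts = urlsplit(b"http://example.com/caf%C3%A9?x=1")

# Percent-decoding yields raw bytes; picking the charset is left to the caller.
raw_path = unquote_to_bytes(byte_parts.path)
path = raw_path.decode("utf-8")  # assumption: this site uses UTF-8 paths
```

The caller never sees the intermediate text form, but the actual
manipulation still happens on str inside the library.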

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-10 Thread Stephen J. Turnbull
Ian Bicking writes:
  On Sun, Jan 9, 2011 at 1:47 AM, Stephen J. Turnbull
  step...@xemacs.org wrote:
  
   Robert Brewer writes:
  
 Python 3.1 was released June 27th, 2009. We're coming up faster on the
 two-year period than we seem to be on a revised WSGI spec. Maybe we
 should shoot for a bytes of a known encoding type first.
  
   You have one.  It's called ISO 2022: Information processing -- ISO
   7-bit and 8-bit coded character sets -- Code extension techniques.
   The popularity of that standard speaks for itself.
  
  
  The kind of object PJE was referring to is more like Ruby's strings,

Notice that Ruby was written by a Japanese, the same culture that
brought us Mule, TRON, X Compound Text, and ISO-2022 in the first
place.  Matsumoto himself probably isn't infected with the "Unicode is
going to be the death of all Japanese culture" bug, but that's the
attitude that is behind ISO 2022.

  which do not embed the encoding inside the bytes themselves but have the 
  encoding
  as a kind of annotation on the bytes,

My point is that ISO-2022 is basically just a serialization of that.

And it sucks; nobody uses it, except in Japanese and Korean email.
Maybe Mandarin (but Taiwan and Hong Kong use Big5 or EUC, not an
escape-extended representation).

  and do lazy transcoding when combining strings of different
  encodings.

Which buys WSGI nothing, AIUI, since the people who want this claim
that translating to Unicode either correctly or as big bytes (ie,
zero-extension) is inefficient.  They're shoveling bits; much of the
time, by the time the out-of-band information catches up, it's going
to be too late.

  The goal with respect to WSGI is that you could annotate bytes with
  an encoding but also change or fix that encoding if other
  out-of-band information implied that you got the encoding wrong
  (e.g., some data is submitted with the encoding of the page the
  browser was on, and so nothing inside the request itself will
  indicate the encoding of the data).

A noble goal, but nobody's gonna bell that cat.  This is all just
wishful thinking.  2 decades of experience with Emacs/Mule and similar
efforts show that if you provide this facility, people will use it,
and that use will include a lot of abuse (ie, throwing the garbage
into somebody else's backyard, rather than disposing of it yourself)
-- in the end, the garbage gets piled high enough that it's not worth
the effort to try to make it work.

  Latin1 is kind of the poor man's version of this -- it's a good
  guess at an encoding, that at worst requires transcoding that can
  be done in a predictable way.  (Personally I think Latin1 gets us
  99% of the way there, and so bytes-of-a-known-encoding are not
  really that important to the WSGI case.)

In particular, it gets PJE 100% of the way there, since he proposes
always targeting ISO 8859/1, anyway.

And if it's not useful to WSGI, who is it useful to?


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-08 Thread Stephen J. Turnbull
Robert Brewer writes:

  Python 3.1 was released June 27th, 2009. We're coming up faster on the
  two-year period than we seem to be on a revised WSGI spec. Maybe we
  should shoot for a bytes of a known encoding type first.

You have one.  It's called ISO 2022: Information processing -- ISO
7-bit and 8-bit coded character sets -- Code extension techniques.
The popularity of that standard speaks for itself.


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread And Clover
On Tue, 2011-01-04 at 03:44 +0100, Victor Stinner wrote:
 What is this horrible encoding bytes-as-unicode?

It is a unicode string decoded from bytes using ISO-8859-1. ISO-8859-1
is the encoding specified by the HTTP RFC, as well as having the happy
property of preserving every input byte.
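
The byte-preserving property can be checked directly. This is only a
demonstration of the codec's behaviour, not code from wsgiref:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# ISO-8859-1 maps byte N to code point N, so decoding can never fail...
as_text = all_bytes.decode("iso-8859-1")

# ...and encoding gives back exactly the bytes we started with.
round_tripped = as_text.encode("iso-8859-1")

# A stricter codec cannot do the same for arbitrary input.
try:
    all_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```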

 os.environ is supposed to be correctly decoded and contain valid unicode 
 characters.

Nope. It is not possible to ‘correctly’ decode to unicode for os.environ
because that decoding happens long before the web application gets a
look in. Maybe the web application is using UTF-8, maybe it's using
cp1252, but if we let the server/gateway decide and do that decoding
before the application can do anything about it, we will get the wrong
encoding in *many* cases and the result will be permanent, unrecoverable
mangling of non-ASCII characters in submitted headers.

 If WSGI uses another encoding than the locale encoding (which is a bad idea),

It's an absolutely necessary idea. The locale encoding is nothing to do
with the web application's encoding. Windows applications need to be
able to use UTF-8 (which is never the ANSI code page), and web
applications in general need to be deployable to any server without
having to worry about the server's locale.

The locale-dependent status quo is that non-ASCII characters in URL
paths and other HTTP headers don't work for Python apps.

The recoding dances present in wsgiref's CGIHandler for 3.2 are
distasteful but completely necessary to normalise differences in
encodings used by various servers and platforms to generate their CGI
environment.

  it should use os.environb and decodes keys and values using its
 own encoding.

Well yes, but:

(a) os.environb doesn't exist in Python 3.1 or earlier, making it
impossible to implement WSGI before 3.2;
(b) there are also non-HTTP-related environment variables, which may
contain native Unicode strings (eg, very commonly, Windows pathnames),
so you have to have both environ *and* environb.
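
To make the environ/environb split concrete, here is a sketch for
Python 3.2 and later. The helper name raw_environ_value and the
variable DEMO_PATH_INFO are made up for illustration, and using
os.fsencode on the Windows branch is an assumption, not something WSGI
prescribes:

```python
import os


def raw_environ_value(name: str) -> bytes:
    """Return one environment variable as bytes (hypothetical helper)."""
    if os.supports_bytes_environ:
        # POSIX: the original bytes are available directly.
        return os.environb[os.fsencode(name)]
    # Windows: the environment is natively Unicode, so bytes have to be
    # produced by encoding; os.fsencode is one possible choice, not the only one.
    return os.fsencode(os.environ[name])


os.environ["DEMO_PATH_INFO"] = "/demo"
value = raw_environ_value("DEMO_PATH_INFO")
```

On the Windows branch there are no "original bytes" to recover, which
is why both views of the environment are needed.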

The bytes-or-bytes-in-Unicode argument is something that has been
bounced around Web-SIG for literally *years*; this is what we ended up
with. Although I personally like bytes, frankly, a re-run of this
argument *again* whilst WSGI remains in perpetual stalemate does not
appeal. WSGI and wsgiref in Python 3.0-3.1 simply do not work at all. This
has been an embarrassing situation for what is supposed to be a leading
web language. Let's not perpetuate this sorry story to 3.2 as well.

-- 
And Clover
mailto:a...@doxdesk.com http://www.doxdesk.com
skype:uknrbobince gtalk:chat?jid=bobi...@gmail.com




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread Victor Stinner
On Thursday, 06 January 2011 at 23:50 +, And Clover wrote:
 On Tue, 2011-01-04 at 03:44 +0100, Victor Stinner wrote:
  What is this horrible encoding bytes-as-unicode?
 
 It is a unicode string decoded from bytes using ISO-8859-1. ISO-8859-1
 is the encoding specified by the HTTP RFC, as well as having the happy
 property of preserving every input byte. PEP  requires it.

ISO-8859-1 for all fields: SERVER_NAME, PATH_INFO, the URL, form
data, ...?

  os.environ is supposed to be correctly decoded and contain valid
 unicode characters.
 
 It is not possible to ‘correctly’ decode to unicode for os.environ
 because that decoding happens long before the web application (the
 only party that knows what encoding should be in use) gets a look in.

Agreed.

 Maybe the web application is using UTF-8, maybe it's using cp1252,
 but if we let the server/gateway decide and do that decoding (...)
 It's an absolutely necessary idea. The locale encoding is nothing 
 to do with the web application's encoding. (...)

Ok, so you must pass byte strings to the server/gateway. If you pass
unicode, how does the server/gateway know that it has to redecode a value?
Should it redecode all values? Anyway, it is stupid to use a temporary,
useless pseudo-encoding (bytes-in-unicode).

 The recoding dances present in wsgiref's CGIHandler for 3.2 are
 distasteful but completely necessary to normalise differences in
 encodings used by various servers and platforms to generate their CGI
 environment.

I don't understand why read_environ() gives unicode values: as you
explained, the server/gateway will have to encode the values again, and
then finally to decode them from the correct encoding.

On POSIX, the current code looks like this:

 a) the OS passes a bytes environ to the program
 b) Python decodes environ from the locale encoding
 c) wsgi.read_environ() encodes environ to the locale encoding to get
back the original bytes environ: this step can be skipped if os.environb
is available
 d) wsgi.read_environ() decodes environ from ISO-8859-1
 e) the server/gateway encodes environ to ISO-8859-1
 f) the server/gateway decodes environ from the right encoding

Hey! Don't you think that there are useless encode/decode steps here?
Especially (d)-(e) is useless and introduces a confusion: the environ
contains other keys that don't come from os.environ and are already
correctly decoded; how does the server/gateway know that they are
already correctly decoded?
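
The six steps (a) to (f) listed above can be simulated on a single
value. This is a sketch, not the wsgiref code: the function name
simulate is hypothetical, and the locale encoding is passed in as a
parameter instead of being read from the real locale:

```python
def simulate(raw: bytes, locale_encoding: str, app_encoding: str) -> str:
    # (a) the OS hands the program bytes: that is `raw`
    # (b) Python decodes os.environ from the locale encoding
    decoded = raw.decode(locale_encoding, "surrogateescape")
    # (c) read_environ() re-encodes to recover the original bytes
    recovered = decoded.encode(locale_encoding, "surrogateescape")
    # (d) read_environ() decodes those bytes as ISO-8859-1
    bytes_as_unicode = recovered.decode("iso-8859-1")
    # (e) the server/gateway encodes back to ISO-8859-1
    again = bytes_as_unicode.encode("iso-8859-1")
    # (f) and finally decodes with the encoding it knows to be right
    return again.decode(app_encoding)


raw = "/café".encode("utf-8")
result = simulate(raw, locale_encoding="ascii", app_encoding="utf-8")
```

Every intermediate step is lossless, so the value survives even an
ASCII locale; the cost is the extra encode/decode pairs.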

I propose simply (for Python 3.2):

 a) the OS passes a bytes environ to the program: wsgi.read_environ()
uses it
 b) the server/gateway decodes environ from the right encoding

and...

 (a) os.environb doesn't exist in previous Python 3.1, making it
 impossible to implement WSGI before 3.2;

For Python 3.1, add a step between (a) and (b): encode environ to the
locale encoding (with surrogateescape) to get back the original bytes
environ.

 (b) a byte environment on Windows would have to be encoded
 from the Unicode environment, with a server-specific encoding,
 and then what encoding are you going to choose for the variables
 that contain non-HTTP-sourced native Unicode strings (such as,
 very commonly, Windows pathnames)?

The variables coming from the HTTP server should be encoded again to the
server-specific encoding. Other variables should be kept unchanged.

The server/gateway can simply test the type of the variable: if it's
unicode, there is nothing to do; if it's bytes, decode it from the
correct encoding.
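
A sketch of that type test, assuming a mixed environ dict in which
HTTP-derived values are bytes and everything else is already str. The
function name decode_environ and the sample keys are illustrative
only:

```python
def decode_environ(environ: dict, encoding: str) -> dict:
    """Decode bytes values; leave values that are already str untouched."""
    decoded = {}
    for key, value in environ.items():
        if isinstance(value, bytes):
            decoded[key] = value.decode(encoding)
        else:
            decoded[key] = value
    return decoded


mixed = {
    "PATH_INFO": "/café".encode("utf-8"),   # raw bytes from the HTTP server
    "SystemRoot": "C:\\Windows",            # already native text
}
clean = decode_environ(mixed, "utf-8")
```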

 The bytes-or-bytes-in-Unicode argument is something that has been
 bounced around Web-SIG for literally *years*; (...) WSGI and wsgiref
 in Python 3.0-3.1 simply does not work.

I don't understand why you are attached to this horrible hack
(bytes-in-unicode). It introduces more work and more confusion than
using raw bytes unchanged.

It doesn't work and so something has to be changed.

Victor



Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread Stephen J. Turnbull
Victor Stinner writes:

  It doesn't work and so something has to be changed.

What specific bug have you observed?

Everybody hates this hack, or at the very least is somewhat
embarrassed by it, but the working group clearly believes that it
works and something like it is necessary.  They've studied it for
years.

To get rid of it, somebody needs to demonstrate a bug, and propose
something better, plus implement it in code, plus fix any tests that
expect Unicode and now get bytes, plus create any additional tests
that may be necessitated by changing from a Unicode representation to
a bytes representation.

I hate it too, but not enough to ask anybody to do any of the above
without a real bug.


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread Nick Coghlan
On Fri, Jan 7, 2011 at 9:51 PM, Victor Stinner
victor.stin...@haypocalc.com wrote:
 On POSIX, the current code looks like this:

  a) the OS passes a bytes environ to the program
  b) Python decodes environ from the locale encoding
  c) wsgi.read_environ() encodes environ to the locale encoding to get
 back the original bytes environ: this step can be skipped if os.environb
 is available
  d) wsgi.read_environ() decodes environ from ISO-8859-1
  e) the server/gateway encodes environ to ISO-8859-1
  f) the server/gateway decodes environ from the right encoding

 Hey! Don't you think that there are useless encode/decode steps here?
 Especially (d)-(e) is useless and introduces a confusion: the environ
 contains other keys that don't come from os.environ and are already
  correctly decoded; how does the server/gateway know that they are
 already correctly decoded?

Because WSGI is platform neutral. WSGI apps have no idea if they're
running on Windows or POSIX. The type used to communicate between the
WSGI engine and the WSGI app must be either bytes *or* unicode, and either
choice causes problems depending on the underlying OS.

bytes-as-unicode is not a great choice, it is merely the least bad
choice of the available options.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread James Y Knight
On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote:
 I don't understand why you are attached to this horrible hack
 (bytes-in-unicode). It introduces more work and more confusion than
 using raw bytes unchanged.
 
 It doesn't work and so something has to be changed.

It's gross but it does work. This has been discussed ad nauseam on web-sig
over a period of years.

I'd like to reiterate that it is only even a potential issue for the 
PATH_INFO/SCRIPT_NAME keys. Those two keys are required to have been urldecoded 
already, into byte-data in some encoding. For all the other keys (including the 
ones from os.environ), they are either *properly* decoded in 8859-1 or are just 
ascii (possibly still urlencoded, so the app needs to urldecode and decode into 
a string with the correct encoding).
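
For the PATH_INFO/SCRIPT_NAME case, the application-side fix is a
two-step transcode. A minimal sketch, assuming the application knows
its URLs are UTF-8; the helper name recode_path_info is hypothetical:

```python
def recode_path_info(environ: dict, encoding: str = "utf-8") -> str:
    """Recover the real text of PATH_INFO from its bytes-as-unicode form."""
    native = environ.get("PATH_INFO", "")
    # Step 1: ISO-8859-1 encoding gives back the exact bytes the server saw.
    raw = native.encode("iso-8859-1")
    # Step 2: decode with the encoding the application actually uses.
    return raw.decode(encoding)


# What a PEP 3333 server would hand over for the URL path /caf%C3%A9
environ = {"PATH_INFO": "/caf\u00c3\u00a9"}
path = recode_path_info(environ)
```

Keys that are pure ASCII pass through both steps unchanged, which is
why the issue is confined to these two keys.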

James


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread P.J. Eby

At 09:43 AM 1/7/2011 -0500, James Y Knight wrote:

On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote:
 I don't understand why you are attached to this horrible hack
 (bytes-in-unicode). It introduces more work and more confusion than
 using raw bytes unchanged.

 It doesn't work and so something has to be changed.

It's gross but it does work. This has been discussed ad nauseam on
web-sig over a period of years.


I'd like to reiterate that it is only even a potential issue for the 
PATH_INFO/SCRIPT_NAME keys. Those two keys are required to have been 
urldecoded already, into byte-data in some encoding. For all the 
other keys (including the ones from os.environ), they are either 
*properly* decoded in 8859-1 or are just ascii (possibly still 
urlencoded, so the app needs to urldecode and decode into a string 
with the correct encoding).


Right.  Also, it should be mentioned that none of this would be 
necessary if we could've gotten a bytes of a known encoding 
type.  If you look back to the last big Python-Dev discussion on 
bytes/unicode and stdlib API breakage, this was the holdup for 
getting a sane WSGI spec.


Since we couldn't change the language to fix the problem (due to the 
moratorium), we had to use this less-pleasant way of dealing with 
things, in order to get a final WSGI spec for Python 3.


(If anybody is wondering about the specifics of the language change 
that was needed, it'd be having a bytes with known encoding type, 
that when combined in any polymorphic operation with a unicode 
string, would result in bytes-with-encoding output, and would raise 
an error if the resulting value could not be encoded in the target 
encoding.  Then we would simply do all WSGI header operations with 
this type, using latin-1 as the target encoding.)
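
A rough sketch of how such a type might behave. EncodedBytes is a
hypothetical class written for this illustration; no such type exists
in Python, and the details (method names, which operations are
supported) are assumptions beyond what is described above:

```python
class EncodedBytes:
    """Hypothetical 'bytes with known encoding' type; not a real builtin."""

    def __init__(self, data: bytes, encoding: str = "latin-1"):
        self.data = data
        self.encoding = encoding

    def _encode_operand(self, other):
        if isinstance(other, EncodedBytes):
            # Bring the other operand to text so it can be transcoded.
            other = other.data.decode(other.encoding)
        if isinstance(other, str):
            # Raises UnicodeEncodeError if the text does not fit the target.
            return other.encode(self.encoding)
        return None

    def __add__(self, other):
        encoded = self._encode_operand(other)
        if encoded is None:
            return NotImplemented
        return EncodedBytes(self.data + encoded, self.encoding)

    def __radd__(self, other):
        encoded = self._encode_operand(other)
        if encoded is None:
            return NotImplemented
        return EncodedBytes(encoded + self.data, self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)


# Mixing with str yields bytes-with-encoding output.
header = EncodedBytes(b"text/html; charset=") + "utf-8"

try:
    EncodedBytes(b"X-Name: ") + "\u20ac"   # euro sign is not in latin-1
    rejected = False
except UnicodeEncodeError:
    rejected = True
```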




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread Robert Brewer
P.J. Eby wrote:
 At 09:43 AM 1/7/2011 -0500, James Y Knight wrote:
 On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote:
   I don't understand why you are attached to this horrible hack
   (bytes-in-unicode). It introduces more work and more confusion than
   using raw bytes unchanged.
  
   It doesn't work and so something has to be changed.
 
 It's gross but it does work. This has been discussed ad nauseam on
 web-sig over a period of years.
 
 I'd like to reiterate that it is only even a potential issue for the
 PATH_INFO/SCRIPT_NAME keys. Those two keys are required to have been
 urldecoded already, into byte-data in some encoding. For all the
 other keys (including the ones from os.environ), they are either
 *properly* decoded in 8859-1 or are just ascii (possibly still
 urlencoded, so the app needs to urldecode and decode into a string
 with the correct encoding).
 
 Right.  Also, it should be mentioned that none of this would be
 necessary if we could've gotten a bytes of a known encoding
 type.  If you look back to the last big Python-Dev discussion on
 bytes/unicode and stdlib API breakage, this was the holdup for
 getting a sane WSGI spec.
 
 Since we couldn't change the language to fix the problem (due to the
 moratorium), we had to use this less-pleasant way of dealing with
 things, in order to get a final WSGI spec for Python 3.
 
 (If anybody is wondering about the specifics of the language change
 that was needed, it'd be having a bytes with known encoding type,
 that when combined in any polymorphic operation with a unicode
 string, would result in bytes-with-encoding output, and would raise
 an error if the resulting value could not be encoded in the target
 encoding.  Then we would simply do all WSGI header operations with
 this type, using latin-1 as the target encoding.)

Still looking forward to the day when that moratorium is lifted. Anyone
have any idea when that will be?


Bob


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread Paul Moore
On 7 January 2011 18:36, Robert Brewer fuman...@aminus.org wrote:
 Still looking forward to the day when that moratorium is lifted. Anyone
 have any idea when that will be?

See PEP 3003 (http://www.python.org/dev/peps/pep-3003/) - Python 3.3
is expected to be post-moratorium.

Paul.


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread Bill Janssen
P.J. Eby p...@telecommunity.com wrote:

 Right.  Also, it should be mentioned that none of this would be
 necessary if we could've gotten a bytes of a known encoding type.

Indeed!  Or even string using a known encoding...

 If you look back to the last big Python-Dev discussion on
 bytes/unicode and stdlib API breakage, this was the holdup for getting
 a sane WSGI spec.

Yep.

Bill


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread Robert Brewer
Paul Moore wrote:
 Robert Brewer fuman...@aminus.org wrote:
  P.J. Eby wrote:
   Also, it should be mentioned that none of this would be
   necessary if we could've gotten a bytes of a known encoding
   type.
 
  Still looking forward to the day when that moratorium is lifted.
  Anyone have any idea when that will be?
 
 See PEP 3003 (http://www.python.org/dev/peps/pep-3003/) - Python 3.3
 is expected to be post-moratorium.

"This PEP proposes a temporary moratorium (suspension) of all changes to
the Python language syntax, semantics, and built-ins for a period of at
least two years from the release of Python 3.1."

Python 3.1 was released June 27th, 2009. We're coming up faster on the
two-year period than we seem to be on a revised WSGI spec. Maybe we
should shoot for a bytes of a known encoding type first.


Robert Brewer
fuman...@aminus.org


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread Nick Coghlan
On Sat, Jan 8, 2011 at 6:16 AM, Robert Brewer fuman...@aminus.org wrote:
 Python 3.1 was released June 27th, 2009. We're coming up faster on the
 two-year period than we seem to be on a revised WSGI spec. Maybe we
 should shoot for a bytes of a known encoding type first.

There were a few minor* practical issues in getting agreement on how
such a type would actually behave. Instead, the approach WSGI adopted
(or the stricter, 7-bit ASCII only approach used internally by
urllib.parse to handle bytes in 3.2) was deemed sufficient, since it
could be done right now without having to agree on how many different
bikesheds were needed and what colours they should all be.

Cheers,
Nick.

*i.e. major :)

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread And Clover
On Tue, 2011-01-04 at 03:44 +0100, Victor Stinner wrote:
 What is this horrible encoding bytes-as-unicode?

It is a unicode string decoded from bytes using ISO-8859-1. ISO-8859-1
is the encoding specified by the HTTP RFC, as well as having the happy
property of preserving every input byte. PEP  requires it.

 os.environ is supposed to be correctly decoded and contain valid
 unicode characters.

It is not possible to ‘correctly’ decode to unicode for os.environ
because that decoding happens long before the web application (the
only party that knows what encoding should be in use) gets a look in.

Maybe the web application is using UTF-8, maybe it's using cp1252,
but if we let the server/gateway decide and do that decoding
before the application can do anything about it, we will get the wrong
encoding in *many* cases and the result will be permanent, unrecoverable
mangling of non-ASCII characters in submitted headers.

 If WSGI uses another encoding than the locale encoding (which is a bad
 idea),

It's an absolutely necessary idea. The locale encoding is nothing to do
with the web application's encoding. Windows applications need to be
able to use UTF-8 (which is never the ANSI code page), and web
applications in general need to be deployable to any server without
having to worry about the server's locale.

The locale-dependent status quo is that non-ASCII characters in URL
paths and other HTTP headers don't work for Python apps.

The recoding dances present in wsgiref's CGIHandler for 3.2 are
distasteful but completely necessary to normalise differences in
encodings used by various servers and platforms to generate their CGI
environment.

  it should use os.environb and decodes keys and values using its
 own encoding.

Well yes, but:

(a) os.environb doesn't exist in Python 3.1 or earlier, making it
impossible to implement WSGI before 3.2;
(b) a byte environment on Windows would have to be encoded
from the Unicode environment, with a server-specific encoding,
and then what encoding are you going to choose for the variables
that contain non-HTTP-sourced native Unicode strings (such as,
very commonly, Windows pathnames)?

The bytes-or-bytes-in-Unicode argument is something that has been
bounced around Web-SIG for literally *years*; this is what we ended up
with. Although I personally like bytes, frankly, a re-run of this
argument *again* whilst WSGI remains in perpetual stalemate does not
appeal. WSGI and wsgiref in Python 3.0-3.1 simply do not work. This
has long been an embarrassing situation for what is supposed to be a
leading web language. Let us not perpetuate this sorry story to 3.2 as well.

-- 
And Clover
mailto:a...@doxdesk.com http://www.doxdesk.com
skype:uknrbobince gtalk:chat?jid=bobi...@gmail.com




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread Raymond Hettinger
Can you please take a look at
http://docs.python.org/dev/whatsnew/3.2.html#pep--python-web-server-gateway-interface-v1-0-1
to see if it accurately recaps the resolution of the WSGI text/bytes issues.
I would appreciate any feedback, as it is likely that the whatsnew
document will be most people's first chance to hear the outcome
of the multi-year discussion.

Thanks,


Raymond


On Jan 6, 2011, at 3:50 PM, And Clover wrote:

 On Tue, 2011-01-04 at 03:44 +0100, Victor Stinner wrote:
 What is this horrible encoding bytes-as-unicode?
 
 It is a unicode string decoded from bytes using ISO-8859-1. ISO-8859-1
 is the encoding specified by the HTTP RFC, as well as having the happy
 property of preserving every input byte. PEP  requires it.
 
 os.environ is supposed to be correctly decoded and contain valid
 unicode characters.
 
 It is not possible to ‘correctly’ decode to unicode for os.environ
 because that decoding happens long before the web application (the
 only party that knows what encoding should be in use) gets a look in.
 
 Maybe the web application is using UTF-8, maybe it's using cp1252,
 but if we let the server/gateway decide and do that decoding
 before the application can do anything about it, we will get the wrong
 encoding in *many* cases and the result will be permanent, unrecoverable
 mangling of non-ASCII characters in submitted headers.
 
 If WSGI uses another encoding than the locale encoding (which is a bad
 idea),
 
 It's an absolutely necessary idea. The locale encoding is nothing to do
 with the web application's encoding. Windows applications need to be
 able to use UTF-8 (which is never the ANSI code page), and web
 applications in general need to be deployable to any server without
 having to worry about the server's locale.
 
 The locale-dependent status quo is that non-ASCII characters in URL
 paths and other HTTP headers don't work for Python apps.
 
 The recoding dances present in wsgiref's CGIHandler for 3.2 are
 distasteful but completely necessary to normalise differences in
 encodings used by various servers and platforms to generate their CGI
 environment.
 
 it should use os.environb and decode keys and values using its
 own encoding.
 
 Well yes, but:
 
 (a) os.environb doesn't exist in Python 3.1 or earlier, making it
 impossible to implement WSGI before 3.2;
 (b) a byte environment on Windows would have to be encoded
 from the Unicode environment, with a server-specific encoding,
 and then what encoding are you going to choose for the variables
 that contain non-HTTP-sourced native Unicode strings (such as,
 very commonly, Windows pathnames)?
 
 The bytes-or-bytes-in-Unicode argument is something that has been
 bounced around Web-SIG for literally *years*; this is what we ended up
 with. Although I personally like bytes, frankly, a re-run of this
 argument *again* whilst WSGI remains in perpetual stalemate does not
 appeal. WSGI and wsgiref in Python 3.0-3.1 simply do not work. This
 has long been an embarrassing situation for what is supposed to be a
 leading web language. Let us not perpetuate this sorry story to 3.2 as well.
 
 -- 
 And Clover
 mailto:a...@doxdesk.com http://www.doxdesk.com
 skype:uknrbobince gtalk:chat?jid=bobi...@gmail.com
 
 


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread Glenn Linderman

On 1/6/2011 3:50 PM, And Clover wrote:

ISO-8859-1 is the encoding specified by the HTTP RFC


Please could I have the reference to that specification?  I only recall 
ASCII and UTF-8 in my readings of various things HTTP and HTML, for 
headers, and form data.  Naturally data pages can have any encoding they 
please, as there are headers and meta tags to describe their encodings.




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread James Y Knight

On Jan 6, 2011, at 8:16 PM, Glenn Linderman wrote:

 On 1/6/2011 3:50 PM, And Clover wrote:
 
 ISO-8859-1 is the encoding specified by the HTTP RFC
 
 Please could I have the reference to that specification?  I only recall ASCII 
 and UTF-8 in my readings of various things HTTP and HTML, for headers, and 
 form data.  Naturally data pages can have any encoding they please, as there 
 are headers and meta tags to describe their encodings.


Did you try google? http://www.google.com/search?http+rfc

James


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread Stephen J. Turnbull
Glenn Linderman writes:
  On 1/6/2011 3:50 PM, And Clover wrote:
   ISO-8859-1 is the encoding specified by the HTTP RFC
  
  Please could I have the reference to that specification?

RFC 2616 (probably obsolete by now, but ISO 8859/1 is already
there, IIRC), and I don't think UTF-8 is the default for anything until
you get to XHTML (and maybe HTML5).



Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread Glenn Linderman

On 1/6/2011 7:37 PM, Stephen J. Turnbull wrote:

Glenn Linderman writes:
On 1/6/2011 3:50 PM, And Clover wrote:
  ISO-8859-1 is the encoding specified by the HTTP RFC
  
Please could I have the reference to that specification?

RFC 2616 (probably obsolete by now, but ISO 8859/1 is already
there, IIRC), and I don't think UTF-8 is the default for anything until
you get to XHTML (and maybe HTML5).


Thanks.

Looking back, it is in RFC 2068 and RFC 1945 also; I just had a mental
blind spot, thinking I understood the header formats from email-land,
where they are more strictly required to be ASCII, as mentioned in my
reply to James.


UTF-8 is the default for FORM DATA when using multipart/form-data 
encoding, using the POST method.  Otherwise, FORM DATA is limited to 
ASCII.  Per  http://www.w3.org/TR/html401/interact/forms.html#h-17.13.1 
which is HTML 4.01 (and maybe earlier, but I didn't go back further).


Nice to quote chapter and verse (or link) when declaring that something 
is in a standard.




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread P.J. Eby

At 04:00 PM 1/6/2011 -0800, Raymond Hettinger wrote:

Can you please take a look at
http://docs.python.org/dev/whatsnew/3.2.html#pep--python-web-server-gateway-interface-v1-0-1
to see if it accurately recaps the resolution of the WSGI text/bytes issues.
I would appreciate any feedback, as it is likely that the whatsnew
document will be most people's first chance to hear the outcome
of the multi-year discussion.


Hi Raymond -- nice work there.  A few minor suggestions:

1. Native strings are used as the keys and values of the environ 
dictionary, not just as headers for start_response.


2. The read_environ() method is strictly for use with CGI-to-WSGI 
gateways, or for bridging other CGI-like protocols (e.g. FastCGI) to 
WSGI.  It is ONLY for server implementers, in other words, and the 
typical app developer is doing something terribly wrong if they are 
even bothering to read its documentation.  ;-)


3. The primary relevance of the native string type to an app 
developer is that when porting code from Python 2 to 3, they must 
still decode environment variable values, even though they are 
already Unicode.  If their code was previously dealing only in 
Python 2 'str' objects, then nothing really changes.  If they were 
previously decoding from environ str's to unicode, then they must 
replace their prior .decode('whatever') with 
.encode('latin1').decode('whatever').  That's basically it for 
porting from Python 2.


IOW, this design choice allows most HTTP header manipulating code 
(whether input or output) to be ported to Python 3 with a very 
mechanical change pattern.  Most such code is working with ASCII 
anyway, since normally both input and output headers are, and there 
are few headers that an application would be likely to convert to 
actual unicode anyway.


On output via start_response(), if an application is currently 
encoding an output header -- why they would be, I have no idea, but 
if they are -- they need to add a re-encode to latin1 (i.e., 
.encode('whatever').decode('latin1')).


IOW, a short 2-to-3 porting guide for WSGI:

* If you just used strings for headers before, that part of your code 
doesn't change.  (And if it was broken before, it's still broken in 
exactly the same way.  No new breakage is introduced. ;-) )


* If you encoded any output headers or decoded any input headers, you 
must take into account the extra latin1 step.  This is expected to be 
rare, since it's usually only SCRIPT_NAME and PATH_INFO that anybody 
would ever care about on input, and almost never anything on output.


* Values yielded by an application or sent via a write() call MUST be 
byte strings; the environ keys and values, and the status and headers 
passed to start_response(), MUST be native strings.  No mixing and 
matching.
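
The two "extra latin1 step" rules above can be sketched roughly as follows. This is only an illustration, assuming the application's real encoding is UTF-8; the helper names decode_environ_value() and encode_header_value() are hypothetical and come from neither PEP 3333 nor wsgiref.

```python
def decode_environ_value(value, encoding='utf-8'):
    # environ values are native strings holding latin1-decoded bytes, so
    # recover the raw bytes first, then decode with the real encoding.
    return value.encode('latin1').decode(encoding)


def encode_header_value(text, encoding='utf-8'):
    # Output headers must be native strings too, so encode to bytes and
    # then wrap those bytes back up as latin1.
    return text.encode(encoding).decode('latin1')


# What a server would place in environ for the UTF-8 encoded path '/café'
# (the sample path is an assumption made for this example).
path_info = '/café'.encode('utf-8').decode('latin1')
print(decode_environ_value(path_info))
```

Pure-ASCII values pass through both helpers unchanged, which is why code that only ever handled ASCII headers needs no change.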




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-04 Thread Antoine Pitrou
On Tue, 04 Jan 2011 03:44:53 +0100
Victor Stinner victor.stin...@haypocalc.com wrote:
 def wsgi_string(u):
     # Convert an environment variable to a WSGI bytes-as-unicode string
     return u.encode(enc, esc).decode('iso-8859-1')
 
 def run_with_cgi(application):
     environ = {k: wsgi_string(v) for k, v in os.environ.items()}
     environ['wsgi.input']    = sys.stdin
     environ['wsgi.errors']   = sys.stderr
     environ['wsgi.version']  = (1, 0)
     ...
 --
 
 What is this horrible encoding bytes-as-unicode? os.environ is
 supposed to be correctly decoded and contain valid unicode characters.
 If WSGI uses another encoding than the locale encoding (which is a bad
 idea), it should use os.environb and decode keys and values using its
 own encoding.
 
 If you really want to store bytes in unicode, str is not the right type:
 use the bytes type and use os.environb instead.

+1. We should minimize such reencoding dances, and avoid promoting them.

Regards

Antoine.




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-04 Thread Victor Stinner
Le mardi 04 janvier 2011 à 13:20 +0100, Antoine Pitrou a écrit :
 On Tue, 04 Jan 2011 03:44:53 +0100
 Victor Stinner victor.stin...@haypocalc.com wrote:
  def wsgi_string(u):
      # Convert an environment variable to a WSGI bytes-as-unicode string
      return u.encode(enc, esc).decode('iso-8859-1')
  
  def run_with_cgi(application):
      environ = {k: wsgi_string(v) for k, v in os.environ.items()}
      environ['wsgi.input']    = sys.stdin
      environ['wsgi.errors']   = sys.stderr
      environ['wsgi.version']  = (1, 0)
      ...
  --
  
  What is this horrible encoding bytes-as-unicode? os.environ is
  supposed to be correctly decoded and contain valid unicode characters.
  If WSGI uses another encoding than the locale encoding (which is a bad
  idea), it should use os.environb and decode keys and values using its
  own encoding.
  
  If you really want to store bytes in unicode, str is not the right type:
  use the bytes type and use os.environb instead.
 
 +1. We should minimize such reencoding dances, and avoid promoting them.

The example from the PEP is specific to CGI and is a little bit special.

The reference implementation (wsgiref in py3k) only re-decodes
(transcodes) some variables:
---
_is_request = {
    'SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING', 'REQUEST_METHOD',
    'AUTH_TYPE', 'CONTENT_TYPE', 'CONTENT_LENGTH', 'HTTPS', 'REMOTE_USER',
    'REMOTE_IDENT',
}.__contains__

def _needs_transcode(k):
    return _is_request(k) or k.startswith('HTTP_') or k.startswith('SSL_') \
        or (k.startswith('REDIRECT_') and _needs_transcode(k[9:]))
---

My problem is that I don't understand how I can know whether a variable was
converted to bytes-as-unicode or not. Graham Dumpleton told me on IRC
that the framework is supposed to re-decode some variables one more time
(e.g. PATH_INFO). But this is not explicit in the PEP, and
_needs_transcode() is a private function.

Since the environ already contains different types (e.g. wsgi.version is a
tuple, wsgi.multithread is a boolean, ...), why not keep these
variables as raw bytes?

Victor



Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-04 Thread Antoine Pitrou
On Tue, 04 Jan 2011 14:33:37 +0100
Victor Stinner victor.stin...@haypocalc.com wrote:
 Le mardi 04 janvier 2011 à 13:20 +0100, Antoine Pitrou a écrit :
  On Tue, 04 Jan 2011 03:44:53 +0100
  Victor Stinner victor.stin...@haypocalc.com wrote:
   def wsgi_string(u):
       # Convert an environment variable to a WSGI bytes-as-unicode string
       return u.encode(enc, esc).decode('iso-8859-1')
   
   def run_with_cgi(application):
       environ = {k: wsgi_string(v) for k, v in os.environ.items()}
       environ['wsgi.input']    = sys.stdin
       environ['wsgi.errors']   = sys.stderr
       environ['wsgi.version']  = (1, 0)
       ...
   --
   
   What is this horrible encoding bytes-as-unicode? os.environ is
   supposed to be correctly decoded and contain valid unicode characters.
   If WSGI uses another encoding than the locale encoding (which is a bad
   idea), it should use os.environb and decode keys and values using its
   own encoding.
   
   If you really want to store bytes in unicode, str is not the right type:
   use the bytes type and use os.environb instead.
  
  +1. We should minimize such reencoding dances, and avoid promoting them.
 
 The example from the PEP is specific to CGI and is a little bit special.

Well, it would be better if it used os.environb anyway ;)
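
A minimal sketch of what an os.environb-based variant might look like, for comparison with the PEP's example. The function name environ_from_bytes() is hypothetical, and this illustrates the suggestion only; it is not what the PEP or wsgiref does.

```python
import os


def environ_from_bytes(raw_environ):
    # ISO-8859-1 maps every byte 0-255 to exactly one code point, so this
    # decode cannot fail and loses no information.
    return {
        key.decode('iso-8859-1'): value.decode('iso-8859-1')
        for key, value in raw_environ.items()
    }


# os.environb only exists where the native environment is bytes
# (typically POSIX, not Windows), hence the guard.
if os.supports_bytes_environ:
    environ = environ_from_bytes(os.environb)
```

The guard reflects the Windows objection raised elsewhere in this thread: there the native environment is Unicode, so there is no byte environment to read from directly.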

Regards

Antoine.




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-04 Thread Tres Seaver

On 01/03/2011 09:44 PM, Victor Stinner wrote:
 Hi,
 
 In PEP 3333, I read:
 --
 import os, sys
 
 enc, esc = sys.getfilesystemencoding(), 'surrogateescape'
 
 def wsgi_string(u):
     # Convert an environment variable to a WSGI bytes-as-unicode string
     return u.encode(enc, esc).decode('iso-8859-1')
 
 def run_with_cgi(application):
     environ = {k: wsgi_string(v) for k, v in os.environ.items()}
     environ['wsgi.input']    = sys.stdin
     environ['wsgi.errors']   = sys.stderr
     environ['wsgi.version']  = (1, 0)
     ...
 --
 
 What is this horrible encoding bytes-as-unicode? os.environ is
 supposed to be correctly decoded and contain valid unicode characters.
 If WSGI uses another encoding than the locale encoding (which is a bad
 idea), it should use os.environb and decode keys and values using its
 own encoding.
 
 If you really want to store bytes in unicode, str is not the right type:
 use the bytes type and use os.environb instead.

I'm not clear on the semantics here, but I'm pretty sure you'll find
that the web-SIG does know them well.  I have CC'ed that list (via gmane).

Note that Guido just recently wrote on that list that he considers that
PEP to be de facto accepted.


Tres.
-- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Design   http://palladion.com


Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-04 Thread P.J. Eby

At 03:44 AM 1/4/2011 +0100, Victor Stinner wrote:

Hi,

In PEP 3333, I read:
--
import os, sys

enc, esc = sys.getfilesystemencoding(), 'surrogateescape'

def wsgi_string(u):
    # Convert an environment variable to a WSGI bytes-as-unicode string
    return u.encode(enc, esc).decode('iso-8859-1')

def run_with_cgi(application):
    environ = {k: wsgi_string(v) for k, v in os.environ.items()}
    environ['wsgi.input']    = sys.stdin
    environ['wsgi.errors']   = sys.stderr
    environ['wsgi.version']  = (1, 0)
    ...
--

What is this horrible encoding bytes-as-unicode? os.environ is
supposed to be correctly decoded and contain valid unicode characters.
If WSGI uses another encoding than the locale encoding (which is a bad
idea), it should use os.environb and decode keys and values using its
own encoding.

If you really want to store bytes in unicode, str is not the right type:
use the bytes type and use os.environb instead.


If you want to discuss this, the Web-SIG is the appropriate 
place.  Also, it was the appropriate place months ago, when the final 
decision on the environ encoding was made.  ;-)


IOW, the above change to the PEP is merely fixing the code example to 
be correct for Python 3, where it previously was correct only for 
Python 2.  The PEP itself has already required this since the 
previous revisions, and wsgiref in the stdlib is already compliant 
with the above (although it uses a more sophisticated approach for 
dealing with win32 compatibility).
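
As a rough worked example of what the corrected wsgi_string() accomplishes, the sketch below simulates a byte value that the interpreter decoded with surrogateescape and shows the original bytes surviving the round trip. It pins enc to 'utf-8' so the result is reproducible, whereas the PEP example uses sys.getfilesystemencoding(); the sample bytes are likewise an assumption made for the demonstration.

```python
def wsgi_string(u, enc='utf-8', esc='surrogateescape'):
    # Undo the interpreter's decoding to get the raw bytes back, then
    # re-wrap them as ISO-8859-1 "bytes-as-unicode".
    return u.encode(enc, esc).decode('iso-8859-1')


# Latin-1 bytes for '/café'; the trailing 0xE9 byte is not valid UTF-8.
raw = b'/caf\xe9'

# Roughly what os.environ would hold on a UTF-8 system: the undecodable
# byte is smuggled through as a lone surrogate.
as_seen_in_os_environ = raw.decode('utf-8', 'surrogateescape')

native = wsgi_string(as_seen_in_os_environ)

# The application can always recover the original bytes, whatever they were.
print(native.encode('iso-8859-1') == raw)
```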


The rationale for this choice is described in the PEP, and was also 
discussed in the mailing list emails back when the work was being done.


IOW, this particular ship already sailed a long time ago.  In fact, 
for Jython this bytes-as-unicode approach has been the PEP 
333-defined encoding for at least *six years*...  so it's REALLY late 
to complain about it now! ;-)


PEP 3333 is merely a mapping of PEP 333 to allow WSGI apps to be 
ported from Python 2 to Python 3.  There is work in progress on the 
Web-SIG now on PEP 444, which will support only Python 2.6+, where 
'b' literals and the 'bytes' alias are available.  It is as yet 
uncertain what environ encoding will be used, but at the moment I'm 
not convinced that either pure bytes or pure unicode are acceptable 
replacements for the PEP 333-compatible approach.


In any event, that is a discussion for the Web-SIG, not Python-Dev.



Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-04 Thread Guido van Rossum
On Tue, Jan 4, 2011 at 8:22 AM, Tres Seaver tsea...@palladion.com wrote:
 Note that Guido just recently wrote on that list that he considers that
 PEP to be de facto accepted.

That was conditional on there not being any objections in the next 24
hours. There have been plenty, so I'm retracting that.

-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] PEP 3333: wsgi_string() function

2011-01-03 Thread Victor Stinner
Hi,

In PEP 3333, I read:
--
import os, sys

enc, esc = sys.getfilesystemencoding(), 'surrogateescape'

def wsgi_string(u):
    # Convert an environment variable to a WSGI bytes-as-unicode string
    return u.encode(enc, esc).decode('iso-8859-1')

def run_with_cgi(application):
    environ = {k: wsgi_string(v) for k, v in os.environ.items()}
    environ['wsgi.input']    = sys.stdin
    environ['wsgi.errors']   = sys.stderr
    environ['wsgi.version']  = (1, 0)
    ...
--

What is this horrible encoding bytes-as-unicode? os.environ is
supposed to be correctly decoded and contain valid unicode characters.
If WSGI uses another encoding than the locale encoding (which is a bad
idea), it should use os.environb and decode keys and values using its
own encoding.

If you really want to store bytes in unicode, str is not the right type:
use the bytes type and use os.environb instead.

Victor
