Ian Bicking writes:

> On Sun, Jan 9, 2011 at 1:47 AM, Stephen J. Turnbull
> <step...@xemacs.org> wrote:
>
> > Robert Brewer writes:
> >
> > > Python 3.1 was released June 27th, 2009. We're coming up faster on the
> > > two-year period than we seem to be on a revised WSGI spec. Maybe we
> > > should shoot for a "bytes of a known encoding" type first.
> >
> > You have one. It's called "ISO 2022: Information processing -- ISO
> > 7-bit and 8-bit coded character sets -- Code extension techniques".
> > The popularity of that standard speaks for itself.
>
> The kind of object PJE was referring to is more like Ruby's strings,

Notice that Ruby was written by a Japanese developer, from the same
culture that brought us Mule, TRON, X Compound Text, and ISO-2022 in
the first place. Matsumoto himself probably isn't infected with the
"Unicode is going to be the death of all Japanese culture" bug, but
that's the attitude that is behind ISO 2022.

> which do not embed the encoding inside the bytes themselves but have the
> encoding as a kind of annotation on the bytes,

My point is that ISO-2022 is basically just a serialization of that.
And it sucks; nobody uses it, except in Japanese and Korean email.
Maybe Mandarin (but Taiwan and Hong Kong use Big5 or EUC, not an
escape-extended representation).

> and do lazy transcoding when combining strings of different
> encodings.

Which buys WSGI nothing, AIUI, since the people who want this claim
that translating to Unicode either correctly or as "big bytes" (i.e.,
zero-extension) is inefficient. They're shoveling bits; much of the
time, by the time the out-of-band information catches up, it's going
to be too late.

> The goal with respect to WSGI is that you could annotate bytes with
> an encoding but also change or fix that encoding if other
> out-of-band information implied that you got the encoding wrong
> (e.g., some data is submitted with the encoding of the page the
> browser was on, and so nothing inside the request itself will
> indicate the encoding of the data).

A noble goal, but nobody's gonna bell that cat. This is all just
wishful thinking. Two decades of experience with Emacs/Mule and
similar efforts show that if you provide this facility, people will
use it, and that use will include a lot of abuse (i.e., throwing the
garbage into somebody else's backyard, rather than disposing of it
yourself) -- in the end, the garbage gets piled high enough that it's
not worth the effort to try to make it work.

> Latin1 is kind of the poor man's version of this -- it's a good
> guess at an encoding, that at worst requires transcoding that can
> be done in a predictable way.
> (Personally I think Latin1 gets us
> 99% of the way there, and so bytes-of-a-known-encoding are not
> really that important to the WSGI case.)

In particular, it gets PJE 100% of the way there, since he proposes
always targeting ISO 8859/1, anyway. And if it's not useful to WSGI,
who is it useful to?
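
To make the Latin-1 point concrete, here is a minimal Python 3 sketch. It is not from the thread: the helper name `redecode` and the sample data are made up for illustration. It assumes the "poor man's" approach means decoding raw bytes as ISO 8859/1, so that every byte becomes the code point of the same value (the zero-extension mentioned above), and then redoing the decode once out-of-band information supplies the real encoding.

```python
def redecode(latin1_text: str, real_encoding: str) -> str:
    """Hypothetical helper: recover the original bytes from a str that was
    provisionally decoded as Latin-1, then decode them with the encoding
    learned later from out-of-band information."""
    return latin1_text.encode("latin-1").decode(real_encoding)


# Bytes as they might arrive on the wire; the real encoding is not yet known.
raw = "café".encode("utf-8")

# Provisional decode: Latin-1 maps bytes 0x00..0xFF to U+0000..U+00FF,
# so this step cannot fail and loses no information.
provisional = raw.decode("latin-1")

# The transcoding is predictable: encoding back yields the same bytes.
assert provisional.encode("latin-1") == raw

# Later, out-of-band information says the data was UTF-8 after all.
fixed = redecode(provisional, "utf-8")
print(repr(provisional), "->", repr(fixed))
```

The provisional string is mojibake if anyone looks at it as text, which is the cost of this approach; the benefit is that the guess is always reversible.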