On 22/06/2010 19:07, James Y Knight wrote:

On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:
Similarly, I'd expect (from experience) a programmer using Python to want to take the same approach, sticking with unencoded data in nearly all situations.

Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early,

Well, both .NET and Java take this approach as well. I wonder how they cope with the particular issues that have been mentioned for web applications - both platforms are used extensively for web apps.

Having used IronPython, which has .NET unicode strings (although it does a lot of magic to *allow* you to store binary data in strings for compatibility with CPython), I have to say that this approach makes a lot of programming *so* much more pleasant.

We did a lot of I/O (can you do useful programming without I/O?), including working with databases, but I didn't work *much* with wire protocols (though, now I think about it, we did fetch a fair bit of data from the web). I think wire protocols can present particular problems; it seems they sometimes mix encodings within the same data. Where you don't have these problems, keeping bytes data and Unicode text data separate, and encoding / decoding at the boundaries, is really much more sane and pleasant.
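
To make "encoding / decoding at the boundaries" concrete, here is a minimal Python 3 sketch. The helper names (read_text, write_text, shout) and the choice of UTF-8 are assumptions made purely for illustration; they are not taken from any codebase mentioned in this thread.

```python
# Illustrative sketch: bytes exist only at the edges, str everywhere else.
# All function names here are hypothetical.

def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Input boundary: decode once, as soon as the bytes arrive."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Output boundary: encode once, just before the data leaves."""
    return text.encode(encoding)


def shout(text: str) -> str:
    """Application logic only ever sees str, never bytes."""
    return text.upper()


incoming = "café".encode("utf-8")  # stands in for data read from a socket or file
outgoing = write_text(shout(read_text(incoming)))
```

The middle layer never needs to know which encoding was in use; only the two boundary functions do.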

It would be a real shame if we decided that the way forward for Python 3 was to try and move closer to how bytes/text was handled in Python 2.

All the best,

Michael

even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having been decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.)

This means that Python3 programs can become *more* fragile in the face of random data you encounter out in the real world, rather than less fragile, which was the goal of the whole exercise.

The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :)
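
For readers unfamiliar with it, the following is a small sketch of how the surrogateescape error handler (PEP 383) lets undecodable bytes survive a trip through str. The sample path and the variable names are made up for the example.

```python
# Illustrative sketch of the surrogateescape round trip (PEP 383).
# The byte 0xE9 is "é" in Latin-1 but is not valid UTF-8 in this position.
raw = b"/tmp/caf\xe9.txt"

# Strict decoding rejects the data outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xE9 becomes U+DCE9) instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Ordinary str operations still work on the result.
directory, _, name = text.rpartition("/")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
```

The catch is that text is not a clean Unicode string: encoding it with the default strict handler raises UnicodeEncodeError, so the escape has to be undone with the same handler before the data goes back out.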

James


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk


--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your 
employer, to release me from all obligations and waivers arising from any and all 
NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, 
confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS 
AGREEMENTS") that I have entered into with your employer, its partners, licensors, 
agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. 
You further represent that you have the authority to release me from any BOGUS AGREEMENTS 
on behalf of your employer.

