Re: [Python-Dev] bytes / unicode

Michael Foord Tue, 22 Jun 2010 16:22:02 -0700

On 22/06/2010 19:07, James Y Knight wrote:

On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:
Similarly I'd expect (from experience) a programmer using Python to want to take the same approach, sticking with unencoded data in nearly all situations.
Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early,

Well, both .NET and Java take this approach as well. I wonder how they cope with the particular issues that have been mentioned for web applications - both platforms are used extensively for web apps.

Having used IronPython, which has .NET unicode strings (although it does a lot of magic to *allow* you to store binary data in strings for compatibility with CPython), I have to say that this approach makes a lot of programming *so* much more pleasant.

We did a lot of I/O (can you do useful programming without I/O?) including working with databases, but I didn't work *much* with wire protocols (fetching a fair bit of data from the web though, now I think about it). I think wire protocols can present particular problems; sometimes having mixed encodings in the same data, it seems. Where you don't have these problems, keeping bytes data and all Unicode text data separate and encoding / decoding at the boundaries is really much more sane and pleasant.
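
As a rough illustration of "encoding / decoding at the boundaries", here is a minimal Python 3 sketch. The helper names (read_text, write_text, shout) and the choice of UTF-8 are assumptions made for the example only; they are not taken from any code discussed in this thread.

```python
# Hypothetical helpers: bytes exist only at the edges, str everywhere else.

def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Input boundary: decode incoming bytes exactly once."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Output boundary: encode outgoing text exactly once."""
    return text.encode(encoding)


def shout(text: str) -> str:
    """Interior logic works on text and never needs to know the encoding."""
    return text.upper()


incoming = "caf\u00e9".encode("utf-8")   # 5 bytes as they would arrive from I/O
text = read_text(incoming)               # 4 characters once decoded
outgoing = write_text(shout(text))       # bytes again, ready to be written out
```

The interior function never sees bytes, so the encoding only has to be right in two places. This does assume the input really is in one known encoding, which is exactly what the mixed-encoding wire protocol case breaks.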

It would be a real shame if we decided that the way forward for Python 3 was to try and move closer to how bytes/text was handled in Python 2.


All the best,

Michael

even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having been decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.)
This means that Python3 programs can become *more* fragile in the face of random data you encounter out in the real world, rather than less fragile, which was the goal of the whole exercise.
The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :)
James
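
For readers unfamiliar with the surrogateescape workaround mentioned above, the following is a small Python 3 sketch of how it lets not-quite-UTF-8 data pass through text processing and come out byte-for-byte unchanged. The sample path and variable names are made up for illustration.

```python
# A path containing a Latin-1 byte (0xE9), which is not valid UTF-8.
raw = b"/srv/files/caf\xe9.txt"

# Strict decoding fails on data like this.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xE9 becomes U+DCE9), so the result is still a str.
text = raw.decode("utf-8", errors="surrogateescape")

# A transformation that does not care about the bad byte: take the last
# path component.
filename = text.rsplit("/", 1)[-1]

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
```

The catch is that a string like filename carries a lone surrogate, so encoding it with the default strict handler (for instance when printing it) raises UnicodeEncodeError; every output boundary has to opt in to surrogateescape as well.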


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk



--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your 
employer, to release me from all obligations and waivers arising from any and all 
NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, 
confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS 
AGREEMENTS") that I have entered into with your employer, its partners, licensors, 
agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. 
You further represent that you have the authority to release me from any BOGUS AGREEMENTS 
on behalf of your employer.
