On 22/06/2010 19:07, James Y Knight wrote:
On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:
Similarly I'd expect (from experience) a programmer using Python
to want to take the same approach, sticking with unencoded data in
nearly all situations.
Yeah. This is a real issue I have with the direction Python3 went: it
pushes you into decoding everything to unicode early,
Well, both .NET and Java take this approach as well. I wonder how they
cope with the particular issues that have been mentioned for web
applications - both platforms are used extensively for web apps.
Having used IronPython, which has .NET unicode strings (although it does
a lot of magic to *allow* you to store binary data in strings for
compatibility with CPython), I have to say that this approach makes a
lot of programming *so* much more pleasant.
We did a lot of I/O (can you do useful programming without I/O?)
including working with databases, but I didn't work *much* with wire
protocols (though now I think about it, we did fetch a fair bit of data
from the web). I think wire protocols can present particular problems;
it seems they sometimes mix encodings within the same data. Where you
don't have these problems, keeping bytes data and all Unicode text data
separate and encoding/decoding at the boundaries is really much saner
and more pleasant.
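To make "encode/decode at the boundaries" concrete, here is a minimal Python 3 sketch. The function name `handle` and the default of UTF-8 are assumptions for illustration only; real input may declare a different encoding, and wire protocols that mix encodings would need more care than this.

```python
def handle(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical handler: decode once on input, work on str, encode once on output."""
    text = raw.decode(encoding)      # input boundary: bytes -> str
    result = text.strip().title()    # the inner logic only ever sees str
    return result.encode(encoding)   # output boundary: str -> bytes


if __name__ == "__main__":
    print(handle("  caf\u00e9 au lait \n".encode("utf-8")))  # b'Caf\xc3\xa9 Au Lait'
```

Only the two boundary lines know about encodings, so a bad encoding fails loudly at the edge rather than somewhere in the middle of the program.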
It would be a real shame if we decided that the way forward for Python 3
was to try and move closer to how bytes/text was handled in Python 2.
All the best,
Michael
even when you don't care -- all you really wanted to do is pass it
from one API to another, with some well-defined transformations, which
don't actually depend on it having been decoded properly. (For
example, extracting the path from the URL and attempting to open it as
a file on the filesystem.)
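The URL-to-file example can be sketched in Python 3 without ever decoding the path as text. This assumes a hypothetical document root `/srv/www` and a hypothetical helper name `url_to_fs_path`; it relies on `urllib.parse.urlsplit` (which accepts bytes and returns bytes) and `urllib.parse.unquote_to_bytes`, so a percent-encoded byte that isn't valid UTF-8 simply passes through. It is a sketch, not a safe file server: there is no path-traversal checking.

```python
from urllib.parse import urlsplit, unquote_to_bytes


def url_to_fs_path(url: bytes, root: bytes = b"/srv/www") -> bytes:
    """Hypothetical helper: map a request URL to a filesystem path, bytes throughout.

    The URL itself must be ASCII (non-ASCII characters percent-encoded),
    because urlsplit() decodes bytes input as ASCII internally.
    No '..' or other path-traversal checks are done here.
    """
    path = urlsplit(url).path               # bytes in, bytes out
    return root + unquote_to_bytes(path)    # %XX escapes become raw bytes, never decoded


if __name__ == "__main__":
    print(url_to_fs_path(b"http://example.com/files/caf%C3%A9.txt?x=1"))
    print(url_to_fs_path(b"http://example.com/bad%FF.txt"))  # not valid UTF-8, still fine
```

On POSIX systems the resulting bytes can be passed directly to `open()`.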
This means that Python3 programs can become *more* fragile in the face
of random data you encounter out in the real world, rather than less
fragile, which was the goal of the whole exercise.
The surrogateescape method is a nice workaround for this, but I can't
help thinking that it might've been better to just treat stuff as
possibly-invalid-but-probably-utf8 byte-strings from input, through
processing, to output. It seems kinda too late for that, though: next
time someone designs a language, they can try that. :)
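For comparison, this is roughly what the surrogateescape workaround (PEP 383) looks like in Python 3: each undecodable byte becomes a lone surrogate in the str and turns back into the same byte when encoded with the same error handler. The sample bytes and the helper names are hypothetical. The last lines show one way it can still bite: if such a str reaches code that encodes strictly, the encode fails.

```python
RAW = b"caf\xe9.txt"  # hypothetical Latin-1 filename; 0xE9 alone is not valid UTF-8


def decode_lossless(raw: bytes) -> str:
    # Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # The same error handler maps those surrogates back to the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    text = decode_lossless(RAW)
    print(repr(text))                    # 'caf\udce9.txt'
    print(encode_lossless(text) == RAW)  # True
    try:
        text.encode("utf-8")             # strict encoding rejects the surrogate
    except UnicodeEncodeError as exc:
        print("strict encode failed:", exc)
```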
James
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
READ CAREFULLY. By accepting and reading this email you agree, on behalf of your
employer, to release me from all obligations and waivers arising from any and all
NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS
AGREEMENTS") that I have entered into with your employer, its partners, licensors,
agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me from any BOGUS AGREEMENTS
on behalf of your employer.