At 11:47 PM 6/20/2010 +0200, Antoine Pitrou wrote:
> On Sun, 20 Jun 2010 14:40:56 -0400
> "P.J. Eby" <p...@telecommunity.com> wrote:
> >
> > Actually, I would say that it's more that (in the network protocol
> > case) we *have* bytes, some of which we would like to *treat* as
> > text, yet do not wish to constantly convert back and forth to
> > full-blown unicode
>
> Well, then why don't you just stick with a bytes object?

Because the stdlib is not consistent in how well it handles bytes objects.


> > While reading over this thread, I'm wondering whether at least my
> > (WSGI-related) problems in this area would be solved by the
> > availability of a type (say "bstr") that was simply a wrapper
> > providing string-like behavior over an underlying bytes, byte array,
> > or memoryview, that would produce objects of compatible type when
> > combined with strings (by encoding them to match).
>
> This really sounds horrible. Python 3 was designed precisely to
> discourage ad hoc mixing of bytes and unicode.

Who said ad hoc mixing? The point is to have a simple way to ensure that my bytes don't get implicitly converted to unicode, and (ideally) don't have to get converted *back*, either.

Passing bytes to the stdlib and getting back either bytes or unicode, where the choice is undocumented, inconsistent between different stdlib APIs, and possibly dependent on runtime conditions, is NOT "discouraging ad hoc mixing".
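
To make the "bstr" idea quoted above a little more concrete, here is a minimal sketch of such a wrapper. Everything in it is an assumption for illustration: the class name, the latin-1 default, and the choice to support only concatenation. It is not an existing stdlib type or a worked-out proposal; it only shows str operands being encoded to match the bytes, never the other way around.

```python
class bstr:
    """Hypothetical wrapper giving string-like behavior over bytes."""

    def __init__(self, data, encoding="latin-1"):
        # Accepts bytes, bytearray, or memoryview; stored as immutable bytes.
        self.data = bytes(data)
        self.encoding = encoding  # assumed default, purely illustrative

    def _coerce(self, other):
        # Encode str operands to match our bytes; never decode our bytes.
        if isinstance(other, bstr):
            return other.data
        if isinstance(other, str):
            return other.encode(self.encoding)
        if isinstance(other, (bytes, bytearray, memoryview)):
            return bytes(other)
        raise TypeError("cannot combine bstr with %r" % type(other).__name__)

    def __add__(self, other):
        return bstr(self.data + self._coerce(other), self.encoding)

    def __radd__(self, other):
        return bstr(self._coerce(other) + self.data, self.encoding)

    def __bytes__(self):
        return self.data

    def __repr__(self):
        return "bstr(%r)" % (self.data,)


path = bstr(b"/app") + "/subdir"   # str operand is encoded, result stays bstr
raw = bytes(path)                  # hand the plain bytes back to the protocol layer
```

A real version would need the rest of the string methods (split, startswith, formatting and so on), which is where most of the design questions would actually come up.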


> > seems so much saner than writing *this* everywhere:
> >
> >       newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')
>
> urljoin already returns an str object. Why do you want to decode it
> again?

Ugh.  I meant:

   newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1')

Which just goes to the point of how ridiculous it is to have to convert things to strings and back again to use APIs that ought to just handle bytes properly in the first place.
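
For illustration, the round trip can at least be confined to one place. This is only a sketch: the helper name `join_bytes_url` and the example URL are hypothetical, and it assumes the relative reference contains nothing outside latin-1. Latin-1 is used because it maps every byte value to exactly one code point, so the base URL's bytes survive the decode/encode pair unchanged.

```python
from urllib.parse import urljoin


def join_bytes_url(base, ref, encoding="latin-1"):
    """Join a bytes base URL with a str reference and return bytes.

    Hypothetical helper: decodes, joins as text, then re-encodes.
    """
    joined = urljoin(str(base, encoding), ref)  # urljoin works on str here
    return joined.encode(encoding)              # back to bytes for the caller


newurl = join_bytes_url(b"http://example.com/app/", "subdir")
```

That hides the conversions rather than removing them, which is rather the point of the complaint.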

(I don't know if there are actually any problems in the case of urljoin; I wasn't the person who originally brought up the "stdlib not treating URLs as bytestrings in 3.x" issue on the Web-SIG. Somewhere along the line I got the impression that urljoin was one such API, but in researching the issue it looks like maybe the canonical example was parse_qsl.)

It's possible that the stdlib situation has improved tremendously since then, of course. I don't know if the bug was reported, or how many remain.

And it's precisely the part where I don't know how many remain that keeps me from doing more than idly thinking about porting any of my libraries (let alone apps) to Python 3.x. The fact that the stdlib itself has these sorts of issues raises major red flags for me about whether the One Obvious Way has yet been found. If the stdlib maintainers don't agree on the One Obvious Way, that seems even worse. Or if there is such a Way, but nobody has documented its practices yet, that's almost the same thing.

I also find it weird that there seem to be two camps on this subject, one of which claims that All Is Well And There Is No Problem -- but I do not recall seeing anyone from the "What do I do; this doesn't seem ready" camp who switched sides and took the time to write down what made them realize that they were wrong about there being a problem, and what steps they had to take. The existence of one or more such documents would certainly ease my mind, and I imagine the minds of other people who are waiting less for others' libraries than for the stdlib (and/or language) itself to settle.

(Or more precisely, for it to be SEEN to have settled.)
