Re: [Python-Dev] bytes / unicode

P.J. Eby Sun, 20 Jun 2010 19:35:33 -0700

At 11:47 PM 6/20/2010 +0200, Antoine Pitrou wrote:

> On Sun, 20 Jun 2010 14:40:56 -0400
> "P.J. Eby" <p...@telecommunity.com> wrote:
> >
> > Actually, I would say that it's more that (in the network protocol
> > case) we *have* bytes, some of which we would like to *treat* as
> > text, yet do not wish to constantly convert back and forth to
> > full-blown unicode
>
> Well, then why don't you just stick with a bytes object?


Because the stdlib is not consistent in how well it handles bytes objects.

> > While reading over this thread, I'm wondering whether at least my
> > (WSGI-related) problems in this area would be solved by the
> > availability of a type (say "bstr") that was simply a wrapper
> > providing string-like behavior over an underlying bytes, byte array,
> > or memoryview, that would produce objects of compatible type when
> > combined with strings (by encoding them to match).
>
> This really sounds horrible. Python 3 was designed precisely to
> discourage ad hoc mixing of bytes and unicode.

Who said ad hoc mixing? The point is to have a simple way to ensure that my bytes don't get implicitly converted to unicode, and (ideally) don't have to get converted *back*, either.

The idea that by passing bytes to the stdlib, I randomly get back either bytes or unicode (i.e. undocumentedly and inconsistently between different stdlib APIs, as well as possibly dependent on runtime conditions), is NOT "discouraging ad hoc mixing".
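The following is a minimal sketch of the "bstr" idea quoted above, meant only to make the proposal concrete. The name comes from the thread, but the rest is an assumption. Each object carries one encoding (latin-1 by default), and that encoding is used to encode any str operand, so results stay bytes-backed and are never silently decoded. Only a few str-like methods are shown. This is a hypothetical class, not an existing stdlib or third-party type.

```python
class bstr:
    """Hypothetical "bstr": str-like behaviour over an underlying bytes buffer.

    Sketch only. Assumes one encoding per object (latin-1 by default) that is
    used to encode any str operand, so results always stay bytes-backed.
    """

    def __init__(self, data=b"", encoding="latin-1"):
        # bytes, bytearray and memoryview are all accepted; keep an
        # immutable copy so later changes to a bytearray don't leak in.
        self._data = bytes(data)
        self.encoding = encoding

    def _to_bytes(self, other):
        # Encode str operands to match; pass byte-like operands through.
        # Returns None for types this sketch does not know how to combine.
        if isinstance(other, bstr):
            return other._data
        if isinstance(other, str):
            return other.encode(self.encoding)
        if isinstance(other, (bytes, bytearray, memoryview)):
            return bytes(other)
        return None

    def _require(self, other):
        data = self._to_bytes(other)
        if data is None:
            raise TypeError(f"cannot combine bstr with {type(other).__name__}")
        return data

    def __add__(self, other):
        data = self._to_bytes(other)
        if data is None:
            return NotImplemented
        return bstr(self._data + data, self.encoding)

    def __radd__(self, other):
        # Reached for str + bstr and bytes + bstr, since neither str nor
        # bytes knows how to add a bstr.
        data = self._to_bytes(other)
        if data is None:
            return NotImplemented
        return bstr(data + self._data, self.encoding)

    def __eq__(self, other):
        data = self._to_bytes(other)
        if data is None:
            return NotImplemented
        return self._data == data

    # Equality with str makes a consistent hash awkward, so this sketch
    # simply leaves instances unhashable.
    __hash__ = None

    def split(self, sep=None, maxsplit=-1):
        # Pieces come back as bstr objects, not str.
        if sep is not None:
            sep = self._require(sep)
        return [bstr(p, self.encoding) for p in self._data.split(sep, maxsplit)]

    def startswith(self, prefix):
        return self._data.startswith(self._require(prefix))

    def __bytes__(self):
        return self._data

    def __str__(self):
        # Decoding happens only when text is explicitly asked for.
        return self._data.decode(self.encoding)

    def __repr__(self):
        return f"bstr({self._data!r})"


if __name__ == "__main__":
    path = bstr(b"/app/") + "static/" + b"logo.png"
    print(repr(path))  # bstr(b'/app/static/logo.png')
```

The design choice is to keep the bytes as the single source of truth and to encode str operands into them rather than decode the bytes. That choice is what avoids the decode/re-encode round trip. The cost is that combining a bstr with a str containing characters outside the chosen encoding raises `UnicodeEncodeError`.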

> > seems so much saner than writing *this* everywhere:
> >
> >       newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')
>
> urljoin already returns an str object. Why do you want to decode it
> again?


Ugh.  I meant:

   newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1')

Which just goes to show how ridiculous it is to have to convert things to strings and back again to use APIs that ought to just handle bytes properly in the first place.

(I don't know if there are actually any problems in the case of urljoin; I wasn't the person who originally brought up the "stdlib not treating URLs as bytestrings in 3.x" issue on the Web-SIG. Somewhere along the line I got the impression that urljoin was one such API, but from researching the issue, it looks like the canonical example may have been parse_qsl.)
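A note for readers coming to this later. Starting with Python 3.2, which was released after this message, the functions in `urllib.parse` (including `urljoin` and `parse_qsl`) also accept `bytes` and return `bytes`. They raise `TypeError` if str and bytes arguments are mixed. The quick check below assumes Python 3.2 or newer and uses a placeholder URL. It compares direct bytes handling with the latin-1 round trip shown above. That trick is safe because latin-1 maps each of the 256 byte values to exactly one code point, so decoding and re-encoding never alters the data.

```python
from urllib.parse import parse_qsl, urljoin

base = b"http://example.com/app/"  # placeholder URL

# The workaround from the message: decode as latin-1, call the str API,
# then re-encode. Lossless because latin-1 covers every byte value.
roundtrip = urljoin(str(base, "latin-1"), "subdir").encode("latin-1")

# Python 3.2+: pass bytes straight in and get bytes back.
direct = urljoin(base, b"subdir")

print(roundtrip)            # b'http://example.com/app/subdir'
print(direct == roundtrip)  # True
print(parse_qsl(b"a=1&b=2"))  # [(b'a', b'1'), (b'b', b'2')]

# Mixing str and bytes arguments is rejected instead of silently coerced.
try:
    urljoin(base, "subdir")
except TypeError as exc:
    print("TypeError:", exc)
```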

It's possible that the stdlib situation has improved tremendously since then, of course. I don't know if the bug was reported, or how many remain.

And it's precisely the part where I don't know how many remain that keeps me from doing more than idly thinking about porting any of my libraries (let alone apps) to Python 3.x. The fact that the stdlib itself has these sorts of issues raises major red flags to me about whether the One Obvious Way has yet been found. If the stdlib maintainers don't agree on the One Obvious Way, that seems even worse. Or if there is such a Way, but nobody has documented its practices yet, that's almost the same thing.

I also find it weird that there seem to be two camps on this subject, one of which claims that All Is Well And There Is No Problem, but I do not recall seeing anyone from the "What do I do; this doesn't seem ready" camp who switched sides and took the time to write down what made them realize they were wrong about there being a problem, and what steps they had to take. The existence of one or more such documents would certainly ease my mind, and, I imagine, the minds of other people who are waiting less for others' libraries than for the stdlib (and/or language) itself to settle.


(Or more precisely, for it to be SEEN to have settled.)

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
