[Python-Dev] bytes / unicode

Antoine Pitrou Sun, 20 Jun 2010 14:49:21 -0700

On Sun, 20 Jun 2010 14:40:56 -0400
"P.J. Eby" <p...@telecommunity.com> wrote:
> 
> Actually, I would say that it's more that (in the network protocol 
> case) we *have* bytes, some of which we would like to *treat* as 
> text, yet do not wish to constantly convert back and forth to 
> full-blown unicode


Well, then why don't you just stick with a bytes object?

> While reading over this thread, I'm wondering whether at least my 
> (WSGI-related) problems in this area would be solved by the 
> availability of a type (say "bstr") that was simply a wrapper 
> providing string-like behavior over an underlying bytes, byte array, 
> or memoryview, that would produce objects of compatible type when 
> combined with strings (by encoding them to match).

This really sounds horrible. Python 3 was designed precisely to
discourage ad hoc mixing of bytes and unicode.
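
As a minimal illustration of that design choice (a sketch, not code from this thread; the helper name try_concat and the sample request line are made up for the example), Python 3 refuses to combine bytes and str implicitly and requires an explicit encode or decode:

```python
# Python 3 raises instead of guessing an encoding when bytes meet str.
def try_concat(a, b):
    """Return a + b, or the exception class name if the types do not mix."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__

# Implicit mixing is rejected.
mixed = try_concat(b"GET ", "/index.html")

# The conversion has to be spelled out by the caller.
explicit = b"GET " + "/index.html".encode("latin-1")

print(mixed)
print(explicit)
```

A wrapper type that silently encodes str operands to match would reintroduce exactly the implicit step that this error exists to prevent.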

> Actually, if the Python 3 str() constructor could do O(1) conversion 
> for the latin-1 case (i.e., just wrapped the underlying bytes), I 
> would just put, "bstr = lambda x: str(x,'latin-1')" at the top of my 
> programs and have roughly the same effect.

Did you do any measurements that show that latin-1 decoding (hardly a
complicated task) introduces a performance regression in Web frameworks
in 3.x?
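
One way to get such a measurement is a small timeit run. This is only a sketch: the 16 KiB payload and the iteration count are arbitrary assumptions, and a micro-benchmark like this says nothing by itself about a full Web framework.

```python
import timeit

# Hypothetical payload: 16 KiB covering every byte value (all valid latin-1).
payload = bytes(range(256)) * 64

def decode_latin1(data=payload):
    """Decode the payload as latin-1; each byte maps to one code point."""
    return data.decode("latin-1")

iterations = 1000  # arbitrary; raise it for more stable numbers
seconds = timeit.timeit(decode_latin1, number=iterations)
per_call_us = seconds / iterations * 1e6

print(f"latin-1 decode of {len(payload)} bytes: {per_call_us:.2f} us per call")
```

Comparing that per-call figure against the time spent handling a whole request would show whether the decode step is significant.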

> seems so much saner than writing *this* everywhere:
> 
>       newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')

urljoin already returns a str object. Why do you want to decode it
again?
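
A short sketch of the point, assuming a hypothetical bytes base URL received from the wire: one explicit decode is enough, and wrapping the result in str(..., 'latin-1') again fails because the str constructor does not decode str objects.

```python
from urllib.parse import urljoin

base = b"http://example.com/app/"      # hypothetical bytes from the network
base_text = str(base, "latin-1")       # the only decode that is needed
newurl = urljoin(base_text, "subdir")  # result is already a str

# Decoding the str a second time is an error, not a no-op.
try:
    str(newurl, "latin-1")
    redecode = "ok"
except TypeError:
    redecode = "TypeError"

print(newurl, type(newurl).__name__, redecode)
```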

