On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:

> Regarding the proposal of a String ABC, I hope this isn't going to
> become a backdoor to reintroduce the Python 2 madness of allowing
> equivalency between text and bytes for *some* strings of bytes and not
> others.

For my part, what I want out of a string ABC is simply the ability to do 
application-specific optimizations.

There are many applications where all input and output is text, but _must_ be 
UTF-8.  Even GTK uses UTF-8 as its native text representation, so "output" 
could just be display.

Right now, in Python 3, the only way to be "correct" about this is to copy 
every byte of input into 4 bytes of output, then copy each code point *back* 
into a single byte of output.  If all your application does is rewrite the 
occasional XML attribute, for example, this cost can be significant, if not 
overwhelming.

I'd like a version of 'decode' which would give me a type that was, in every 
respect, unicode, and responded to all protocols exactly as other unicode 
objects (or "str objects", if you prefer py3 nomenclature ;-)) do.  It wouldn't 
actually copy any of that memory unless it really needed to (for example, to 
pass to a C API that expected native wide characters), and it would hold on to 
the original bytes so that it could produce them on demand if encoded to the 
same encoding again.  So, as others in this thread have mentioned, the 'ABC' 
really implies some stuff about C APIs as well.
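
To make the idea concrete, here is a rough pure-Python sketch of what such a 
type might look like.  The names (LazyText, _decoded) are hypothetical and 
nothing like this exists in the stdlib.  It covers only a handful of the str 
protocols, and it assumes the bytes are already valid in the stated encoding: 
the fast path in encode() hands them back without validating them, which a 
real implementation would have to deal with.

```python
import codecs


class LazyText:
    """Hypothetical text-like wrapper that defers decoding its bytes."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = bytes(raw)
        # Normalize the codec name so that "UTF8" and "utf-8" compare equal.
        self._encoding = codecs.lookup(encoding).name
        self._text = None  # filled in the first time real text is needed

    def _decoded(self):
        # Decode at most once, and only when an operation requires it.
        if self._text is None:
            self._text = self._raw.decode(self._encoding)
        return self._text

    def encode(self, encoding="utf-8", errors="strict"):
        # Same encoding as the input: return the original bytes untouched.
        # Assumption: the bytes are valid; nothing is checked on this path.
        if codecs.lookup(encoding).name == self._encoding:
            return self._raw
        return self._decoded().encode(encoding, errors)

    def __str__(self):
        return self._decoded()

    def __len__(self):
        # Length in code points, so this has to decode.
        return len(self._decoded())

    def __eq__(self, other):
        if isinstance(other, LazyText):
            return self._decoded() == other._decoded()
        if isinstance(other, str):
            return self._decoded() == other
        return NotImplemented

    def __hash__(self):
        # Hash like the equivalent str so the two can share dict keys.
        return hash(self._decoded())

    def __contains__(self, item):
        return str(item) in self._decoded()
```

With this sketch, LazyText(data).encode("utf-8") never decodes anything, while 
len() or a comparison against a str triggers the one-time decode.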

I'm not sure about the exact performance impact of such a class, which is why 
I'd like the ability to implement it *outside* of the stdlib and see how it 
works on a project, and return with a proposal along with some data.  There are 
also different ways to implement this, and other optimizations (like ropes) 
which might be better.

You can almost do this today, but the lack of things like the hypothetical 
"__rcontains__" does make it impossible to be totally transparent about it.
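
A small sketch of the gap, using a hypothetical TextProxy class as a stand-in 
for any non-str text type.  The '+' operator has a reflected hook (__radd__), 
so a real str on the left can cooperate with the proxy.  The 'in' operator 
only ever consults the container on the right, so when that container is a 
real str the proxy never gets a say.

```python
class TextProxy:
    """Hypothetical stand-in for a text type that is not a str subclass."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

    def __contains__(self, item):
        # Handles "x in proxy": the proxy is the container here.
        return str(item) in self._text

    def __radd__(self, other):
        # Handles "some_str + proxy": Python falls back to this hook
        # because str does not know how to add a TextProxy.
        return TextProxy(str(other) + self._text)


def try_contains(needle, haystack):
    """Return the result of 'needle in haystack', or None on TypeError."""
    try:
        return needle in haystack
    except TypeError:
        # str.__contains__ rejects non-str operands, and there is no
        # reflected method on the needle for Python to try instead.
        return None
```

Here try_contains("b", TextProxy("abc")) is True and "a" + TextProxy("bc") 
yields a proxy for "abc", but try_contains(TextProxy("b"), "abc") comes back 
as None because the str on the right raises TypeError.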
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev