At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:
> It may be that there are places where we need to rewrite standard
> library algorithms to be bytes/str neutral (e.g. by using length one
> slices instead of indexing). It may be that there are more APIs that
> need to grow "encoding" keyword arguments that they then pass on to
> the functions they call or use to convert str arguments to bytes (or
> vice-versa). But without people trying to port affected libraries and
> reporting bugs when they find issues, the situation isn't going to
> improve.
>
> Now, if these bugs are already being reported against 3.1 and just
> aren't getting fixed, that's a completely different story...
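
(For reference, the "length one slices instead of indexing" trick Nick mentions is what makes code bytes/str neutral; a minimal illustration, with a made-up function name:

    def first_char(s):
        return s[0:1]        # str in -> str out, bytes in -> bytes out

    first_char('spam')       # -> 's'
    first_char(b'spam')      # -> b's'
    b'spam'[0]               # -> 115: indexing bytes yields an int in Python 3,
                             #    which breaks code written with str in mind
)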

The overall impression, though, is that this isn't really a step forward. Bytes are now the special case instead of unicode, but that special case isn't actually handled any better by the stdlib - in fact, it's arguably handled worse. And the burden of addressing this seems to have been shifted from the people who made the change to the people who are going to use it. But those people are not necessarily in a position to tell you anything more than, "give me something that works with bytes".

What I can tell you is that before, in Python 2, string constants in the stdlib were ASCII bytes that transparently promoted to unicode, so stdlib behavior was *predictable* in the presence of special cases: you got back either bytes or unicode, but either way you could idempotently upgrade the result to unicode, or just pass it on. APIs were "str safe, unicode aware". If you passed in bytes, you weren't going to get unicode without a warning, and if you passed in unicode, it'd work and you'd get unicode back.
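
A minimal sketch of what I mean, using Python 2 semantics (the function is made up, but it's the shape of a lot of stdlib code):

    def add_trailing_slash(path):
        if not path.endswith('/'):          # '/' is an ASCII byte string here
            path = path + '/'
        return path

    add_trailing_slash('spam')              # -> 'spam/'   (bytes in, bytes out)
    add_trailing_slash(u'spam')             # -> u'spam/'  (unicode in, unicode out)
    unicode(add_trailing_slash('spam'))     # upgrading the result is always safe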

Now, the APIs are neither safe nor aware -- if you pass bytes in, you get unpredictable results back.
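
Run the same made-up function under Python 3 and the asymmetry shows: the str literal works fine with str input, but bytes input blows up - or, in code with more branches, blows up only on *some* paths:

    def add_trailing_slash(path):
        if not path.endswith('/'):          # '/' is now a str literal
            path = path + '/'
        return path

    add_trailing_slash('spam')              # -> 'spam/'
    add_trailing_slash(b'spam')             # raises TypeError (can't mix
                                            # bytes and str)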

Ironically, it almost *would* have been better if bytes simply didn't work as strings at all, *ever* - but you could wrap them with a bstr() to *treat* them as text. You could still have restrictions on combining them, as long as the restriction was on the unicode you mixed with them. That is, you could combine a bstr and a str only if the *str* was restricted to ASCII.

If we had the Python 3 design discussions to do over again, I think I would now have stuck with the position of not letting bytes be string-compatible at all, and instead proposed an explicit bstr() wrapper/adapter for using them as strings - one that would force coercion in the direction of bytes rather than strings. (And bstr need not have been a builtin - it could have been something you import, to help discourage casual usage.)
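
Something like this, purely as a sketch of the shape I have in mind (not a worked-out proposal, and the coercion rules especially are hand-waved):

    class bstr:
        """Treat wrapped bytes as text, coercing ASCII-only str toward bytes."""

        def __init__(self, data):
            if isinstance(data, str):
                data = data.encode('ascii')     # only pure ASCII may cross over
            self._data = bytes(data)

        def _coerce(self, other):
            if isinstance(other, bstr):
                return other._data
            if isinstance(other, (bytes, bytearray)):
                return bytes(other)
            if isinstance(other, str):
                return other.encode('ascii')    # non-ASCII str is an error
            raise TypeError("can't mix bstr with %r" % type(other).__name__)

        def __add__(self, other):
            return bstr(self._data + self._coerce(other))

        def __radd__(self, other):
            return bstr(self._coerce(other) + self._data)

        def endswith(self, suffix):
            return self._data.endswith(self._coerce(suffix))

        def __bytes__(self):
            return self._data

        def __repr__(self):
            return 'bstr(%r)' % (self._data,)

    bstr(b'sp\xffam') + '/'     # -> bstr(b'sp\xffam/')  (ASCII str coerced down)
    bstr(b'spam') + 'caf\xe9'   # -> UnicodeEncodeError  (refused, by design)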

Might this approach lead to some people doing things wrong in the case of porting? Sure. But there'd be little reason to use it in new code that didn't have a real need for bytestring manipulation.

It might've been a better balance between practicality and purity, in that it would keep the language pure while offering a practical way to deal with things in bytes if you really need to. And bytes wouldn't silently succeed *some* of the time, leading people into a trap. An easy inconsistency is worse than a bit of uniform chicken-waving.

Is it too late to make that tradeoff? Probably. Certainly it's not practical to *implement* outside the language core, and removing string methods would fux0r anybody whose currently-ported code relies on bytes objects having string-like methods.
