On Sep 12, 2009, at 2:26 AM, Dag Sverre Seljebotn wrote:
> Robert Bradshaw wrote:
>> How to handle strings/unicode, especially in Python 3, has been a
>> huge area of debate on the list. However, I'm surprised that str is
>> mapped to bytes in Python 3. What was the justification for this, or
>> is it just a bug? I think if
>>
>>     def foo():
>>         return str, isinstance("abc", str)
>>
>> has different behavior in Cython and Python, then there's a bug
>> (unless there's a *very* good reason to do so). I'm not trying to
>> re-advocate automatic char* <-> unicode conversions.
>
> Are you sure? What about this:
>
>     def foo():
>         return 4 / 5
>
> Should this have the same behaviour in Cython and Python regardless of
> Python version as well?
I'm talking about Python object literals. (<object>4) / 5 will
already have Py3 semantics no matter what we do. I'm arguing that
<object>"literal" should be the native "str" type.
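For reference, a plain-Python sketch of the behaviour I'd expect Cython to match. This is ordinary Python run in the interpreter, not Cython output, and foo is just the function from the example above:

```python
def foo():
    # An unprefixed literal is an instance of the native str type,
    # whichever Python version is running.
    return str, isinstance("abc", str)


str_type, is_native = foo()
print(str_type is str, is_native)
```

The point is only that the literal and the name str agree with each other.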
> I'm with Stefan, a -3 flag which turns on
>
> from __future__ import division, unicode_literals, etc
>
> seems like the right mechanism. Changing semantics based on the Python
> version used to compile the C source can't be a good thing.
We already do that for the rest of the builtins.
The Py2 str object is gone in Py3. Bytes do not support the %
operator (probably one of the most common operations on strings) and,
as pointed out, bytes(x) does not give the string representation of x
(with str mapped to bytes, str(5) -> "\0\0\0\0\0", which is rather
unsettling). Semantically, the str type of Py2 is closer to the str
type of Py3 than it is to the bytes type of Py3, and the Py3 str is
meant to be used in its place. The fact that it's unicode rather than
bytes under the hood is an implementation detail that the user need be
bothered with *only* when they are trying to get at the underlying
char*.
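A small sketch of the bytes(x) point, assuming a Python 3 interpreter. It is plain Python rather than Cython output, and the variable names are only illustrative:

```python
# Python 3 only: bytes(n) builds n zero bytes; it does not format n.
as_text = str(5)      # the string representation of 5
as_bytes = bytes(5)   # five NUL bytes

print(repr(as_text))
print(repr(as_bytes))

# Getting from text to the underlying bytes takes an explicit encoding step.
encoded = as_text.encode("ascii")
print(repr(encoded))
```

So code that calls str(x) expecting text would silently get something very different if str meant bytes.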
- Robert
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev