On Saturday 12 of September 2009 19:48:52 Robert Bradshaw wrote:
> On Sep 12, 2009, at 2:26 AM, Dag Sverre Seljebotn wrote:
> > I'm with Stefan, a -3 flag which turns on
> >
> > from __future__ import division, unicode_literals, etc
> >
> > seems like the right mechanism. Changing semantics based on the
> > Python version used to compile the C source can't be a good thing.
>
> We already do for the rest of the builtins.
>
> The Py2 str object is gone in Py3. Bytes do not support the %
> operator (probably one of the most common operations on strings) and,
> as pointed out, bytes(x) does not give the string representation of
> x (str(5) -> "\0\0\0\0\0" is rather unsettling). Semantically, the
> str type of Py2 is closer to the str type of Py3 than it is to the
> bytes type of Py3, and is meant to be used in its place. The fact
> that it's unicode rather than bytes under the hood is an
> implementation detail that the user need be bothered with only
> when they are trying to get at the underlying char*.

I agree. In most of the places I used str and unprefixed literals in my
original Py2/Pyrex based code, it simply means "I want text". Except for
the few places where I actually need to convert to char*, all my code
would still work fine with Py3's unicode str.

bytes, on the other hand, seems to be a very bad replacement for str. I
ran into both of the issues mentioned above (% operator and str(n)) when
I tried my code with Py3. Also, 'foo'[0] equals 102, and even a simple
print 'foo' doesn't work as expected (it prints b'foo').

I'm fairly new to Cython, so please excuse my ignorance, but even after
reading many of the mails in the list archive about this topic, I still
don't understand why the str -> bytes replacement is necessary. Why not
just let 'unicode' always denote a unicode string, 'bytes' always a byte
string, and let 'str' be 'str' in any Python version?

Dominic
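
For reference, the bytes behaviour described above can be reproduced in plain Python 3 (not Cython) with a short sketch like the one below. The variable names are only illustrative. The % operator is left out on purpose: bytes lacked it in the Py3 releases current at the time of this thread, but it was added back in Python 3.5 (PEP 461), so that particular complaint depends on the interpreter version.

```python
# Plain Python 3 sketch of how bytes differs from str.
# All variable names here are illustrative, not taken from any code in the thread.

zeroed = bytes(5)        # five NUL bytes, not the text "5"
text_five = str(5)       # the text representation of the number

first_byte = b"foo"[0]   # indexing bytes yields an int
first_char = "foo"[0]    # indexing str yields a one-character str

shown = repr(b"foo")     # print(b"foo") displays this repr, including the b prefix

print(zeroed)
print(text_five)
print(first_byte, first_char)
print(shown)
```

This is roughly what code written against the Py2 str type would see if every str were silently turned into bytes under Py3.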
