Stefan Behnel, 25.04.2010 10:30:
> Stefan Behnel, 25.04.2010 07:28:
>> the question is just if we want
>>
>>      py_int_val = uval
>>      py_ustr_val = <unicode>uval
>>
>> or
>>
>>      py_int_val = <int>uval
>>      py_ustr_val = uval
>>
>> My gut feeling is that the coercion to strings would be more straight
>> forward. It would also clean up the compiler code a bit as implicit
>> coercions (e.g. for comparisons) would then work out-of-the-box in both
>> ways. Currently, "Py_UNICODE in unicode" must be special cased (which it
>> still would in the future, but only for optimisation purposes, not to make
>> it work at all).
>
> Given that only char->bytes breaks backwards compatibility, since
> Py_UNICODE wasn't supported at all until now - could we agree on making
> Py_UNICODE->unicode the default behaviour and leaving char->PyInt as is for
> the time being? We can still decide to break backwards compatibility later
> on, and we can always support the explicit
>
>      py_s = <bytes>char_val
>      py_i = <int>pyunicode_val
>
> safely, so users can just rely on that if they need safe behaviour.
>
> The 'reasoning' would be that a plain 'char' is a bit too generic to map it
> to a Python bytes string (note that 'char*' isn't), whereas Py_UNICODE is
> something that only really makes sense in the context of a Python unicode
> string, so it's a lot less surprising if it also maps to a Python unicode
> string.

... and this even matches the behaviour of the 'bytes' type in Python 3,
which really returns integer values on indexing/iteration, whereas the
'str' (i.e. 'unicode') type returns substrings.

I've implemented the explicit (<bytes>char)->bytes and default
Py_UNICODE->unicode coercions and updated the documentation chapter on
string handling accordingly.

http://hg.cython.org/cython-docs/rev/e712d9647f47

Stefan
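
For reference, the Python 3 behaviour mentioned here can be checked in plain Python. This is only a minimal sketch of the 'bytes' vs. 'str' item semantics; the variable names are made up for illustration and it does not show the Cython coercions themselves.

```python
# Python 3: items of a bytes object are integers,
# items of a str object are length-1 strings.
data = b"abc"
text = "abc"

byte_item = data[0]        # an int, like the default char -> PyInt coercion
char_item = text[0]        # a str, like the default Py_UNICODE -> unicode coercion

byte_items = list(data)    # iteration yields ints as well
char_items = list(text)    # iteration yields length-1 strings

byte_slice = data[0:1]     # slicing (unlike indexing) still yields bytes

print(type(byte_item).__name__, type(char_item).__name__)
print(byte_items, char_items, byte_slice)
```

So an integer result for a plain 'char' and a unicode string result for a Py_UNICODE value line up with what Python 3 users already see when indexing the two string types.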
