Stefan Behnel, 25.04.2010 10:30:
> Stefan Behnel, 25.04.2010 07:28:
>> the question is just whether we want
>>
>>        py_int_val = uval
>>        py_ustr_val = <unicode>uval
>>
>> or
>>
>>        py_int_val = <int>uval
>>        py_ustr_val = uval
>>
>> My gut feeling is that the coercion to strings would be more
>> straightforward. It would also clean up the compiler code a bit, as implicit
>> coercions (e.g. for comparisons) would then work out of the box in both
>> directions. Currently, "Py_UNICODE in unicode" must be special-cased (which it
>> still would be in the future, but only for optimisation purposes, not to make
>> it work at all).
>
> Given that only char->bytes breaks backwards compatibility, since
> Py_UNICODE wasn't supported at all until now, could we agree on making
> Py_UNICODE->unicode the default behaviour and leaving char->PyInt as is for
> the time being? We can still decide to break backwards compatibility later
> on, and we can always support the explicit
>
>       py_s = <bytes>char_val
>       py_i = <int>pyunicode_val
>
> safely, so users can just rely on that if they need safe behaviour.
>
> The 'reasoning' would be that a plain 'char' is a bit too generic to map
> to a Python bytes string (note that 'char*' isn't), whereas Py_UNICODE is
> something that only really makes sense in the context of a Python unicode
> string, so it's a lot less surprising if it also maps to a Python unicode
> string.

... and this even matches the behaviour of the 'bytes' type in Python 3,
which really returns integer values on indexing/iteration, whereas the
'str' (i.e. 'unicode') type returns one-character strings. I've implemented
the explicit (<bytes>char)->bytes and default Py_UNICODE->unicode coercions
and updated the documentation chapter on string handling accordingly.

http://hg.cython.org/cython-docs/rev/e712d9647f47
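For reference, the Python 3 behaviour described above can be checked in plain Python (this is ordinary Python 3, not Cython, and the variable names are just for illustration): indexing or iterating over a bytes object yields ints, while doing the same on a str yields one-character strings.

```python
# Plain Python 3 (not Cython): bytes items are ints, str items are 1-char strings.
data_bytes = b"abc"
data_str = "abc"

first_byte = data_bytes[0]   # an int (97), not b'a'
first_char = data_str[0]     # a str of length 1 ('a')

byte_items = list(data_bytes)  # iteration also yields ints
char_items = list(data_str)    # iteration yields 1-char strings

print(type(first_byte).__name__, first_byte)
print(type(first_char).__name__, first_char)
print(byte_items)
print(char_items)

# Getting a one-byte bytes object back requires slicing (or an explicit
# conversion such as bytes([97])), not plain indexing.
print(data_bytes[0:1])
```

Running it prints `int 97`, `str a`, `[97, 98, 99]`, `['a', 'b', 'c']` and `b'a'`.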

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
