Robert Bradshaw, 12.12.2009 20:27:
> However, I agree with your assessment of backwards incompatibility.  
> Consider
> 
>      len("\xc3\xbf")
> 
> In both Python 2 and Python 3 this gives 2, but in Cython it gives 2  
> when compiled against 2.x and 1 when compiled against 3.x. That seems  
> inconsistent.

The inconsistent thing here is that the string changes semantics *after*
being parsed, whereas Python simply parses it differently in Py2 and Py3.
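
To make the difference concrete, here is a minimal sketch in Python 3 syntax. It assumes that the "1" in the quoted example comes from the parsed byte string being decoded as UTF-8 afterwards; the variable names are purely illustrative.

```python
# The same escape sequence, read under the two sets of parsing rules.
byte_literal = b"\xc3\xbf"   # Py2 'str' semantics: two bytes, 0xC3 and 0xBF
text_literal = u"\xc3\xbf"   # Py3 'str' semantics: two code points, U+00C3 and U+00BF

len_as_bytes = len(byte_literal)
len_as_text = len(text_literal)

# Assumption: reinterpreting the already-parsed bytes as UTF-8 merges them
# into the single code point U+00FF, which changes the length after parsing.
len_after_decode = len(byte_literal.decode("utf-8"))

print(len_as_bytes, len_as_text, len_after_decode)
```

Both parse-time readings agree on the length; only the decode step applied after parsing disagrees.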

This could be worked around in Cython by parsing the string literal twice
(potentially in parallel), once with byte string semantics and once with
unicode string semantics, and then generating two C string literals in the
C code, one of which gets converted back into a Python string depending on
the Python version at C compile time. (Note that simple recoding isn't
possible, as there may not be an encoding that maps the unicode string
literal to the byte string literal if character escapes are used.)
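
As a rough illustration of the idea (not Cython's actual implementation), the following Python sketch evaluates one literal body under both semantics and then picks a result by target version. The helpers `parse_both` and `select` are hypothetical names, and the sketch assumes the literal body contains no double quotes.

```python
import ast
import sys


def parse_both(literal_body):
    """Evaluate the same literal source text as a byte string and as a unicode string."""
    as_bytes = ast.literal_eval('b"%s"' % literal_body)
    as_text = ast.literal_eval('u"%s"' % literal_body)
    return as_bytes, as_text


def select(as_bytes, as_text, major=sys.version_info[0]):
    """Pick the variant matching 'str' in the target Python major version."""
    return as_bytes if major < 3 else as_text


# Both variants are produced up front; the choice is deferred until the
# target version is known (in Cython, that would be C compile time).
as_bytes, as_text = parse_both(r"\xc3\xbf")
for_py2 = select(as_bytes, as_text, major=2)
for_py3 = select(as_bytes, as_text, major=3)
print(len(for_py2), len(for_py3))
```

In generated C code, the selection step would presumably be a preprocessor check on the Python version rather than a runtime function.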

This whole 'str' semantics business is getting really hard to understand.
If we're having a hard time getting it right, how is a user ever going to
understand the semantics once we're done?

Stefan

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev