Re: [Cython] String types with Python 2.x and 3.x

Stefan Behnel Sat, 12 Sep 2009 00:35:42 -0700

Robert Bradshaw wrote:
> If I compile the module against Py2, it should behave as if it  
> was a .py file under Py2, and if I compile the module under Py3, it  
> should behave as if it were a .py file under Py3. Moving code  
> from .py to .pyx should not change its behavior.


Well, when you run a Py2 script in Py3, the semantics change. So it doesn't
make sense to say "moving code from .py to .pyx should not change its
behavior", as the same .py file can already have different behaviour.

I'm fine with providing a separate front-end for compiling Python 3 code
("cython3"?), and by the same token with one for compiling Python 2 code.
The .py extension alone is no longer enough to tell which semantics apply.

I'm also fine with a command-line option "-3"/"-2" that selects the
semantics used when compiling a .py file. However, once the compilation is
done, I think the semantics of literals should be fixed and should not change
depending on the platform.
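A minimal Python 3 sketch of "fixed at compile time" semantics, assuming a hypothetical compiler that receives the language level from options like the "-2"/"-3" flags proposed above. The names LanguageLevel and literal_type are illustrative only, not Cython's implementation; Python 3's bytes stands in for the Py2 str type and str for unicode.

```python
from enum import Enum


class LanguageLevel(Enum):
    # Hypothetical stand-ins for the proposed "-2" / "-3" compiler options.
    PY2 = 2
    PY3 = 3


def literal_type(prefix, level):
    """Return the (Python 3 spelled) type a string literal compiles to.

    The result depends only on the literal's prefix and on the language
    level chosen at compile time, never on the Python version the
    compiled module later runs under.
    """
    prefix = prefix.lower()
    if prefix == "b":
        return bytes          # byte string under either level
    if prefix == "u":
        return str            # unicode string under either level
    if prefix == "":
        # Unprefixed literal: Py2 semantics -> byte string, Py3 -> unicode.
        return bytes if level is LanguageLevel.PY2 else str
    raise ValueError("unsupported string prefix: %r" % prefix)


for level in LanguageLevel:
    print(level.name, literal_type("", level).__name__)
```

Only the unprefixed literal depends on the level, and that dependency is resolved once, when the module is compiled.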


>>      isinstance("abc", unicode)
>>
>> return False in Py2 and True in Py3.
> 
> This is an error in Py3.

Correct, but it is not an error in Python 2, nor in Cython, which currently
uses the Py2 builtin names.
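For reference, a short Python 3 check of what the quoted isinstance() call runs into: the unicode builtin no longer exists, and the text type is str.

```python
import builtins

# In Python 3 the "unicode" builtin is gone; "str" is the text type.
print(hasattr(builtins, "unicode"))   # False
print(isinstance("abc", str))         # True: unprefixed literals are text
print(isinstance(b"abc", str))        # False: b"" literals are bytes

# The Py2-style spelling fails with a NameError under Python 3.
name_error_raised = False
try:
    eval('isinstance("abc", unicode)')
except NameError:
    name_error_raised = True
print(name_error_raised)              # True
```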


> I don't see "abc" as a byte string, I see it as a string literal. If  
> it's used in a C context it's a byte string, and if used as a Python  
> object it's a Python str. This is how we handle all other literals  
> (e.g. large integer literals used as Python objects are not the same  
> as large integer literals truncated to an int then used as a Python  
> object).

So your proposal is to make

        cdef char* s

        s = "äöäüöfs#dfsjdföasjf"

a C byte string encoded in the source encoding, and, for an untyped
(Python object) variable,

        s = "äöäüöfs#dfsjdföasjf"

a byte string in the source encoding when run in Python 2 and a decoded
unicode string when run in Python 3?
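To make the difference concrete, here is a Python 3 sketch of the two objects the same literal would turn into, assuming the source file is saved as UTF-8 (SOURCE_ENCODING is an assumption, not something the mail specifies). Each umlaut takes two bytes in UTF-8, so the byte string is longer than the text.

```python
# Assumption: the .pyx file is saved as UTF-8 (the source encoding).
SOURCE_ENCODING = "utf-8"

text = "äöäüöfs#dfsjdföasjf"            # what Py3 semantics would produce
raw = text.encode(SOURCE_ENCODING)      # what a char* (or a Py2 str) holds

print(type(text).__name__, len(text))   # str 19
print(type(raw).__name__, len(raw))     # bytes 25
print(raw.decode(SOURCE_ENCODING) == text)  # True
```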

Note that this means that

        s = "äöäüöfs#dfsjdföasjf"

        cdef char* cs = s

will work in Py2 and fail in Py3, whereas it currently works identically in
both.
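The Py3 failure can be illustrated with ctypes, used here only as an analogy for Cython's char* coercion (it is not how Cython converts values): ctypes.c_char_p accepts a byte string but refuses a unicode string, because there is no implicit encoding to apply.

```python
import ctypes

s = "äöäüöfs#dfsjdföasjf"   # unicode under Py3 semantics

# A byte string converts to char* directly.
cs = ctypes.c_char_p(s.encode("utf-8"))
print(cs.value.decode("utf-8") == s)   # True

# A unicode string has no implicit encoding, so the conversion is refused.
try:
    ctypes.c_char_p(s)
    char_ptr_from_str = True
except TypeError:
    char_ptr_from_str = False
print(char_ptr_from_str)   # False
```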

This means that you'd have to prefix basically all Python string literals
with either 'b' or 'u' if you want a fixed type/semantics, whereas now you
only have to prefix Python unicode strings with a 'u', following Python 2
syntax.
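A quick Python 3 illustration of why prefixes pin down the type: b"" and u"" literals have the same type under either set of semantics (the u"" prefix is accepted again since Python 3.3, PEP 414), while the unprefixed literal is the only one whose type depends on the semantics.

```python
literals = {
    'b"abc"': b"abc",   # always a byte string
    'u"abc"': u"abc",   # always a unicode string
    '"abc"': "abc",     # depends on the semantics; unicode under Python 3
}
for source, value in literals.items():
    print(source, "->", type(value).__name__)
```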

Given that this means more code overhead, do you have a real use case for
literals that behave that way? The only place I've seen this so far is
keyword argument dicts that are filled with literal string names. That's a
rather rare case, IMHO, and easy to fix, e.g. by using the dict() factory.
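A Python 3 sketch of the keyword-argument case, with a hypothetical connect() function: keyword names must be str in Python 3, so a dict whose literal keys compiled to byte strings cannot be passed via **. The dict() factory takes its keys from keyword names, which are always native str.

```python
def connect(host, port):
    # Hypothetical function taking keyword arguments.
    return "%s:%d" % (host, port)


# dict() takes its keys from keyword names, which are always native str.
kwargs = dict(host="localhost", port=8080)
print(connect(**kwargs))   # localhost:8080

# If the literal keys were byte strings, Python 3 rejects the call.
byte_keys = {b"host": "localhost", b"port": 8080}
try:
    connect(**byte_keys)
    byte_keys_accepted = True
except TypeError:
    byte_keys_accepted = False
print(byte_keys_accepted)  # False
```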

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
