Robert Bradshaw wrote:
> On Apr 15, 2008, at 1:55 AM, Stefan Behnel wrote:
> Hopefully Fabrizio's GSoC project gets approved and supporting
> another syntax will be as easy as reading in another grammar file.

That would be cool, yes.


> On the other end of things, I would really like to output .c files
> that can be compiled and linked into either 2.x or 3.x extensions
> without having to re-run Cython (modulo, perhaps, new builtins).

Even builtins that exist in *some* but not all versions of Python could be
supported with some checking code that runs at module load time. If you use
them in your code, you won't be able to load the module into the
interpreter if the builtin is not available in the running version. That's
just how Python handles it.
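
A minimal sketch of what such a load-time check could look like, written
in plain Python rather than as generated C code. The helper name
require_builtins is hypothetical, not something Cython provides:

```python
import builtins


def require_builtins(names):
    """Raise ImportError if any of the named builtins is missing.

    Hypothetical helper: it only illustrates the idea of checking
    once, at module load time, instead of failing later at first use.
    """
    missing = [name for name in names if not hasattr(builtins, name)]
    if missing:
        raise ImportError(
            "builtins not available in this Python: " + ", ".join(missing)
        )


# A module would run the check once while it is being loaded.
require_builtins(["len", "sorted"])
```

If a required name is missing, the import fails right away instead of
the module failing somewhere in the middle of a call.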


> Using PEP 263 to determine the encoding of string literals seems the
> right thing to do. I don't want to lose the ability to do cdef char*
> s = "test" (stored as an ASCII string)

although the exact byte sequence in the C file would depend on the source
encoding of the Cython file.
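
To illustrate the dependency on the source encoding, here is a small
sketch in plain Python (not Cython). The sample strings are made up for
the example; the two encodings stand in for two possible PEP 263
declarations:

```python
# The same literal yields different bytes depending on the encoding
# that the source file is declared to use.
text = "t\u00e9st"  # "test" with an e-acute as second character

utf8_bytes = text.encode("utf-8")         # b't\xc3\xa9st', 5 bytes
latin1_bytes = text.encode("iso-8859-1")  # b't\xe9st', 4 bytes

# A pure ASCII literal is the same byte sequence in both encodings.
ascii_text = "test"
same_for_ascii = ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-1")
```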


> Treating "xxx" as a char*
> if it is pure ASCII, and as a unicode object otherwise, seems like
> the obvious thing to do.

That's what I meant by "too much magic". Cython shouldn't distinguish
between the two based on the *content*. The distinction should be explicit
in the source and Cython should raise an error if it doesn't work out.
Above all, this means: no automatic recoding behind the scenes.

That's the main reason why Py3 has a well-defined "bytes" type and a
Unicode "str" type, instead of the Unicode "unicode" type and the
underdefined "str" type of Py2.
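
A short sketch of how Py3 keeps the two types apart, in plain Python 3.
The function name mixes_implicitly is only for illustration:

```python
data = b"test"   # bytes: a plain byte sequence
text = "test"    # str: Unicode text


def mixes_implicitly():
    """Return True if bytes and str can be concatenated without
    an explicit conversion, False if Python refuses."""
    try:
        data + text
    except TypeError:
        return False
    return True


# The two only meet through an explicit, named encoding.
decoded = data.decode("ascii")
encoded = text.encode("ascii")
```

Python 3 raises a TypeError for the concatenation instead of recoding
behind the scenes, which is the behaviour I would like Cython to follow.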


> What hasn't been resolved is conversions
>
>      cdef object o = s # s is a char*

Sure, the semantics are clear: char* is a byte sequence in C, so the
result is the equivalent of a byte sequence in Python: a byte string, i.e.
a str object in Py2 and a bytes object in Py3.

If you want a unicode string, use

    cdef object o = (<object>s).decode('UTF-8')

or whatever, maybe even the C-API Unicode decoding functions. But make
sure the encoding you use is explicit.
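
The same idea can be tried out in plain Python 3 without compiling
anything. This is only an analogy: ctypes.c_char_p stands in for a C
char* here, and the sample text is made up:

```python
import ctypes

# A C-level char* holding UTF-8 encoded data.
c_string = ctypes.c_char_p("gr\u00fc\u00df".encode("utf-8"))

# Reading the pointer gives a byte string: bytes in Python 3.
raw = c_string.value

# Text only comes out of an explicit decoding step.
text = raw.decode("UTF-8")

# Decoding with a different encoding gives a different (wrong) result,
# which is why the encoding should never be guessed.
wrong = raw.decode("iso-8859-1")
```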


>      cdef char* s = o # o is a python unicode object (or,
> equivalently, the result of str(o))

That's not equivalent in Python 2, but it is in Py3.


> Should this raise a compile time error?

If the compiler knows that o *really* is of type "unicode", it can raise
an error here. Otherwise, you'd get a runtime error from Python's string
conversion functions.
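
As an analogy for the runtime case, ctypes in Python 3 behaves this way
when asked to treat an object as a char*. This is plain Python, not
Cython output, and the helper name to_char_p is hypothetical:

```python
import ctypes


def to_char_p(value):
    """Try to view value as a C char*; return None if the
    conversion is refused at runtime."""
    try:
        return ctypes.c_char_p(value)
    except TypeError:
        # Unicode strings are rejected: there is no implicit encoding.
        return None


from_bytes = to_char_p(b"test")   # accepted, it is a byte string
from_text = to_char_p("test")     # refused, it is Unicode text
```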


> (That would break a lot of
> code...including really nice code like declaring a function argument
> to be char*)

That would still accept any kind of byte string or a bytes object in Py3,
which is just fine IMHO.


> Whatever happens, I think <object><char*>o == o and <char*><object>s
> == s are important.

This will continue to work as we are dealing with plain byte strings here.


> I like Dag's "lang: ..." proposal. [...]
> I think the default language should be
> determined by the runtime environment of the compiler, i.e. (which
> can always be overridden, either globally or file-by-file, but
> probably won't need to be most of the time).

I actually prefer having it in the source file. Nothing keeps you from
writing one source file in Py2 and another in Py3 and combining them into
one module. :)

Stefan

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
