Lisandro Dalcin wrote:
> In my current understanding of the problem, the evil thing is
> automatic conversion. I'm completely convinced of this, I believe
> Robert and Greg are also convinced

I'm convinced that unrestricted automatic conversion between
char * and unicode would be a bad idea. I'm not yet totally
convinced that Pyrex shouldn't allow it under certain
conditions, such as the string containing only ASCII code
points (checked at run time).
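
As a rough illustration of what such a run-time check might look like,
here is a plain-Python sketch. It is not what Pyrex does; the function
name c_string_to_unicode is made up for the example, and a bytes object
stands in for the char * data.

```python
def c_string_to_unicode(data: bytes) -> str:
    """Convert byte-string data to unicode only if it is pure ASCII.

    Hypothetical helper: 'data' stands in for the contents of a char *.
    """
    try:
        # Decoding as ASCII fails on any byte >= 0x80, which is exactly
        # the "only ASCII code points" condition checked at run time.
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ValueError(
            "refusing implicit conversion: non-ASCII byte at offset %d"
            % exc.start
        ) from exc


if __name__ == "__main__":
    print(c_string_to_unicode(b"plain ascii"))
    try:
        c_string_to_unicode(b"caf\xc3\xa9")  # UTF-8 bytes, not ASCII
    except ValueError as err:
        print("rejected:", err)
```

The point of the sketch is that the conversion either succeeds
unambiguously or fails loudly; it never guesses an encoding.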

For Pyrex, I'm also thinking about not trying to make the
language match py3 at all, at least not in every way. For
example, I may decide to keep the 'u' prefix for Python
unicode literals.

This probably isn't the right thing for Cython to do if it
wants to be a pure-Python compiler, but Pyrex has a different
goal -- it's meant to be a half-way house between Python
and C.

Currently in Pyrex, "xxx" is not a Python type at all --
it's a C type (i.e. char *). It only becomes a Python type
when used in a Python context, forcing conversion to a
Python string object.

I don't think it's necessarily wrong to keep it that way,
i.e. "xxx" is a C string, and if you want a Python string
object as a literal, you have to say which kind you want
with a "b" or "u" prefix.

That way, the Pyrex language itself can stay much the same,
and you just have to write code that takes care to accept
unicode strings if you intend to use it in a py3 environment.
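
One way to read "takes care to accept unicode strings" is a small
normalising helper at the boundary where Python objects are handed to
C code. The sketch below is plain Python rather than Pyrex; the name
as_c_bytes and the UTF-8 default are assumptions made for the example,
not anything Pyrex provides.

```python
def as_c_bytes(value, encoding="utf-8"):
    """Return bytes suitable for passing to a char * parameter.

    Hypothetical helper: accepts either a byte string or a unicode
    string, so the same calling code works in a py3 environment.
    The default encoding is an assumption; pick whatever the C
    library on the other side actually expects.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(
        "expected bytes or str, got %s" % type(value).__name__
    )


if __name__ == "__main__":
    print(as_c_bytes(b"already bytes"))
    print(as_c_bytes("h\u00e9llo"))
```

Here the encoding step is explicit and happens in one place, which
keeps it out of the language's implicit conversions.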

> * A new C pseudo-type has to be added, let's call it 'uchar' (a better
> name would be needed, it can be confused with unsigned char). Then
> something like 'cdef uchar *p = obj' will only accept a unicode
> string

What would it actually point to -- utf8 encoded chars?

How would it interact with char *?
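
To make the first question concrete, the plain-Python sketch below
shows that the same unicode string has a different byte layout under
each encoding, so a pointer type would have to commit to one of them.
The sample string and variable names are chosen only for illustration.

```python
# A 5-code-point string containing one non-ASCII character (i with diaeresis).
text = "na\u00efve"

utf8 = text.encode("utf-8")       # variable width: 1 or 2 bytes here
utf16 = text.encode("utf-16-le")  # 2 bytes per code point for this string
utf32 = text.encode("utf-32-le")  # always 4 bytes per code point

print(len(text), len(utf8), len(utf16), len(utf32))

# With UTF-8, indexing the buffer by byte does not line up with
# indexing the string by character once a non-ASCII character appears.
print(hex(utf8[2]), hex(ord(text[2])))
```

With UTF-8 the buffer could be passed to functions expecting char *,
but p[i] would then be a byte rather than a character; with a
fixed-width encoding the reverse trade-off applies.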

-- 
Greg
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
