Lisandro Dalcin wrote:
> In my current understanding of the problem, the evil thing is
> automatic conversion. I'm completely convinced of this, I believe
> Robert and Greg are also convinced
I'm convinced that unrestricted automatic conversion between char * and unicode would be a bad idea. I'm not yet totally convinced that Pyrex shouldn't allow it under certain conditions, such as the string containing only ASCII code points (checked at run time).

For Pyrex, I'm also thinking about not trying to make the language match py3 at all, at least not in every way. For example, I may decide to keep the 'u' prefix for Python unicode literals. This probably isn't the right thing for Cython to do if it wants to be a pure-Python compiler, but Pyrex has a different goal -- it's meant to be a half-way house between Python and C.

Currently in Pyrex, "xxx" is not a Python type at all -- it's a C type (i.e. char *). It only becomes a Python type when used in a Python context, forcing conversion to a Python string object. I don't think it's necessarily wrong to keep it that way, i.e. "xxx" is a C string, and if you want a Python string object as a literal, you have to say which kind you want with a "b" or "u" prefix. That way, the Pyrex language itself can stay much the same, and you just have to write code that takes care to accept unicode strings if you intend to use it in a py3 environment.

> * A new C pseudo-type has to be added, let's call it 'uchar' (a better
> name would be needed, it can be confused with unsigned char). Then
> something like 'cdef uchar *p = obj' will only accept a unicode
> string

What would it actually point to -- utf8 encoded chars? How would it interact with char *?

-- 
Greg

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
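The restricted conversion discussed above (a unicode string accepted where a char * is expected only if it contains nothing but ASCII code points, checked at run time) can be sketched in Python. This is a hypothetical illustration of the proposed rule, not actual Pyrex or Cython behaviour: Python 3 `bytes` stands in for char *, `str` stands in for a unicode object, and the function name `to_c_string` is made up for the example.

```python
def to_c_string(obj):
    """Hypothetical coercion rule for a char * target.

    bytes (standing in for char *) passes through unchanged; str
    (standing in for a unicode object) is accepted only if every code
    point is ASCII, which is checked at run time by the encode call.
    """
    if isinstance(obj, bytes):
        return obj
    if isinstance(obj, str):
        try:
            # Fails on the first non-ASCII code point.
            return obj.encode("ascii")
        except UnicodeEncodeError:
            raise TypeError(
                "non-ASCII unicode string cannot be converted to char *"
            ) from None
    raise TypeError(f"expected bytes or str, got {type(obj).__name__}")


if __name__ == "__main__":
    print(to_c_string(b"spam"))  # bytes pass through unchanged
    print(to_c_string("spam"))   # ASCII-only str is accepted
    try:
        to_c_string("caf\u00e9")
    except TypeError as exc:
        print("rejected:", exc)
```

Under this rule the conversion never has to guess an encoding: anything outside ASCII is an error rather than silently transcoded. Encoding to UTF-8 instead, one possible answer to what a 'uchar *' would point to, accepts every string but breaks the one-byte-per-character assumption of char * code: `"caf\u00e9".encode("utf-8")` is 5 bytes for 4 characters.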
