[Cython] coercion of char/Py_UNICODE to Python objects - string or integer?

Stefan Behnel Wed, 21 Apr 2010 22:37:31 -0700

Lisandro Dalcin, 21.04.2010 23:26:
> What do you think?
>
> diff -r 2701901737d4 Cython/Compiler/PyrexTypes.py
> --- a/Cython/Compiler/PyrexTypes.py   Wed Apr 21 15:36:27 2010 +0200
> +++ b/Cython/Compiler/PyrexTypes.py   Wed Apr 21 18:25:42 2010 -0300
> @@ -871,7 +871,7 @@
>       # to integers here.  The maximum value for a Py_UNICODE is
>       # 1114111, so PyInt_FromLong() will do just fine here.
>
> -    to_py_function = "PyInt_FromLong"
> +    to_py_function = "PyUnicode_FromOrdinal"
>
>       def sign_and_name(self):
>           return "Py_UNICODE"


I didn't know about that function, even though I had looked for it in the 
CPython docs. It's available in all relevant CPython versions, and it's 
pretty efficient, too.

This would turn Py_UNICODE values into a single-character unicode string 
when they are coerced to a Python object. I had thought about this as well, 
but wasn't sure what I wanted. In current Cython, 'char' doesn't coerce to a 
single-character 'bytes' object but to an integer. My thinking was that 
Py_UNICODE should behave the same.
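
To make the two candidate results concrete, here is a minimal sketch in plain Python rather than Cython. It assumes CPython 3 and reaches PyUnicode_FromOrdinal() through ctypes.pythonapi. The helper names coerce_as_integer() and coerce_as_unicode() are made up for this illustration and are not part of Cython.

```python
import ctypes

# PyUnicode_FromOrdinal(int ordinal) is a real CPython C-API function that
# returns a new one-character string object for the given code point.
_from_ordinal = ctypes.pythonapi.PyUnicode_FromOrdinal
_from_ordinal.argtypes = [ctypes.c_int]
_from_ordinal.restype = ctypes.py_object


def coerce_as_integer(code_point):
    """Hypothetical stand-in for the integer coercion described above."""
    return int(code_point)


def coerce_as_unicode(code_point):
    """Hypothetical stand-in for the coercion the patch would introduce."""
    return _from_ordinal(code_point)


code_point = 0x20AC  # EURO SIGN
as_int = coerce_as_integer(code_point)
as_str = coerce_as_unicode(code_point)
```

The same C value ends up as an integer in one case and as a string of length one in the other, which is the choice under discussion.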

This is a bit inconsistent in itself, given that single-character strings 
can coerce to their C ordinal value, e.g. on comparison with 
char/Py_UNICODE, but it's not enough of an inconsistency to justify breaking 
backwards compatibility. I'm really not sure what the 'expected' behaviour 
is here, although I'm leaning slightly towards the char/bytes and 
Py_UNICODE/unicode coercion.

It's certainly easier to write

     cdef Py_UNICODE cval = some_c_integer

     py_object = <long>cval

to get a Python integer value, than to find, import and call 
PyUnicode_FromOrdinal() to get a unicode string. There doesn't seem to be 
an equivalent PyBytes function, so I guess the PyBytes conversion would use

     py_bytes = PyBytes_FromStringAndSize(&char_val, 1)

which isn't exactly beautiful either, and certainly less so than the opposite

     py_integer = <int>char_val

This would also speak in favour of letting char and Py_UNICODE coerce to 
Python strings by default, although that argument would go away if we 
special-cased the builtin chr() function to emit exactly the 
string-creating calls above for each input type.
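
As a rough illustration of what such a type-dependent chr() would produce, here is a sketch in plain Python, assuming CPython 3 and ctypes.pythonapi. The function typed_chr() and its c_type argument are hypothetical. In Cython the dispatch would happen at compile time on the static C type, not at runtime on a string tag.

```python
import ctypes

# Both are real CPython C-API functions, reached here through ctypes.
_bytes_from_buffer = ctypes.pythonapi.PyBytes_FromStringAndSize
_bytes_from_buffer.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
_bytes_from_buffer.restype = ctypes.py_object

_str_from_ordinal = ctypes.pythonapi.PyUnicode_FromOrdinal
_str_from_ordinal.argtypes = [ctypes.c_int]
_str_from_ordinal.restype = ctypes.py_object


def typed_chr(value, c_type):
    """Hypothetical helper: choose the result type from the C input type."""
    if c_type == "char":
        # Assumption: the char is treated as an unsigned byte value.
        if not 0 <= value <= 255:
            raise ValueError("char value out of range: %r" % (value,))
        # Copies one byte out of the temporary buffer into a new bytes object.
        return _bytes_from_buffer(bytes([value]), 1)
    if c_type == "Py_UNICODE":
        return _str_from_ordinal(value)
    raise TypeError("unsupported C type: %r" % (c_type,))


byte_result = typed_chr(65, "char")
text_result = typed_chr(0xE9, "Py_UNICODE")
```

With this in place, a plain cast would keep producing an integer, and chr() would be the explicit way to ask for a string of the matching type.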

Another option is to consider Py_UNICODE more special (and more specific) 
than the somewhat generic 'char', and to accept the inconsistency of 
coercing one to a unicode string and the other to an integer.

What do the others think?

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
