Lisandro Dalcin, 21.04.2010 23:26:
> What do you think?
>
> diff -r 2701901737d4 Cython/Compiler/PyrexTypes.py
> --- a/Cython/Compiler/PyrexTypes.py Wed Apr 21 15:36:27 2010 +0200
> +++ b/Cython/Compiler/PyrexTypes.py Wed Apr 21 18:25:42 2010 -0300
> @@ -871,7 +871,7 @@
> # to integers here. The maximum value for a Py_UNICODE is
> # 1114111, so PyInt_FromLong() will do just fine here.
>
> - to_py_function = "PyInt_FromLong"
> + to_py_function = "PyUnicode_FromOrdinal"
>
> def sign_and_name(self):
> return "Py_UNICODE"

I didn't know about that function, even though I had looked for it in the
CPython docs. It's available in all relevant CPython versions, and it's
pretty efficient, too.

This would let Py_UNICODE values turn into a single-character unicode
string when coercing to a Python object. I had also thought about this, and
wasn't sure what I wanted. In current Cython, 'char' doesn't coerce to a
single-character 'bytes' object but to an integer. My thinking was that
Py_UNICODE should behave the same.
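
To make the two candidate behaviours concrete, here is a rough Python-level
model. It is only a sketch, not what Cython generates: the helper names are
made up for illustration, and chr() merely stands in for what
PyUnicode_FromOrdinal() does in C.

```python
MAX_ORDINAL = 0x10FFFF  # 1114111, the upper bound mentioned in the patch comment


def py_unicode_as_int(ordinal):
    # Models the current behaviour: the C value becomes a Python integer.
    return int(ordinal)


def py_unicode_as_str(ordinal):
    # Models the proposed behaviour: a single-character unicode string,
    # roughly what PyUnicode_FromOrdinal() returns for a valid ordinal.
    if not 0 <= ordinal <= MAX_ORDINAL:
        raise ValueError("ordinal out of range")
    return chr(ordinal)
```

With the first helper a Py_UNICODE holding 97 would reach Python code as the
integer 97, with the second as the string 'a'.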

This is a bit inconsistent in itself, given that single-character strings
can already coerce to their C ordinal value, e.g. on comparison with
char/Py_UNICODE, but it is not enough of an inconsistency to justify
breaking backwards compatibility. I'm really not sure what the 'expected'
behaviour is here, although I'm leaning slightly towards the char/bytes and
Py_UNICODE/unicode coercion.

It's certainly easier to write

    cdef Py_UNICODE cval = some_c_integer
    py_object = <long>cval

to get a Python integer value, than to find, import and call
PyUnicode_FromOrdinal() to get a unicode string. There doesn't seem to be
an equivalent PyBytes function, so I guess the PyBytes conversion would use

    py_bytes = PyBytes_FromStringAndSize(&char_val, 1)

which isn't exactly beautiful either, and certainly less so than the opposite

    py_integer = <int>char_val

This would also speak in favour of letting char and Py_UNICODE coerce to
Python strings by default, although that argument would go away if we
special-cased the builtin chr() function to generate exactly the above code
for each input type.
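
A minimal Python model of that chr() idea, assuming the compiler knows the
static C type of the argument. The typed_chr() name and the string type tags
are hypothetical; in Cython the dispatch would happen at compile time and
emit the C calls mentioned above rather than run Python code.

```python
def typed_chr(value, c_type):
    # c_type is a hypothetical tag standing in for the argument's static C type.
    if c_type == "char":
        # Models PyBytes_FromStringAndSize(&char_val, 1): one byte in, one
        # byte out. The mask maps a negative signed char onto 0..255.
        return bytes([value & 0xFF])
    if c_type == "Py_UNICODE":
        # Models PyUnicode_FromOrdinal(): a single-character unicode string.
        return chr(value)
    raise TypeError("unsupported C type: %r" % (c_type,))
```

Under this model, typed_chr(65, "char") gives a bytes object and
typed_chr(65, "Py_UNICODE") gives a unicode string, while a plain coercion
of either type could stay an integer.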

Another option is to consider Py_UNICODE more special (and more specific)
than the somewhat generic 'char', and to accept the inconsistency of
coercing one to a unicode string and the other to an integer.

What do the others think?

Stefan