On Oct 21, 2009, at 1:10 PM, John Arbash Meinel wrote:

> This bug seems to affect both cython and pyrex.
>
> Namely, I'm parsing a string that has NULL characters in it (known
> width), which is also likely to be redundant within the data stream.
>
> I'm doing something like:
>
>   mystr = PyString_FromStringAndSize(NULL, count+other_count)
>   memcpy(mystr, some_bytes, count)
>   memcpy(mystr+count, more_bytes, other_count)
>
>   mystr = intern(mystr)
>
> This fails because pyrex and cython both effectively translate this
> code into:
>
>   char *temp;
>   temp = PyString_AsString(mystr);
>   mystr = PyString_InternFromString(temp);
>
> With, of course, appropriate error checking and incref/decref
> handling.
>
> I would, of course, like to use PyString_InternInPlace(PyObject **),
> however that fails for other reasons. "taking address of a non
> l-value" if you try to do:
>
>   cdef extern from "Python.h":
>     ctypedef struct PyObject:
>       pass
>     void PyString_InternInPlace(PyObject **)
>
>   st = 'my string'
>   PyString_InternInPlace(&<PyObject *>st)
>
> Now I can probably do some trickery with
>
>   cdef PyObject *as_ptr
>
>   as_ptr = <PyObject *>st
>   PyString_InternInPlace(&as_ptr)
>   st = <object>as_ptr
>
> However, because InternInPlace may destroy 'st', and that final
> assignment will be doing a DECREF on the 'st' object, I'm pretty
> sure it will blow up.
>
> It feels like the only thing left to do is define a macro in a header
> with something like:
>
>   #define INTERN_STRING(obj) (PyString_InternInPlace(&(obj)))
>
> and then
>
>   cdef extern from "myheader.h":
>     INTERN_STRING(object)
>
> Is this true?
>
> John

Good catch. I've disabled optimizing the intern builtin in Cython for
now. We could re-enable it for char* only if someone finds interning
strings to be a bottleneck.

- Robert
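
The truncation behind this bug can be reproduced outside Pyrex and
Cython. The sketch below is plain Python using ctypes, not the code
either compiler generates; the payload and variable names are made up
for illustration. It shows that reading a buffer as a NUL-terminated
char* stops at the first NUL byte, which is what happens to a string
passed through PyString_AsString and PyString_InternFromString, while
reading the same buffer with an explicit length keeps every byte.

```python
import ctypes

# Hypothetical payload: five bytes of known width with an embedded NUL.
payload = b"ab\x00cd"

# Copy the payload into a C buffer (ctypes appends a trailing NUL).
buf = ctypes.create_string_buffer(payload)

# Treat the buffer as a NUL-terminated char*, the way a char*-based
# intern call would see it: everything after the first NUL is lost.
as_c_string = ctypes.cast(buf, ctypes.c_char_p).value

# Read the same memory with an explicit length: nothing is lost.
with_length = ctypes.string_at(buf, len(payload))

print(repr(as_c_string), repr(with_length))
```

Any interning path that goes through a bare char* has the same
limitation, so data with embedded NULs needs a call that works on the
string object (or on pointer plus length) instead.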
