Matthieu Perrot wrote:
> hi,
>
> I need to handle strings shaped by a numpy array whose data belong to a C
> structure. There are several possible answers to this problem:
> 1) use a numpy array of strings (PyArray_STRING) and so a (char *) object
> in C. It works as is, but you need to define a maximum size for your strings
> because your set of strings is contiguous in memory.
> 2) use a numpy array of objects (PyArray_OBJECT), and wrap each «C string»
> with a python object, using PyStringObject for example. Then the problem is
> that there are as many wrappers as data elements, and I believe data can't be
> shared when you create a PyStringObject from a (char *) with
> PyString_FromStringAndSize, for example.
>
> Now, I will present a third way, which allows you to use strings with no size
> limit (unlike solution 1) and doesn't create wrappers before you really need
> them (on demand/access).
>
> First, for convenience, we will use the (char **) type in C to build an array
> of string pointers (as was suggested in solution 2). Now, the game is to
> make it work with the numpy API, and use it in python through a python array.
> Basically, I want behaviour very similar to arrays of PyObject, where the
> data are not contiguous, only their addresses are. So, the idea is to create
> a new array descr based on PyArray_OBJECT and change its getitem/setitem
> functions to deal with my own data.
>
> I expected numpy to work with this convenient array descr, but it fails
> because PyArray_Scalar (arrayobject.c) doesn't call the descriptor getitem
> function (in the PyArray_OBJECT case) but runs 2 lines which have been
> copy/pasted from the OBJECT_getitem function. Here is my small patch:
> replace (arrayobject.c:983-984):
>     Py_INCREF(*((PyObject **)data));
>     return *((PyObject **)data);
> by:
>     return descr->f->getitem(data, base);
>
> I played a lot with my new numpy array after this change and noticed that a
> lot of uses work:
>

This is an interesting solution.
I was not considering it, though, and so I'm not surprised you have problems. You can register new types but basing them off of PyArray_OBJECT can be problematic because of the special-casing that is done in several places to manage reference counting.
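
To make the special-casing problem concrete, here is a toy model in plain C. It is only a sketch and does not use the NumPy headers: toy_box, toy_descr, the two descriptors, and the scalar_* functions are hypothetical names invented for illustration, with toy_box standing in for a Python object and toy_descr for an array descr. It contrasts a fetch that hard-codes the object layout with one that always goes through the descriptor's getitem, which is the shape of the change proposed above.

```c
#include <stdlib.h>

/* Toy stand-ins only; none of these are NumPy types or functions. */
enum { TOY_OBJECT = 0 };

typedef struct {
    int refcount;       /* stand-in for a Python reference count */
    const char *text;   /* the string the box wraps */
} toy_box;

typedef struct {
    int type_num;                    /* kind the descriptor claims to be */
    toy_box *(*getitem)(void *slot); /* per-descriptor element accessor */
} toy_descr;

/* Built-in object layout: the slot stores a toy_box pointer.
   Returns a new reference to the existing box. */
static toy_box *object_getitem(void *slot)
{
    toy_box *box = *(toy_box **)slot;
    box->refcount++;
    return box;
}

/* Custom layout: the slot stores a bare C string pointer.
   The wrapper is created only when the element is accessed;
   the caller owns it and releases it with free(). */
static toy_box *cstring_getitem(void *slot)
{
    toy_box *box = malloc(sizeof *box);
    if (box == NULL)
        return NULL;
    box->refcount = 1;
    box->text = *(const char **)slot;
    return box;
}

/* Both descriptors report the same type number, as a descr derived
   from the object type would. */
const toy_descr toy_object_descr  = { TOY_OBJECT, object_getitem };
const toy_descr toy_cstring_descr = { TOY_OBJECT, cstring_getitem };

/* Special-cased fetch: for the object type number it assumes the
   built-in layout and never consults descr->getitem. */
toy_box *scalar_special_cased(const toy_descr *descr, void *slot)
{
    if (descr->type_num == TOY_OBJECT) {
        toy_box *box = *(toy_box **)slot;
        box->refcount++;
        return box;
    }
    return descr->getitem(slot);
}

/* Dispatching fetch: always defers to the descriptor. */
toy_box *scalar_dispatched(const toy_descr *descr, void *slot)
{
    return descr->getitem(slot);
}
```

With toy_cstring_descr, scalar_dispatched wraps the C string on demand, while scalar_special_cased would reinterpret the char pointer as a toy_box and increment bytes of the string itself. Any other code path that tests only the type number has the same hazard, which is why a single patched call site does not make the approach safe in general.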

You are supposed to register your own data-types and get your own type number. Then you can define all the functions for the entries as you wish. Riding on the back of PyArray_OBJECT may work if you are clever, but it may fail mysteriously as well because of a reference count snafu.

Thanks for the tests and bug-reports. I have no problem changing the code as you suggest.

-Travis

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion