[Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
Hi, I am just continuing the discussion around ABI/API, the technical side of things that is, as this is unrelated to 1.7.x. release. On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 11:58 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 05:35 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertíkondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require to change the *current* C API, right ? If one wants to Accessing arr-dims[i] directly would need to change. But that's been discouraged for a long time. By API I meant access through the macros. 
One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it under, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage but allowing to hide the ABI. (It doesn't give you full flexibility in ABI, it does require that you somewhere have an npy_intp dims[nd] with the same lifetime as your object, etc., but I don't consider that a big disadvantage). allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that access those directly will need to be changed. There is the particular issue of iterator, which seem quite difficult to make ABI-safe without losing significant performance. I don't agree (for some meanings of ABI-safe). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. (I have not read the whole cython discussion yet) What do you mean by building iteration in the consumer ? My understanding is that any data export would be done through a level of indirection (dataptr/shape/strides). Conceptually, I can't see how one could keep ABI without that level of indirection without some compile. In the case of iterator, that means multiple pointer chasing per sample -- i.e. the tight loop issue you mentioned earlier for PyArray_DATA is the common case for iterator. I can only see two ways of doing fast (special casing) iteration: compile-time special casing or runtime optimization. 
Compile-time requires access to the internals (even if one were to use C++ with advanced template magic à la STL/iterator, I don't think one can get performance if everything is not in the headers, but maybe C++ compilers are super smart these days in ways I can't comprehend). I would think runtime is the long-term solution, but that's far away.

David

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
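Editor's sketch of the "hide the structure, keep the API" idea discussed above, with no dependency on Python or NumPy. It only illustrates how a single opaque handle type plus function accessors lets the same call accept both spellings of the pointer type. All names here (`Object`, `ArrayObject`, `array_ndim`, `array_shape`) are hypothetical stand-ins, not NumPy API.

```c
#include <stddef.h>

/* ---- public header: all a consumer ever sees ---- */
typedef struct Object Object;   /* stand-in for PyObject */
typedef Object ArrayObject;     /* stand-in for "typedef PyObject PyArrayObject" */

int array_ndim(Object *obj);
const ptrdiff_t *array_shape(Object *obj);

/* ---- private to the library: layout may change between releases ---- */
struct Object {
    int ndim;
    ptrdiff_t dims[4];
};

int array_ndim(Object *obj) { return obj->ndim; }
const ptrdiff_t *array_shape(Object *obj) { return obj->dims; }
```

Because `ArrayObject` is only an alias, code written against either pointer type compiles unchanged, while code that reached into the struct fields directly would have to move to the accessors.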
Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
On 06/26/2012 01:48 PM, David Cournapeau wrote: Hi, I am just continuing the discussion around ABI/API, the technical side of things that is, as this is unrelated to 1.7.x. release. On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 11:58 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.nowrote: On 06/26/2012 05:35 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertíkondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require to change the *current* C API, right ? If one wants to Accessing arr-dims[i] directly would need to change. But that's been discouraged for a long time. 
By API I meant access through the macros. One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it under, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage but allowing to hide the ABI. (It doesn't give you full flexibility in ABI, it does require that you somewhere have an npy_intp dims[nd] with the same lifetime as your object, etc., but I don't consider that a big disadvantage). allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that access those directly will need to be changed. There is the particular issue of iterator, which seem quite difficult to make ABI-safe without losing significant performance. I don't agree (for some meanings of ABI-safe). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. (I have not read the whole cython discussion yet) I'll try to write a summary and post it when I can get around to it. What do you mean by building iteration in the consumer ? My consumer is the user of the NumPy C API. So I meant that the iteration logic is all in C header files and compiled again for each such consumer. Iterators don't cross the ABI boundary. understanding is that any data export would be done through a level of indirection (dataptr/shape/strides). Conceptually, I can't see how one could keep ABI without that level of indirection without some compile. 
In the case of iterator, that means multiple pointer chasing per sample -- i.e. the tight loop issue you mentioned earlier for PyArray_DATA is the common case for iterator.

Even if you do indirection, iterator utilities that are compiled in the consumer/user code can cache the data that's retrieved. Iterators just do

    // setup crossing ABI
    npy_intp *shape = PyArray_DIMS(arr);
    npy_intp *strides = PyArray_STRIDES(arr);
    ...
    // performance-sensitive code just accesses cached pointers and doesn't
    // cross ABI

We're probably in violent agreement and just talking past one another...?

I can only see two ways of doing fast (special casing) iteration: compile-time special casing or runtime optimization. Compile-time requires access to the internals (even if one were to use C++ with advanced template magic à la STL/iterator, I don't think one can get performance if everything is not in the headers, but maybe C++ compilers are super smart these days in ways I
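Editor's sketch of the caching pattern described in the message above, written without NumPy so it stands alone. The accessor functions play the role of calls that cross the ABI boundary; the loop itself touches only locals. `HiddenArray`, `hidden_data`, `hidden_shape`, `hidden_strides` and `sum_2d` are hypothetical names, and `ptrdiff_t` stands in for `npy_intp`.

```c
#include <stddef.h>

/* Library side: the layout is private, consumers use the accessors only. */
typedef struct {
    char *data;
    int ndim;
    ptrdiff_t shape[2];
    ptrdiff_t strides[2];   /* in bytes */
} HiddenArray;

char *hidden_data(HiddenArray *a) { return a->data; }
const ptrdiff_t *hidden_shape(HiddenArray *a) { return a->shape; }
const ptrdiff_t *hidden_strides(HiddenArray *a) { return a->strides; }

/* Consumer side: iteration logic compiled into the caller. */
double sum_2d(HiddenArray *arr)
{
    /* Setup: cross the boundary once and cache everything in locals. */
    char *data = hidden_data(arr);
    const ptrdiff_t *shape = hidden_shape(arr);
    const ptrdiff_t *strides = hidden_strides(arr);
    ptrdiff_t n0 = shape[0], n1 = shape[1];
    ptrdiff_t s0 = strides[0], s1 = strides[1];

    /* Hot loop: no accessor calls, only cached values. */
    double total = 0.0;
    for (ptrdiff_t i = 0; i < n0; i++) {
        char *row = data + i * s0;
        for (ptrdiff_t j = 0; j < n1; j++)
            total += *(double *)(row + j * s1);
    }
    return total;
}
```

The same function handles a transposed view simply by swapping the shape and stride entries, which is the kind of flexibility the strided export is meant to keep.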
Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
On 06/26/2012 01:48 PM, David Cournapeau wrote: Hi, I am just continuing the discussion around ABI/API, the technical side of things that is, as this is unrelated to 1.7.x. release. On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 11:58 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.nowrote: On 06/26/2012 05:35 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertíkondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require to change the *current* C API, right ? If one wants to Accessing arr-dims[i] directly would need to change. But that's been discouraged for a long time. 
By API I meant access through the macros. One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it under, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage but allowing to hide the ABI. (It doesn't give you full flexibility in ABI, it does require that you somewhere have an npy_intp dims[nd] with the same lifetime as your object, etc., but I don't consider that a big disadvantage). allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that access those directly will need to be changed. There is the particular issue of iterator, which seem quite difficult to make ABI-safe without losing significant performance. I don't agree (for some meanings of ABI-safe). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. (I have not read the whole cython discussion yet) So here's the summary. It's rather complicated but also incredibly neat :-) And technical details can be hidden behind a tight API. - We introduce a C-level metaclass, extensibletype, which to each type adds a branch-miss-free string-pointer hash table. The ndarray type is made an instance of this metaclass, so that you can do PyCustomSlots_GetTable(array_object-ob_type) - The hash table uses a perfect hashing scheme: a) We take the lower 64 bits of md5 of the lookup string (this can be done compile-time or module-load-time) as a pre-hash h. 
b) When looking up the table for a key with pre-hash h, the index in the table is given by

    ((h >> table->r) & table->m1) ^ table->d[r & table->m2]

Then, *if* the element is present, it will always be found on the first try; the table is guaranteed collisionless. This means that an expensive branch-miss can be avoided. It is really incredibly fast in practice, with a 0.5 ns penalty on my 1.8 GHz laptop.

The magic is in finding the right table->r and table->d. For a 64-slot table, parameters r and d[0]..d[63] can be found in 10 us on my machine (it's an O(n) operation). (table->d[i] has type uint16_t)

(This algorithm was found in an academic paper which I'm too lazy to dig up from that thread right now; perfect hashing is an active research field.)

The result? You can use this table to store function pointers in the type, like C++ virtual tables or like the built-in slots like tp_get_buffer, but *without* having to agree on
Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
On 06/26/2012 04:08 PM, Dag Sverre Seljebotn wrote: On 06/26/2012 01:48 PM, David Cournapeau wrote: Hi, I am just continuing the discussion around ABI/API, the technical side of things that is, as this is unrelated to 1.7.x. release. On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 11:58 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 05:35 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertíkondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require to change the *current* C API, right ? If one wants to Accessing arr-dims[i] directly would need to change. 
But that's been discouraged for a long time. By API I meant access through the macros. One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it under, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage but allowing to hide the ABI. (It doesn't give you full flexibility in ABI, it does require that you somewhere have an npy_intp dims[nd] with the same lifetime as your object, etc., but I don't consider that a big disadvantage). allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that access those directly will need to be changed. There is the particular issue of iterator, which seem quite difficult to make ABI-safe without losing significant performance. I don't agree (for some meanings of ABI-safe). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. (I have not read the whole cython discussion yet) So here's the summary. It's rather complicated but also incredibly neat :-) And technical details can be hidden behind a tight API. - We introduce a C-level metaclass, extensibletype, which to each type adds a branch-miss-free string-pointer hash table. 
The ndarray type is made an instance of this metaclass, so that you can do PyCustomSlots_GetTable(array_object->ob_type)

- The hash table uses a perfect hashing scheme:

a) We take the lower 64 bits of md5 of the lookup string (this can be done compile-time or module-load-time) as a pre-hash h.

b) When looking up the table for a key with pre-hash h, the index in the table is given by

    ((h >> table->r) & table->m1) ^ table->d[r & table->m2]

Sorry, typo. Should be

    ((h >> table->r) & table->m1) ^ table->d[h & table->m2]

What happens is that h & table->m2 sorts the keys of the table into n buckets. Then r is selected (among 64 possible choices) so that there's no intra-bucket collisions. Finally, d is chosen so that none of the buckets collide, starting with the largest one.

Dag

Then, *if* the element is present, it will always be found on the first try; the table is guaranteed collisionless. This means that an expensive branch-miss can be avoided. It is really incredibly fast in practice, with a 0.5 ns penalty on my 1.8 GHz laptop. The magic is in finding the right table->r and table->d. For a 64-slot table, parameters r and d[0]..d[63] can be found in 10 us on my machine (it's an O(n)
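Editor's sketch of the corrected lookup formula and the bucket / r / d search described in the message above. This is a brute-force illustration for a tiny 8-slot table, not the O(n) algorithm from the paper the message refers to. `HashParams`, `slot_index`, `bucket_size`, `place_bucket` and `build_params` are hypothetical names chosen for this example.

```c
#include <stdint.h>
#include <string.h>

#define NSLOTS   8   /* table size, a power of two */
#define NBUCKETS 8   /* number of buckets, a power of two */

typedef struct {
    uint8_t  r;             /* shift, one of 64 choices */
    uint64_t m1, m2;        /* masks: NSLOTS - 1 and NBUCKETS - 1 */
    uint16_t d[NBUCKETS];   /* per-bucket displacement */
} HashParams;

/* The corrected formula: ((h >> r) & m1) ^ d[h & m2]. */
static uint64_t slot_index(const HashParams *t, uint64_t h)
{
    return ((h >> t->r) & t->m1) ^ t->d[h & t->m2];
}

static int bucket_size(const HashParams *t, const uint64_t *keys, int nkeys, int b)
{
    int size = 0;
    for (int k = 0; k < nkeys; k++)
        size += (keys[k] & t->m2) == (uint64_t)b;
    return size;
}

/* Try to place every key of bucket b using displacement d; 1 on success. */
static int place_bucket(const HashParams *t, const uint64_t *keys, int nkeys,
                        int b, int d, int *used)
{
    int trial[NSLOTS];
    memcpy(trial, used, sizeof trial);
    for (int k = 0; k < nkeys; k++) {
        if ((keys[k] & t->m2) != (uint64_t)b)
            continue;
        uint64_t idx = ((keys[k] >> t->r) & t->m1) ^ (uint64_t)d;
        if (trial[idx])
            return 0;           /* collides inside the bucket or with another */
        trial[idx] = 1;
    }
    memcpy(used, trial, sizeof trial);
    return 1;
}

/* Search r, then place buckets largest first; 1 if a collision-free
   parameter set was found for the given pre-hashes. */
static int build_params(HashParams *t, const uint64_t *keys, int nkeys)
{
    t->m1 = NSLOTS - 1;
    t->m2 = NBUCKETS - 1;
    for (int r = 0; r < 64; r++) {
        int used[NSLOTS] = {0}, done[NBUCKETS] = {0}, ok = 1;
        t->r = (uint8_t)r;
        memset(t->d, 0, sizeof t->d);
        for (int round = 0; round < NBUCKETS && ok; round++) {
            int best = -1, best_size = -1;
            for (int b = 0; b < NBUCKETS; b++) {
                int size = done[b] ? -1 : bucket_size(t, keys, nkeys, b);
                if (size > best_size) { best = b; best_size = size; }
            }
            done[best] = 1;
            if (best_size == 0)
                break;          /* only empty buckets remain */
            ok = 0;
            for (int d = 0; d < NSLOTS && !ok; d++) {
                if (place_bucket(t, keys, nkeys, best, d, used)) {
                    t->d[best] = (uint16_t)d;
                    ok = 1;
                }
            }
        }
        if (ok)
            return 1;
    }
    return 0;
}
```

With the pre-hashes 0x08, 0x10 and 0x21, the first two share a bucket and collide at r = 0, so the search moves on to r = 1, where every key gets its own slot. Lookup afterwards is a single call to `slot_index` with no probing loop.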
Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
On Tue, Jun 26, 2012 at 2:40 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 01:48 PM, David Cournapeau wrote: Hi, I am just continuing the discussion around ABI/API, the technical side of things that is, as this is unrelated to 1.7.x. release. On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 11:58 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 05:35 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertíkondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require to change the *current* C API, right ? 
If one wants to Accessing arr-dims[i] directly would need to change. But that's been discouraged for a long time. By API I meant access through the macros. One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it under, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage but allowing to hide the ABI. (It doesn't give you full flexibility in ABI, it does require that you somewhere have an npy_intp dims[nd] with the same lifetime as your object, etc., but I don't consider that a big disadvantage). allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that access those directly will need to be changed. There is the particular issue of iterator, which seem quite difficult to make ABI-safe without losing significant performance. I don't agree (for some meanings of ABI-safe). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. (I have not read the whole cython discussion yet) I'll try to write a summary and post it when I can get around to it. What do you mean by building iteration in the consumer ? My consumer is the user of the NumPy C API. So I meant that the iteration logic is all in C header files and compiled again for each such consumer. Iterators don't cross the ABI boundary. understanding is that any data export would be done through a level of indirection (dataptr/shape/strides). 
Conceptually, I can't see how one could keep ABI without that level of indirection without some compile. In the case of iterator, that means multiple pointer chasing per sample -- i.e. the tight loop issue you mentioned earlier for PyArray_DATA is the common case for iterator.

Even if you do indirection, iterator utilities that are compiled in the consumer/user code can cache the data that's retrieved. Iterators just do

    // setup crossing ABI
    npy_intp *shape = PyArray_DIMS(arr);
    npy_intp *strides = PyArray_STRIDES(arr);
    ...
    // performance-sensitive code just accesses cached pointers and doesn't
    // cross ABI

The problem is that iterators need more than this. But thinking more about it, I am not so dead sure we could not get there. I will need to play with some code.

Going slightly OT, then IMO, the *only* long-term solution in 2012 is LLVM. That allows you to do any level of inlining and special casing and optimization at run-time, which is the only way of
Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
(I have not read the whole cython discussion yet)

So here's the summary. It's rather complicated but also incredibly neat :-) And technical details can be hidden behind a tight API.

Could you provide a bit more context for this list? I think this is an important technology concept. I'd like to understand better how well it jives with Numba-produced APIs and how we can make use of it in NumPy. Where exactly would this be used in the NumPy API? What would it replace?

- We introduce a C-level metaclass, extensibletype, which to each type adds a branch-miss-free string-pointer hash table. The ndarray type is made an instance of this metaclass, so that you can do PyCustomSlots_GetTable(array_object->ob_type)

- The hash table uses a perfect hashing scheme:

a) We take the lower 64 bits of md5 of the lookup string (this can be done compile-time or module-load-time) as a pre-hash h.

b) When looking up the table for a key with pre-hash h, the index in the table is given by

    ((h >> table->r) & table->m1) ^ table->d[r & table->m2]

Then, *if* the element is present, it will always be found on the first try; the table is guaranteed collisionless. This means that an expensive branch-miss can be avoided. It is really incredibly fast in practice, with a 0.5 ns penalty on my 1.8 GHz laptop.

The magic is in finding the right table->r and table->d. For a 64-slot table, parameters r and d[0]..d[63] can be found in 10 us on my machine (it's an O(n) operation). (table->d[i] has type uint16_t)

(This algorithm was found in an academic paper which I'm too lazy to dig up from that thread right now; perfect hashing is an active research field.)

The result? You can use this table to store function pointers in the type, like C++ virtual tables or like the built-in slots like tp_get_buffer, but *without* having to agree on everything at compile-time like in C++. And the only penalty is ~0.5 ns per call and some cache usage.

Cython would use this to replace the current custom cdef class vtable with something more tools could agree on, e.g. store function pointers in the table with keys like

    "method:foo:i4i8->f4"

But NumPy could easily store entries relating to its C API in the same hash table,

    "numpy:SHAPE"

Then, the C API functions would all take PyObject*, look up the fast hash table on the ob_type. This allows for incredibly flexible duck typing on the C level.

This does sound very nice.

PyArray_Check would just check for the presence of the C API but not care about the actual Python type, i.e., no PyObject_TypeCheck.

Me and Robert have talked a lot about this and will move forward with it for Cython. Obviously I don't expect others than me to pick it up for NumPy so we'll see... I'll write up a specification document sometime over the next couple of weeks as we need that even if only for Cython.

We will look forward to what you come up with.

Best regards,
-Travis
Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
On 06/26/2012 05:02 PM, Travis Oliphant wrote:

(I have not read the whole cython discussion yet)

So here's the summary. It's rather complicated but also incredibly neat :-) And technical details can be hidden behind a tight API.

Could you provide a bit more context for this list? I think this is an important technology concept. I'd like to understand better how well it jives with Numba-produced APIs and how we can make use of it in NumPy. Where exactly would this be used in the NumPy API? What would it replace?

Right. I thought I did that :-) I realize I might sometimes be too brief; part of the problem is I'm used to Cython development where I can start a sentence and then Mark Florisson or Robert Bradshaw can finish it.

I'll try to step through how PyArray_DIMS could work under a refactored API from a C client. To do this I gloss over some of the finer points etc. and just make a premature decision here and there. Almost none of the types or functions below exist yet; I'll assume we implement them (I do have a good start on the reference implementation).

We'll add a new C-level slot called "numpy:SHAPE" to the ndarray type, and hook PyArray_DIMS to use this slot.

Inside NumPy
------------

The PyArray_Type (?) definition changes from being a PyTypeObject to a PyExtensibleTypeObject, and PyExtensibleType_Ready is called instead of PyType_Ready. This builds the perfect lookup table etc. I'll omit the details.

The caller
----------

First we need some macro module initialization setup (part of NumPy include files):

    /* lower 64 bits of md5 of "numpy:SHAPE" */
    #define NPY_SHAPE_SLOT_PREHASH 0xa8cf70dc5f598f40ULL
    /* holds an interned "numpy:SHAPE" string */
    static char *_Npy_interned_numpy_SHAPE;

Then initialize the interned key in import_array():

    ... import_array(...)
    {
        ...
        PyCustomSlotsInternerContext interner = PyCustomSlots_GetInterner();
        _Npy_interned_numpy_SHAPE = PyCustomSlots_InternLiteral("numpy:SHAPE");
        ...
    }

Then, let's get rid of that PyArrayObject (in the *API*; of course there's still some struct representing the NumPy array internally but its layout is no longer exposed anywhere). That means always using PyObject, just like the Python API does, e.g., PyDict_GetItem gets a PyObject even if it must be a dict. But for backwards compatibility, let's throw in:

    typedef PyObject PyArrayObject;

Now, change PyArray_Check a bit (likely/unlikely indicates branch hints, e.g. __builtin_expect in gcc). Some context:

    typedef struct {
        char *interned_key;
        uintptr_t flags;
        void *funcptr;
    } PyCustomSlot;

Then:

    static inline int PyArray_Check(PyObject *arr) {
        /* it is an array if it has the "numpy:SHAPE" slot.
           This is a bad choice of test but for simplicity... */
        if (likely(PyCustomSlots_Check(arr->ob_type))) {
            PyCustomSlot *slot;
            slot = PyCustomSlots_Find(arr->ob_type,
                                      NPY_SHAPE_SLOT_PREHASH,
                                      _Npy_interned_numpy_SHAPE);
            if (likely(slot != NULL)) return 1;
        }
        return 0;
    }

Finally, we can write our new PyArray_DIMS:

    static inline npy_intp *PyArray_DIMS(PyObject *arr) {
        PyCustomSlot *slot = PyCustomSlots_FindAssumePresent(arr->ob_type,
                                                             NPY_SHAPE_SLOT_PREHASH);
        /* funcptr is stored as void *, so cast it back before calling */
        return ((npy_intp *(*)(PyObject *))slot->funcptr)(arr);
    }

What goes on here is:

- PyCustomSlots_Check checks whether the metaclass (arr->ob_type->ob_type) is the PyExtensibleType_Type, which is a class we agree upon by SEP

- PyCustomSlots_Find takes the prehash of the key which through the parametrized hash function gives the position in the hash table. At that position in the PyCustomSlot array, one either finds the element (by comparing the interned key by pointer value), or the element is not in the table (so no loops or branch misses).

- Finally, inside PyArray_DIMS we assume that PyArray_Check has already been called. Thus, since we know the slot is in the table, we can skip even the check and shave off a nanosecond.

What is replaced
----------------

Largely the macros and existing function pointers imported by import_array. However, some of the functions (in particular constructors etc.) would work just like before. Only OOP methods change their behaviour.

Compared to the macros, there should be ~4-7 ns penalty per call on my computer (1.9 GHz). However, compared to making PyArray_SHAPE a function going through the import_array function table, the cost is only a couple of ns.

Me and Robert have talked a lot about this and will move forward with it for Cython. Obviously I don't expect others than me to pick it up for NumPy so we'll see... I'll write up a specification document sometime over the next couple of weeks as we need that even if only for Cython.

We will look forward to what you come up with.

Will keep you posted,

Dag
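Editor's sketch of the slot lookup walked through in the message above, reduced to plain C with no Python headers. It shows the single probe followed by a pointer comparison on the interned key, and a duck-typed shape accessor built on it. Everything except the field names of the slot struct and the pre-hash constant quoted in the message is a hypothetical stand-in (`SlotTable`, `Object`, `ToyArray`, `slot_find`, `object_shape`), and `funcptr` is declared as a generic function pointer rather than `void *` to stay within ISO C.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the PyCustomSlot layout from the message (funcptr type differs). */
typedef struct {
    const char *interned_key;
    uintptr_t flags;
    void (*funcptr)(void);
} CustomSlot;

/* Hypothetical per-type table: hash parameters plus the slot array. */
typedef struct {
    uint8_t r;
    uint64_t m1, m2;
    const uint16_t *d;
    const CustomSlot *slots;
} SlotTable;

/* Hypothetical object header: every object points at its type's table. */
typedef struct {
    const SlotTable *table;
} Object;

/* One probe, then a pointer comparison on the interned key; no loop. */
static const CustomSlot *slot_find(const SlotTable *t, uint64_t prehash,
                                   const char *interned_key)
{
    const CustomSlot *s =
        &t->slots[((prehash >> t->r) & t->m1) ^ t->d[prehash & t->m2]];
    return s->interned_key == interned_key ? s : NULL;
}

/* A toy array type that exports its shape through the "numpy:SHAPE" slot. */
typedef struct {
    Object base;
    ptrdiff_t dims[2];
} ToyArray;

static const ptrdiff_t *toy_shape(Object *self)
{
    return ((ToyArray *)self)->dims;
}

static const char KEY_SHAPE[] = "numpy:SHAPE";   /* the one interned copy */
#define SHAPE_PREHASH 0xa8cf70dc5f598f40ULL      /* value quoted in the message */

/* 4-slot table, one bucket, r = 0, d[0] = 0: the key lands in slot 0
   because the low two bits of SHAPE_PREHASH are zero. */
static const uint16_t TOY_D[1] = {0};
static const CustomSlot TOY_SLOTS[4] = {
    {KEY_SHAPE, 0, (void (*)(void))toy_shape},
};
static const SlotTable TOY_TABLE = {0, 3, 0, TOY_D, TOY_SLOTS};

/* Duck-typed accessor: works on any Object whose type exports the slot. */
static const ptrdiff_t *object_shape(Object *obj)
{
    const CustomSlot *s = slot_find(obj->table, SHAPE_PREHASH, KEY_SHAPE);
    if (s == NULL)
        return NULL;
    return ((const ptrdiff_t *(*)(Object *))s->funcptr)(obj);
}
```

The key comparison is by address, so a second copy of the same characters does not match; that is why the message interns the key once at import time and passes the interned pointer to the lookup.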
Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
On 06/26/2012 10:35 PM, Dag Sverre Seljebotn wrote:
On 06/26/2012 05:02 PM, Travis Oliphant wrote:

(I have not read the whole cython discussion yet)

So here's the summary. It's rather complicated but also incredibly neat :-) And technical details can be hidden behind a tight API.

Could you provide a bit more context for this list. I think this is an important technology concept. I'd like to understand better how well it jives with Numba-produced APIs and how we can make use of it in NumPy. Where exactly would this be used in the NumPy API? What would it replace?

Right. I thought I did that :-) I realize I might sometimes be too brief; part of the problem is I'm used to Cython development, where I can start a sentence and then Mark Florisson or Robert Bradshaw can finish it.

I'll try to step through how PyArray_DIMS could work under a refactored API from a C client. To do this I gloss over some of the finer points etc. and just make a premature decision here and there. Almost none of the types or functions below already exist; I'll assume we implement them (I do have a good start on the reference implementation).

We'll add a new C-level slot called numpy:SHAPE to the ndarray type, and hook PyArray_DIMS to use this slot.

Inside NumPy
------------

The PyArray_Type (?) definition changes from being a PyTypeObject to a PyExtensibleTypeObject, and PyExtensibleType_Ready is called instead of PyType_Ready. This builds the perfect lookup table etc. I'll omit the details.

The caller
----------

First we need some macro module initialization setup (part of NumPy include files):

    /* lower-64-bits of md5 of numpy:SHAPE */
    #define NPY_SHAPE_SLOT_PREHASH 0xa8cf70dc5f598f40ULL
    /* hold an interned numpy:SHAPE string */
    static char *_Npy_interned_numpy_SHAPE;

Then initialize the interned key in import_array():

    ... import_array(...) {
        ...
        PyCustomSlotsInternerContext interner = PyCustomSlots_GetInterner();
        _Npy_interned_numpy_SHAPE = PyCustomSlots_InternLiteral("numpy:SHAPE");
        ...
    }

Then, let's get rid of that PyArrayObject (in the *API*; of course there's still some struct representing the NumPy array internally, but its layout is no longer exposed anywhere). That means always using PyObject, just like the Python API does; e.g., PyDict_GetItem gets a PyObject even if it must be a dict. But for backwards compatibility, let's throw in:

    typedef PyObject PyArrayObject;

Now, change PyArray_Check a bit (likely/unlikely indicate branch hints, e.g. __builtin_expect in gcc). Some context:

    typedef struct {
        char *interned_key;
        uintptr_t flags;
        void *funcptr;
    } PyCustomSlot;

Then:

    static inline int PyArray_Check(PyObject *arr) {
        /* it is an array if it has the numpy:SHAPE slot.
           This is a bad choice of test but for simplicity... */
        if (likely(PyCustomSlots_Check(arr->ob_type))) {
            PyCustomSlot *slot;
            slot = PyCustomSlots_Find(arr->ob_type, NPY_SHAPE_SLOT_PREHASH,
                                      _Npy_interned_numpy_SHAPE);
            if (likely(slot != NULL)) return 1;
        }
        return 0;
    }

Finally, we can write our new PyArray_DIMS:

First bug report:

    static inline npy_intp *PyArray_DIMS(PyObject *arr) {
        PyCustomSlot *slot = PyCustomSlots_FindAssumePresent(arr->tp_base,
                                                             NPY_SHAPE_SLOT_PREHASH);
        return (*slot->funcptr)(arr);

last line should be

        npy_intp *(*func)(PyObject*);
        func = slot->funcptr; /* tbd throw in cast for C++ */
        return (*func)(arr);

Dag

    }

What goes on here is:

 - PyCustomSlots_Check checks whether the metaclass (arr->ob_type->tp_base) is the PyExtensibleType_Type, which is a class we agree upon by SEP
 - PyCustomSlots_Find takes the prehash of the key, which through the parametrized hash function gives the position in the hash table. At that position in the PyCustomSlot array, one either finds the element (by comparing the interned key by pointer value), or the element is not in the table (so no loops or branch misses).
 - Finally, inside PyArray_DIMS we assume that PyArray_Check has already been called. Thus, since we know the slot is in the table, we can skip even the check and shave off a nanosecond.
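The lookup described in the list above can be sketched in self-contained C. This is only an illustration under stated assumptions: Slot, SlotTable, build_table, find_slot, shape_via_slot, FakeArray and the shift-and-mask hash are hypothetical stand-ins, not the PyCustomSlots API (which, as the message says, mostly does not exist yet), and the second key with its prehash is made up. The sketch stores the function as a generic function pointer rather than a void *, since ISO C does not guarantee conversions between object and function pointers, and it casts the pointer back to its real signature before the call, which is the point of the bug report above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 4u /* power of two, so masking selects a position */

typedef void (*generic_func)(void);

/* Hypothetical slot entry, loosely modelled on PyCustomSlot. */
typedef struct {
    const char *interned_key; /* compared by address, never by strcmp */
    generic_func funcptr;
} Slot;

typedef struct {
    unsigned shift;           /* hash parameter, chosen at build time */
    Slot slots[TABLE_SIZE];
} SlotTable;

/* Assumed hash family: shift the prehash, then mask to the table size. */
static size_t position(const SlotTable *t, uint64_t prehash) {
    return (size_t)((prehash >> t->shift) & (TABLE_SIZE - 1u));
}

/* Try each shift until every key lands in its own position.
   Returns 0 on success, -1 if no collision-free shift exists. */
static int build_table(SlotTable *t, const uint64_t *prehashes,
                       const Slot *entries, size_t n) {
    unsigned shift;
    assert(n <= TABLE_SIZE);
    for (shift = 0; shift < 64; shift++) {
        size_t i;
        t->shift = shift;
        for (i = 0; i < TABLE_SIZE; i++) {
            t->slots[i].interned_key = NULL;
            t->slots[i].funcptr = NULL;
        }
        for (i = 0; i < n; i++) {
            Slot *s = &t->slots[position(t, prehashes[i])];
            if (s->interned_key != NULL) break; /* collision: next shift */
            *s = entries[i];
        }
        if (i == n) return 0;
    }
    return -1;
}

/* One probe and one pointer comparison; no loop. */
static const Slot *find_slot(const SlotTable *t, uint64_t prehash,
                             const char *interned_key) {
    const Slot *s = &t->slots[position(t, prehash)];
    return (s->interned_key == interned_key) ? s : NULL;
}

#define SHAPE_PREHASH 0xa8cf70dc5f598f40ULL /* value quoted in the message */
#define OTHER_PREHASH 0x0123456789abcdefULL /* made up for this example */

static const char KEY_SHAPE[] = "numpy:SHAPE";
static const char KEY_OTHER[] = "example:OTHER"; /* hypothetical key */

typedef struct { long dims[2]; } FakeArray; /* stand-in for an array */

static long *fake_shape(void *obj) { return ((FakeArray *)obj)->dims; }
static void fake_other(void) {}

/* Returns the shape through the slot, or NULL if the slot is absent. */
static long *shape_via_slot(const SlotTable *t, void *obj) {
    long *(*func)(void *);
    const Slot *s = find_slot(t, SHAPE_PREHASH, KEY_SHAPE);
    if (s == NULL) return NULL;
    func = (long *(*)(void *))s->funcptr; /* restore the real signature */
    return (*func)(obj);
}
```

A caller would fill a SlotTable once with build_table() and then call shape_via_slot(&table, obj). Passing a key with the same text but a different address to find_slot() yields NULL, which is why the key has to be interned first.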
What is replaced
----------------

Largely the macros and existing function pointers imported by import_array. However, some of the functions (in particular constructors etc.) would work just like before. Only OOP methods change their behaviour.

Compared to the macros, there should be a ~4-7 ns penalty per call on my computer (1.9 GHz). However, compared to making PyArray_SHAPE a function going through the import_array function table, the cost is only a couple of ns.

Robert and I have talked a lot about this and will move forward with it for Cython. Obviously I don't expect anyone other than me to pick it up for NumPy, so we'll see... I'll write up a specification document sometime over the next couple of weeks, as we need that even if only for Cython.

We will