[Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread David Cournapeau
Hi,

I am just continuing the discussion around ABI/API (the technical side
of things, that is), as this is unrelated to the 1.7.x release.

On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 06/26/2012 11:58 AM, David Cournapeau wrote:
 On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no  wrote:
 On 06/26/2012 05:35 AM, David Cournapeau wrote:
 On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com
 wrote:


 My understanding is that Travis is simply trying to stress "We have to
 think about the implications of our changes on existing users." and
 also that little changes (with the best intentions!) that however mean
 either a breakage or confusion for users (due to historical reasons)
 should be avoided if possible. And I very strongly feel the same way.
 And I think that most people on this list do as well.

 I think Travis is more concerned about API than ABI changes (in that
 example for 1.4, the ABI breakage was caused by a change that was
 pushed by Travis IIRC).

 The relative importance of API vs ABI is a tough one: I think ABI
 breakage is as bad as API breakage (but they matter in different
 circumstances), but it is hard to improve the situation around our ABI
 without changing the API (especially everything around macros and
 publicly accessible structures). Changing this is politically

 But I think it is *possible* to get to a situation where the ABI isn't
 broken without changing the API. I have posted such a proposal.
 If one uses the kind of C-level duck typing I describe in the link
 below, one would do

 typedef PyObject PyArrayObject;

 typedef struct {
     ...
 } NumPyArray; /* used to be PyArrayObject */

 Maybe we're just in violent agreement, but whatever ends up being used
 would require changing the *current* C API, right? If one wants to

 Accessing arr->dims[i] directly would need to change. But that's been
 discouraged for a long time. By API I meant access through the macros.

 One of the changes under discussion here is to change PyArray_SHAPE from
 a macro that accepts both PyObject* and PyArrayObject* to a function
 that only accepts PyArrayObject* (hence breakage). I'm saying that under
 my proposal, assuming I or somebody else can find the time to implement
 it, you can both make it a function and have it accept both
 PyObject* and PyArrayObject* (since they are the same), undoing the
 breakage while still allowing the ABI to be hidden.

 (It doesn't give you full flexibility in ABI, and it does require that
 you somewhere have an npy_intp dims[nd] with the same lifetime as your
 object, etc., but I don't consider that a big disadvantage.)
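
A minimal sketch of the point above (editorial; the declaration is
hypothetical, not from the mail): once PyArrayObject is a typedef of
PyObject, PyArray_SHAPE can become a real exported function and still
accept both pointer types without casts:

typedef PyObject PyArrayObject;

npy_intp *PyArray_SHAPE(PyArrayObject *arr);  /* a function, not a macro */

void example(PyObject *obj, PyArrayObject *arr)
{
    npy_intp *s1 = PyArray_SHAPE(obj);  /* OK: same type either way */
    npy_intp *s2 = PyArray_SHAPE(arr);  /* OK */
    (void)s1; (void)s2;
}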

 allow for changes in our structures more freely, we have to hide them
 from the headers, which means breaking the code that depends on the
 structure binary layout. Any code that accesses those directly will need
 to be changed.

 There is the particular issue of iterators, which seem quite difficult
 to make ABI-safe without losing significant performance.

 I don't agree (for some meanings of ABI-safe). You can export the data
 (dataptr/shape/strides) through the ABI, then the iterator uses these in
 whatever way it wishes consumer-side. Sort of like PEP 3118 without the
 performance degradation. The only sane way IMO of doing iteration is
 building it into the consumer anyway.

(I have not read the whole cython discussion yet)

What do you mean by building iteration into the consumer? My
understanding is that any data export would be done through a level of
indirection (dataptr/shape/strides). Conceptually, I can't see how one
could keep the ABI stable without that level of indirection, short of
some compile-time mechanism. In the case of iterators, that means
multiple pointer chases per sample -- i.e. the tight-loop issue you
mentioned earlier for PyArray_DATA is the common case for iterators.
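
To make the concern concrete, a sketch (editorial, assuming PyArray_DATA
becomes a function crossing the ABI) of the per-sample pointer chasing in
a naive consumer loop:

static double sum_naive(PyObject *arr, npy_intp n, npy_intp stride)
{
    double acc = 0.0;
    for (npy_intp i = 0; i < n; i++) {
        /* one ABI crossing (call + pointer chase) per sample */
        char *data = (char *)PyArray_DATA(arr);
        acc += *(double *)(data + i * stride);  /* stride is in bytes */
    }
    return acc;
}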

I can only see two ways of doing fast (special-casing) iteration:
compile-time special casing or runtime optimization. Compile-time
requires access to the internals (even if one were to use C++ with
advanced template magic à la STL iterators, I don't think one can get
performance if everything is not in the headers, but maybe C++
compilers are super smart these days in ways I can't comprehend). I
would think runtime is the long-term solution, but that's far away.

David


Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread Dag Sverre Seljebotn
On 06/26/2012 01:48 PM, David Cournapeau wrote:
 [...]

 (I have not read the whole cython discussion yet)

I'll try to write a summary and post it when I can get around to it.


 What do you mean by building iteration into the consumer? My

"Consumer" is the user of the NumPy C API. So I meant that the iteration 
logic is all in C header files and compiled again for each such 
consumer. Iterators don't cross the ABI boundary.

 understanding is that any data export would be done through a level of
 indirection (dataptr/shape/strides). Conceptually, I can't see how one
 could keep the ABI stable without that level of indirection, short of
 some compile-time mechanism. In the case of iterators, that means
 multiple pointer chases per sample -- i.e. the tight-loop issue you
 mentioned earlier for PyArray_DATA is the common case for iterators.

Even if you do indirection, iterator utilities that are compiled in the 
consumer/user code can cache the data that's retrieved.

Iterators just do

// setup crossing ABI
npy_intp *shape = PyArray_DIMS(arr);
npy_intp *strides = PyArray_STRIDES(arr);
...
// performance-sensitive code just accesses cached pointers and doesn't
// cross the ABI

We're probably in violent agreement and just talking past one another...?
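
A fuller sketch of the same idea (editorial; only the three accessors are
from the mail, the rest is a hypothetical consumer routine): the setup
crosses the ABI once, and the tight loop touches only cached pointers:

static double sum_1d(PyObject *arr)
{
    /* setup: cross the ABI once and cache everything */
    char *data = (char *)PyArray_DATA(arr);
    npy_intp *shape = PyArray_DIMS(arr);
    npy_intp *strides = PyArray_STRIDES(arr);

    /* tight loop: no ABI crossings */
    double acc = 0.0;
    for (npy_intp i = 0; i < shape[0]; i++)
        acc += *(double *)(data + i * strides[0]);
    return acc;
}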



Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread Dag Sverre Seljebotn
On 06/26/2012 01:48 PM, David Cournapeau wrote:
 [...]
 (I have not read the whole cython discussion yet)

So here's the summary. It's rather complicated but also incredibly neat 
:-) And technical details can be hidden behind a tight API.

  - We introduce a C-level metaclass, "extensibletype", which to each 
type adds a branch-miss-free string-pointer hash table. The ndarray 
type is made an instance of this metaclass, so that you can do

PyCustomSlots_GetTable(array_object->ob_type)

  - The hash table uses a perfect hashing scheme:

   a) We take the lower 64 bits of the md5 of the lookup string (this can 
be done at compile time or module-load time) as a pre-hash h.

   b) When looking up the table for a key with pre-hash h, the index 
in the table is given by

((h >> table->r) & table->m1) ^ table->d[r & table->m2]

Then, *if* the element is present, it will always be found on the first 
try; the table is guaranteed collisionless. This means that an expensive 
branch-miss can be avoided. It is really incredibly fast in practice, 
with a 0.5 ns penalty on my 1.8 GHz laptop.

The magic is in finding the right table->r and table->d. For a 64-slot 
table, parameters r and d[0]..d[63] can be found in 10 µs on my machine 
(it's an O(n) operation). (table->d[i] has type uint16_t.)

(This algorithm was found in an academic paper which I'm too lazy to dig 
up from that thread right now; perfect hashing is an active research field.)

The result? You can use this table to store function pointers in the 
type, like C++ virtual tables or like built-in slots such as 
tp_get_buffer, but *without* having to agree on everything at 
compile-time like in C++. And the only penalty is ~0.5 ns per call and 
some cache usage.
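
To pin down the notation, a sketch of a plausible table layout (editorial;
the PyCustomSlot fields appear later in the thread, while the table struct
and the exact field types are assumptions):

typedef struct {
    char *interned_key;   /* compared by pointer value, not strcmp */
    uintptr_t flags;
    void *funcptr;
} PyCustomSlot;

typedef struct {
    uint8_t r;            /* shift amount, chosen among 64 candidates */
    uint64_t m1, m2;      /* masks over slot index and bucket index */
    uint16_t *d;          /* displacement array, e.g. d[0]..d[63] */
    PyCustomSlot *slots;  /* the perfect-hashed slot array */
} PyCustomSlotsTable;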

Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread Dag Sverre Seljebotn
On 06/26/2012 04:08 PM, Dag Sverre Seljebotn wrote:
 [...]
 - The hash table uses a perfect hashing scheme:

 a) We take the lower 64 bits of the md5 of the lookup string (this can be
 done at compile time or module-load time) as a pre-hash h.

 b) When looking up the table for a key with pre-hash h, the index in
 the table is given by

 ((h >> table->r) & table->m1) ^ table->d[r & table->m2]

Sorry, typo. Should be

((h >> table->r) & table->m1) ^ table->d[h & table->m2]

What happens is that h & table->m2 sorts the keys of the table into n 
buckets. Then r is selected (among 64 possible choices) so that there 
are no intra-bucket collisions. Finally, d is chosen so that none of 
the buckets collide, starting with the largest one.
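
The corrected lookup, as a sketch (editorial; it reuses the hypothetical
PyCustomSlotsTable layout sketched earlier, and only the index expression
comes from the mail):

static PyCustomSlot *lookup(const PyCustomSlotsTable *table, uint64_t h)
{
    /* perfect hash: if the key is present at all, it is at this index */
    uint64_t i = ((h >> table->r) & table->m1) ^ table->d[h & table->m2];
    return &table->slots[i];
}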

Dag




Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread David Cournapeau
On Tue, Jun 26, 2012 at 2:40 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 06/26/2012 01:48 PM, David Cournapeau wrote:
 [...]

 I'll try to write a summary and post it when I can get around to it.


 What do you mean by building iteration into the consumer? My

 "Consumer" is the user of the NumPy C API. So I meant that the iteration
 logic is all in C header files and compiled again for each such
 consumer. Iterators don't cross the ABI boundary.

 understanding is that any data export would be done through a level of
 indirection (dataptr/shape/strides). Conceptually, I can't see how one
 could keep the ABI stable without that level of indirection, short of
 some compile-time mechanism. In the case of iterators, that means
 multiple pointer chases per sample -- i.e. the tight-loop issue you
 mentioned earlier for PyArray_DATA is the common case for iterators.

 Even if you do indirection, iterator utilities that are compiled in the
 consumer/user code can cache the data that's retrieved.

 Iterators just do

 // setup crossing ABI
 npy_intp *shape = PyArray_DIMS(arr);
 npy_intp *strides = PyArray_STRIDES(arr);
 ...
 // performance-sensitive code just accesses cached pointers and doesn't
 // cross the ABI

The problem is that iterators need more than this. But thinking more
about it, I am not so dead sure we could not get there. I will need to
play with some code.


 Going slightly OT, then IMO, the *only* long-term solution in 2012 is
 LLVM. That allows you to do any level of inlining and special casing and
 optimization at run-time, which is the only way of 

Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread Travis Oliphant
 
 (I have not read the whole cython discussion yet)
 
 So here's the summary. It's rather complicated but also incredibly neat 
 :-) And technical details can be hidden behind a tight API.

Could you provide a bit more context for this list? I think this is an 
important technology concept. I'd like to understand better how well it 
jibes with Numba-produced APIs and how we can make use of it in NumPy.

Where exactly would this be used in the NumPy API? What would it replace?

 
 [...]
 
 Cython would use this to replace the current custom cdef class vtable 
 with something that more tools could agree on, e.g. store function 
 pointers in the table with keys like
 
 method:foo:i4i8->f4
 
 But NumPy could easily store entries relating to its C API in the same 
 hash table,
 
 numpy:SHAPE
 
 Then, the C API functions would all take PyObject* and look up the fast 
 hash table on the ob_type.
 
 This allows for incredibly flexible duck typing on the C level.

This does sound very nice.   

 
 PyArray_Check would just check for the presence of the C API but not 
 care about the actual Python type, i.e., no PyObject_TypeCheck.
 
 Robert and I have talked a lot about this and will move forward with it 
 for Cython. Obviously I don't expect anyone other than me to pick it up 
 for NumPy, so we'll see... I'll write up a specification document 
 sometime over the next couple of weeks, as we need that even if only 
 for Cython.

We will look forward to what you come up with. 

Best regards,

-Travis



Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread Dag Sverre Seljebotn
On 06/26/2012 05:02 PM, Travis Oliphant wrote:

 (I have not read the whole cython discussion yet)

 So here's the summary. It's rather complicated but also incredibly neat
 :-) And technical details can be hidden behind a tight API.

 Could you provide a bit more context for this list? I think this is an 
 important technology concept. I'd like to understand better how well it 
 jibes with Numba-produced APIs and how we can make use of it in NumPy.

 Where exactly would this be used in the NumPy API? What would it replace?

Right. I thought I did that :-) I realize I might sometimes be too 
brief; part of the problem is that I'm used to Cython development, where 
I can start a sentence and Mark Florisson or Robert Bradshaw can 
finish it.

I'll try to step through how PyArray_DIMS could work under a refactored 
API, from the point of view of a C client. To do this I gloss over some 
of the finer points and just make a premature decision here and there. 
Almost none of the types or functions below exist yet; I'll assume we 
implement them (I do have a good start on the reference implementation).

We'll add a new C-level slot called "numpy:SHAPE" to the ndarray type, 
and hook PyArray_DIMS up to use this slot.

Inside NumPy
------------

The PyArray_Type (?) definition changes from being a PyTypeObject to a 
PyExtensibleTypeObject, and PyExtensibleType_Ready is called instead of 
PyType_Ready. This builds the perfect lookup table etc. I'll omit the 
details.

The caller
----------

First we need some macro and module-initialization setup (part of the 
NumPy include files):

/* lower 64 bits of the md5 of "numpy:SHAPE" */
#define NPY_SHAPE_SLOT_PREHASH 0xa8cf70dc5f598f40ULL
/* holds an interned "numpy:SHAPE" string */
static char *_Npy_interned_numpy_SHAPE;

Then initialize the interned key in import_array():

... import_array(...)
{
    ...
    PyCustomSlotsInternerContext interner = PyCustomSlots_GetInterner();
    _Npy_interned_numpy_SHAPE = PyCustomSlots_InternLiteral("numpy:SHAPE");
    ...
}

Then, let's get rid of that PyArrayObject (in the *API*; of course 
there's still some struct representing the NumPy array internally, but 
its layout is no longer exposed anywhere). That means always using 
PyObject, just like the Python API does; e.g., PyDict_GetItem takes a 
PyObject even though it must be a dict. But for backwards compatibility, 
let's throw in:

typedef PyObject PyArrayObject;

Now, change PyArray_Check a bit (likely/unlikely indicate branch hints, 
e.g. __builtin_expect in gcc). Some context:

typedef struct {
    char *interned_key;
    uintptr_t flags;
    void *funcptr;
} PyCustomSlot;

Then:

static inline int PyArray_Check(PyObject *arr) {
    /* It is an array if it has the "numpy:SHAPE" slot.
       This is a bad choice of test, but keeps things simple... */
    if (likely(PyCustomSlots_Check(arr->ob_type))) {
        PyCustomSlot *slot;
        slot = PyCustomSlots_Find(arr->ob_type,
                                  NPY_SHAPE_SLOT_PREHASH,
                                  _Npy_interned_numpy_SHAPE);
        if (likely(slot != NULL)) return 1;
    }
    return 0;
}

Finally, we can write our new PyArray_DIMS:

static inline npy_intp *PyArray_DIMS(PyObject *arr) {
    PyCustomSlot *slot = PyCustomSlots_FindAssumePresent(arr->tp_base,
                                                         NPY_SHAPE_SLOT_PREHASH);
    return (*slot->funcptr)(arr);
}

What goes on here is:

  - PyCustomSlots_Check checks whether the metaclass 
(arr->ob_type->tp_base) is the PyExtensibleType_Type, which is a class 
we agree upon by SEP.

  - PyCustomSlots_Find takes the pre-hash of the key, which through the 
parametrized hash function gives the position in the hash table. At that 
position in the PyCustomSlot array, one either finds the element (by 
comparing the interned key by pointer value), or the element is not in 
the table (so no loops or branch misses).

  - Finally, inside PyArray_DIMS we assume that PyArray_Check has 
already been called. Thus, since we know the slot is in the table, we 
can skip even that check and shave off a nanosecond. (See the sketch of 
the resulting calling pattern below.)
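
That calling pattern, sketched (editorial; assumes the definitions
above, and get_dims is a hypothetical consumer helper): one checked
lookup via PyArray_Check, then the unchecked fast path in PyArray_DIMS:

static int get_dims(PyObject *obj, npy_intp **dims_out)
{
    if (!PyArray_Check(obj))          /* checked table lookup */
        return -1;
    *dims_out = PyArray_DIMS(obj);    /* unchecked, relies on the check */
    return 0;
}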

What is replaced
----------------

Largely the macros and the existing function pointers imported by 
import_array. However, some of the functions (in particular constructors 
etc.) would work just like before. Only OOP-style methods change their 
behaviour.

Compared to the macros, there should be a ~4-7 ns penalty per call on my 
computer (1.9 GHz). However, compared to making PyArray_SHAPE a function 
going through the import_array function table, the cost is only a couple 
of ns.

 Robert and I have talked a lot about this and will move forward with it
 for Cython. Obviously I don't expect anyone other than me to pick it up
 for NumPy, so we'll see... I'll write up a specification document
 sometime over the next couple of weeks, as we need that even if only
 for Cython.

 We will look forward to what you come up with.

Will keep you posted,

Dag


Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread Dag Sverre Seljebotn
On 06/26/2012 10:35 PM, Dag Sverre Seljebotn wrote:
 [...]
 Finally, we can write our new PyArray_DIMS:


First bug report:

 static inline npy_intp *PyArray_DIMS(PyObject *arr) {
     PyCustomSlot *slot = PyCustomSlots_FindAssumePresent(arr->tp_base,
                                                          NPY_SHAPE_SLOT_PREHASH);
     return (*slot->funcptr)(arr);
 }

The last line should be:

npy_intp *(*func)(PyObject*);
func = slot->funcptr; /* TBD: throw in a cast for C++ */
return (*func)(arr);

Dag
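
For reference, a sketch with the fix folded in (editorial; note that the
surrounding prose says the table hangs off arr->ob_type, so ob_type is
assumed here rather than the tp_base in the original posting):

static inline npy_intp *PyArray_DIMS(PyObject *arr)
{
    PyCustomSlot *slot = PyCustomSlots_FindAssumePresent(
        arr->ob_type, NPY_SHAPE_SLOT_PREHASH);
    npy_intp *(*func)(PyObject *);
    func = (npy_intp *(*)(PyObject *))slot->funcptr;  /* cast also OK in C++ */
    return (*func)(arr);
}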
