On 05/11/2012 03:37 PM, mark florisson wrote:
> On 11 May 2012 12:13, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>> (NumPy devs: I know, I get too many ideas. But this time I *really* believe
>> in it, I think this is going to be *huge*. And if Mark F. likes it it's not
>> going to be without manpower; and as his mentor I'd pitch in too here and
>> there.)
>>
>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't
>> want to micro-manage your GSoC, just have your take.)
>>
>> Travis, thank you very much for those good words in the "NA-mask
>> interactions..." thread. It put most of my concerns away. If anybody is
>> leaning towards opaqueness because of its OOP purity, I want to refer to
>> C++ and its walled-garden of ideological purity -- it has, what, 3-4
>> different OOP array libraries, none of which is able to out-compete the
>> others. Meanwhile the rest of the world happily cooperates using pointers,
>> strides, CSR and CSC.
>>
>> Now, there are limits to what you can do with strides and pointers. No one's
>> denying the need for more. In my mind that's an API where you can do
>> fetch_block and put_block of cache-sized, N-dimensional blocks on an array;
>> but it might be something slightly different.
>>
>> Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C
>> API to deal with this issue.
>>
>> What we need is duck-typing/polymorphism at the C level. If you keep
>> extending ndarray and the NumPy C API, what we'll have is a one-to-many
>> relationship: One provider of array technology, multiple consumers (with
>> hooks, I'm sure, but all implementations of the hook concept in the NumPy
>> world I've seen so far are a total disaster!).
>>
>> What I think we need instead is something like PEP 3118 for the "abstract"
>> array that is only available block-wise with getters and setters. On the
>> Cython list we've decided that what we want for CEP 1000 (for boxing
>> callbacks etc.) is to extend PyTypeObject with our own fields; we could
>> create CEP 1001 to solve this issue and make any Python object an exporter
>> of "block-getter/setter-arrays" (better name needed).
>>
>> What would be exported is (of course) a simple vtable:
>>
>> typedef struct {
>>     int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right,
>>                      ...);
>>     ...
>> } block_getter_setter_array_vtable;
>>
>> Let's please discuss the details *after* the fundamentals. But the reason I
>> put void* there instead of PyObject* is that I hope this could be used
>> beyond the Python world (say, Python<->Julia); the void* would be handed to
>> you at the time you receive the vtable (however we handle that).
>
> I suppose it would also be useful to have some way of predicting the
> output format polymorphically for the caller. E.g. dense *
> block_diagonal results in block diagonal, but dense + block_diagonal
> results in dense, etc. It might be useful for the caller to know
> whether it needs to allocate a sparse, dense or block-structured
> array. Or maybe the polymorphic function could even do the allocation.
> This needs to happen recursively of course, to avoid intermediate
> temporaries. The compiler could easily handle that, and so could numpy
> when it gets lazy evaluation.

Ah. But that also depends on the computation to be performed: a) elementwise, b) axis-wise reductions, c) linear algebra... In my oomatrix code (please don't look at it, it's shameful) I do this using multiple dispatch.

I'd rather ignore this for as long as we can, only implementing "a[:] = ...". I can't see how decisions here would trickle down to the API that's used in the kernel; it's more like a pre-phase, and better treated orthogonally.

> I think if the heavy lifting of allocating output arrays and exporting
> these arrays work in numpy, then support in Cython could use that (I
> can already hear certain people object to more complicated array stuff
> in Cython :). Even better here would be an external project that each
> of our projects could use (I still think the nditer sorting functionality
> of arrays should be numpy-agnostic and externally available).

I agree with the separate project idea. It's trivial for NumPy to incorporate that as one of its methods for exporting arrays, and I don't think it makes sense to either build it into Cython, or outright depend on NumPy.

Here's what I'd like (working title: NumBridge?).

 - Mission: Be the "double* + shape + strides" in a world where that is no longer enough, by providing tight, focused APIs/ABIs that are usable across C/Fortran/Python. I basically want something I can quickly acquire from a NumPy array, then pass it into my C code without dragging along all the cruft that I don't need.
 - Written in pure C + specs, usable without Python
 - PEP 3118 "done right", basically semi-standardize the internal Cython memoryview ABI and get something that's passable on stack
 - Block get/put API
 - Iterator APIs
 - Utility code for exporters and clients (iteration code, axis reordering, etc.)

Is the scope of that insane, or is it at least worth a shot to see how bad it is?
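
To make the block get/put item in the list above a bit more concrete, here is a minimal sketch in C of what an exporter and a consumer might look like. Only the name `get_block` and its `ctx`, `upper_left` and `lower_right` parameters come from the thread. Everything else is an assumption made for illustration: `put_block`, the `ndim` field, the `double`-only element type, the `const` qualifiers, the exclusive `lower_right` corner, strides counted in elements, `ptrdiff_t` in place of `ssize_t`, and the `dense2d` and `scale_block` names.

```c
#include <stddef.h>

/* Hypothetical vtable. Only get_block(ctx, upper_left, lower_right, ...)
 * is named in the thread; the remaining fields are assumptions. */
typedef struct {
    int ndim;
    int (*get_block)(void *ctx, const ptrdiff_t *upper_left,
                     const ptrdiff_t *lower_right, double *out);
    int (*put_block)(void *ctx, const ptrdiff_t *upper_left,
                     const ptrdiff_t *lower_right, const double *in);
} block_vtable;

/* One possible exporter: a plain 2-D strided array of doubles.
 * Strides are counted in elements, not bytes (an assumption). */
typedef struct {
    double *data;
    ptrdiff_t shape[2];
    ptrdiff_t strides[2];
} dense2d;

/* The block is [upper_left, lower_right) along each axis. */
static int dense2d_in_bounds(const dense2d *a, const ptrdiff_t *ul,
                             const ptrdiff_t *lr)
{
    for (int d = 0; d < 2; d++) {
        if (ul[d] < 0 || ul[d] > lr[d] || lr[d] > a->shape[d])
            return 0;
    }
    return 1;
}

/* Copy the requested block into a contiguous, row-major buffer. */
static int dense2d_get_block(void *ctx, const ptrdiff_t *ul,
                             const ptrdiff_t *lr, double *out)
{
    const dense2d *a = (const dense2d *)ctx;
    if (!dense2d_in_bounds(a, ul, lr))
        return -1;
    for (ptrdiff_t i = ul[0]; i < lr[0]; i++)
        for (ptrdiff_t j = ul[1]; j < lr[1]; j++)
            *out++ = a->data[i * a->strides[0] + j * a->strides[1]];
    return 0;
}

/* Write a contiguous, row-major buffer back into the requested block. */
static int dense2d_put_block(void *ctx, const ptrdiff_t *ul,
                             const ptrdiff_t *lr, const double *in)
{
    dense2d *a = (dense2d *)ctx;
    if (!dense2d_in_bounds(a, ul, lr))
        return -1;
    for (ptrdiff_t i = ul[0]; i < lr[0]; i++)
        for (ptrdiff_t j = ul[1]; j < lr[1]; j++)
            a->data[i * a->strides[0] + j * a->strides[1]] = *in++;
    return 0;
}

static const block_vtable dense2d_vtable = {
    2, dense2d_get_block, dense2d_put_block
};

/* A consumer that sees only the vtable and the opaque ctx pointer:
 * fetch a block, scale it, and store it back. The caller provides a
 * scratch buffer large enough to hold the block. */
static int scale_block(const block_vtable *vt, void *ctx,
                       const ptrdiff_t *ul, const ptrdiff_t *lr,
                       double factor, double *scratch)
{
    ptrdiff_t n = 1;
    for (int d = 0; d < vt->ndim; d++)
        n *= lr[d] - ul[d];
    if (vt->get_block(ctx, ul, lr, scratch) != 0)
        return -1;
    for (ptrdiff_t k = 0; k < n; k++)
        scratch[k] *= factor;
    return vt->put_block(ctx, ul, lr, scratch);
}
```

A caller would hand `scale_block` the pair `&dense2d_vtable` and a pointer to a `dense2d`. A sparse or block-diagonal exporter could supply its own vtable and context without the consumer changing, which is the C-level duck typing argued for earlier in the thread.
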
Beyond figuring out a small subset that can be done first, and whether performance considerations must be taken into account, there are two complicating factors: pluggable dtypes and memory management.

Perhaps you could come to Oslo for a couple of days to brainstorm...

Dag