Re: [Numpy-discussion] Generator arrays

Travis Oliphant Thu, 27 Jan 2011 22:37:29 -0800

> 
> What happens to the buffer API/persistence with all those additions?


I understand the desire to keep things simple, which is why I am only proposing 
a rather small change to the array object with *huge* implications --- 
encompassing the very cool deferred arrays that Mark Wiebe is proposing.  As 
Einstein said, "everything should be as simple as possible, *but not 
simpler*".

Arrays currently have a data-pointer that always points to memory and an 
accompanying strides array.  All I'm suggesting is that they also allow for 
"indirect" or "computed" arrays in a fairly simple but general-purpose way.  
Generators have been such a huge feature in Python that I really think we need 
to figure out how to have "generated arrays" in NumPy as well --- and it turns 
out this would enable features that are currently difficult with NumPy 
(including deferred evaluation).
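
As a rough illustration of the idea (not the proposed implementation, and 
not an existing NumPy API), a "generated array" can be sketched in pure 
Python as an object that stores a shape and a function instead of a memory 
buffer.  The class name GeneratedArray and its attributes are hypothetical; 
only np.ndindex, np.empty, np.asarray and the __array__ protocol are real 
NumPy features.

```python
import numpy as np


class GeneratedArray:
    """Hypothetical sketch: elements are computed on demand, not stored."""

    def __init__(self, shape, func, dtype=float):
        self.shape = tuple(shape)
        self.func = func              # maps index values to an element
        self.dtype = np.dtype(dtype)

    def __getitem__(self, index):
        # Compute one element without materializing the whole array.
        if not isinstance(index, tuple):
            index = (index,)
        return self.dtype.type(self.func(*index))

    def __add__(self, other):
        # Deferred evaluation: return a new generated array; no element
        # is computed until it is indexed or materialized.
        return GeneratedArray(
            self.shape, lambda *idx: self[idx] + other[idx], self.dtype
        )

    def __array__(self, dtype=None, copy=None):
        # Materialize into an ordinary in-memory ndarray.
        out = np.empty(self.shape, dtype=self.dtype)
        for idx in np.ndindex(*self.shape):
            out[idx] = self.func(*idx)
        return out if dtype is None else out.astype(dtype)


squares = GeneratedArray((5,), lambda i: i * i, dtype=np.int64)
element = squares[3]                  # computed on demand
doubled = squares + squares           # still nothing materialized
materialized = np.asarray(doubled)    # evaluation happens here
```

In this sketch the addition is only recorded as a closure, and the loop in 
__array__ is where values are actually produced.  A real design would 
presumably push that loop into C and the iterator machinery.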

I guess it's debatable how complex the array object is.  I actually see the 
array object itself as quite simple even with the changes.   What is 
complicated is how calculations are done and scattered in an ad hoc fashion 
between ufuncs and other array functions.   I like the idea of unifying the 
calculation framework using ideas like Mark's iterators and the generic 
functions that were added earlier to ufuncs.   I don't like the data-types 
holding on to the "calculation structures".  I think all calculations in NumPy 
should fit under a common rubric.    To me this would be an important part of 
any change.  

Obviously the buffer API could only be implemented for MEMORY arrays (other 
arrays would raise an error).    What to do with persistence is a good 
question, but resolvable I think.   Initially, I would also raise an error for 
trying to pickle arrays that are not MEMORY arrays --- simply calling "copy" on 
an array gives you something that can be persisted. 
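
The behaviour described above might look roughly like the following sketch.  
ComputedArray is a hypothetical stand-in for a non-MEMORY array, and the 
choice of TypeError is an assumption; the message only says that an error 
would be raised.  The pickle module and memoryview are the standard Python 
ones.

```python
import pickle

import numpy as np


class ComputedArray:
    """Hypothetical non-MEMORY array: values come from a function."""

    def __init__(self, length, func):
        self.length = length
        self.func = func

    def copy(self):
        # Materialize into an ordinary in-memory ndarray.
        return np.array([self.func(i) for i in range(self.length)])

    def __reduce__(self):
        # Assumed initial policy: refuse to pickle a computed array.
        raise TypeError("cannot pickle a non-MEMORY array; call copy() first")


lazy = ComputedArray(4, lambda i: 2 * i)

try:
    pickle.dumps(lazy)
    pickled_lazily = True
except TypeError:
    pickled_lazily = False

try:
    memoryview(lazy)                  # no buffer to expose
    has_buffer = True
except TypeError:
    has_buffer = False

concrete = lazy.copy()                # a plain MEMORY array
restored = pickle.loads(pickle.dumps(concrete))
view = memoryview(concrete)           # buffer protocol works on the copy
```

Both the pickle attempt and the buffer request fail on the computed object, 
while the result of copy() supports both.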

Having this kind of functionality on the base NumPy object would be 
transformational for NumPy use.    Yes, you could do similar things with other 
approaches, but there is a lot of benefit in having a powerful fundamental 
object that is a shared place to manage the expression of data calculations.

Another approach is to introduce another object as you suggest which is the 
"generator array".   This could work, especially if there were hooks in the 
calculation engine that allowed it to be produced by array operations (say in 
an appropriate context as described before).  My main concern is that in 
practice having a whole slew of different "array objects" (i.e. masked arrays, 
data arrays, labeled arrays, etc.) tends to make code much bulkier to read, 
as you end up doing a lot of conversions back and forth to take advantage of 
APIs that require one array type or another.
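
A small example of the kind of conversion overhead meant here, using the 
existing numpy.ma module.  The normalize function is a made-up routine that 
stands in for any API written against plain ndarrays.

```python
import numpy as np


def normalize(values):
    # A routine written against plain ndarrays (hypothetical example).
    values = np.asarray(values, dtype=float)
    return values / values.max()


masked = np.ma.masked_array([1.0, 2.0, 4.0], mask=[False, True, False])

# Convert to a plain ndarray to call the ndarray-only routine ...
plain = masked.filled(0.0)
scaled = normalize(plain)

# ... then convert back, re-attaching the mask by hand.
result = np.ma.masked_array(scaled, mask=np.ma.getmaskarray(masked))
```

Two of the four statements exist only to move between array types, which 
is the bulk that a single shared array object would avoid.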

Having code that is written to a single object is unifying and really assists 
with code re-use and code readability.  One of the things I see happening is a 
tool like Cython being used to generate the call-graphs or read-write functions 
that are being proposed. 

I could be convinced, though, that leaving array objects alone and creating a 
better calculation object (i.e. something like an array vector machine) 
embracing and extending ufuncs is a better way to go.  But, I haven't seen that 
proposal. 

-Travis


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
