On Thu, Aug 6, 2009 at 1:19 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> It almost looks like you are reimplementing numpy, in c++ no less. Is there
> any reason why you aren't working with a numpy branch and just adding
> ufuncs?
I don't know how that would work. The ufuncs need a datatype to work
with, and AFAIK it would break everything if a numpy ndarray pointed to
memory on the GPU. Could you explain what you mean a little more?

> I'm also curious if you have thoughts about how to use the GPU
> pipelines in parallel.

Current thinking for ufunc-type computations:
1) divide the tensors into subtensors whose dimensions have
   power-of-two sizes (this permits a fast integer -> ndarray
   coordinate computation using bit shifting),
2) launch a kernel for each subtensor in its own stream, to use the
   parallel pipelines,
3) sync and return.
(A rough sketch of steps 1 and 2 is in the P.S. below.)

This is a pain to do without automatic code generation, though.
Currently we're using macros, but that's not pretty. C++ has templates,
which we don't really use yet but are planning on using; these have
some power to generate code. The 'theano' project
(www.pylearn.org/theano), for which cuda-ndarray was created, has a
more powerful code generation mechanism, similar to weave. This
algorithm is used in theano-cuda-ndarray. scipy.weave could be very
useful for generating code for specific shapes/ndims on demand, if
weave could use nvcc.

James
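
P.S. For concreteness, here is a rough sketch (made-up names, not the
actual cuda-ndarray code) of what steps 1 and 2 could look like for a
3-d float32 subtensor whose dimensions are powers of two:

    #include <cuda_runtime.h>

    /* One elementwise op (exp here) over a strided 3-d subtensor.  Because
       each dimension is a power of two, the flat index i is split into
       coordinates (i0, i1, i2) with shifts and masks instead of divisions. */
    __global__ void elemwise_exp_kernel(const float *x, float *z,
                                        int log2dim0, int log2dim1, int log2dim2,
                                        int xs0, int xs1, int xs2,  /* strides (elements) */
                                        int zs0, int zs1, int zs2)
    {
        const unsigned int n = 1u << (log2dim0 + log2dim1 + log2dim2);
        for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n; i += blockDim.x * gridDim.x)
        {
            /* integer -> ndarray coordinates via bit shifting */
            unsigned int i2 = i & ((1u << log2dim2) - 1);
            unsigned int i1 = (i >> log2dim2) & ((1u << log2dim1) - 1);
            unsigned int i0 = i >> (log2dim1 + log2dim2);
            z[i0 * zs0 + i1 * zs1 + i2 * zs2] =
                expf(x[i0 * xs0 + i1 * xs1 + i2 * xs2]);
        }
    }

    /* Step 2: one kernel per subtensor, each in its own stream; then sync. */
    struct SubTensor {
        const float *x; float *z;
        int log2dim[3]; int xstr[3]; int zstr[3];
    };

    void launch_exp(const SubTensor *sub, int nsub)
    {
        cudaStream_t *streams = new cudaStream_t[nsub];
        for (int k = 0; k < nsub; ++k) {
            cudaStreamCreate(&streams[k]);
            elemwise_exp_kernel<<<32, 256, 0, streams[k]>>>(
                sub[k].x, sub[k].z,
                sub[k].log2dim[0], sub[k].log2dim[1], sub[k].log2dim[2],
                sub[k].xstr[0], sub[k].xstr[1], sub[k].xstr[2],
                sub[k].zstr[0], sub[k].zstr[1], sub[k].zstr[2]);
        }
        cudaThreadSynchronize();              /* step 3: sync and return */
        for (int k = 0; k < nsub; ++k)
            cudaStreamDestroy(streams[k]);
        delete[] streams;
    }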
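
P.P.S. On the template point, the idea would be roughly the following:
make the scalar op and the number of dimensions template parameters, so
nvcc can unroll the coordinate loop.  Again, just an illustration, not
code from cuda-ndarray:

    struct ExpOp {
        __device__ static float apply(float a) { return expf(a); }
    };

    /* Shape/stride info passed by value, so it lands in kernel parameter
       memory (no separate device allocation needed). */
    template <int NDIM>
    struct Layout { int log2dim[NDIM]; int xstr[NDIM]; int zstr[NDIM]; };

    template <typename Op, int NDIM>
    __global__ void elemwise_kernel(const float *x, float *z, Layout<NDIM> lay)
    {
        unsigned int n = 1u;
        for (int d = 0; d < NDIM; ++d)
            n <<= lay.log2dim[d];
        for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n; i += blockDim.x * gridDim.x)
        {
            unsigned int rest = i;
            int xoff = 0, zoff = 0;
            for (int d = NDIM - 1; d >= 0; --d) {  /* unrollable: NDIM is a constant */
                int c = rest & ((1u << lay.log2dim[d]) - 1);
                rest >>= lay.log2dim[d];
                xoff += c * lay.xstr[d];
                zoff += c * lay.zstr[d];
            }
            z[zoff] = Op::apply(x[xoff]);
        }
    }

    /* instantiated per op/ndim, e.g.:
       elemwise_kernel<ExpOp, 3><<<32, 256>>>(x_dev, z_dev, lay3);  */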