> Now linear algebra or FFTs on a GPU would probably be a huge boon,
> I'll admit - especially if it's in the form of a drop-in replacement
> for the numpy or scipy versions.
NumPy generates temporary arrays for expressions involving ndarrays. This extra allocation and copying often takes more time than the computation itself. With GPGPUs, we also have to bus the data to and from VRAM. D. Knuth quoted Hoare saying that "premature optimization is the root of all evil." Optimizing computation when the bottleneck is memory is premature.

To improve on this, I think we have to add "lazy evaluation" to NumPy. That is, an operator should not return a temporary array but a symbolic expression. So an expression like

    y = a*x + b

should not evaluate a*x into a temporary array. Rather, the operators would build up a "parse tree" like

    y = add(multiply(a,x), b)

and evaluate the whole expression later on. This would require two things:

First, we need "dynamic code generation", which incidentally is what OpenCL is all about. That is, OpenCL is a dynamically invoked compiler; there is a function clCreateProgramWithSource, which does just what the name says.

Second, we need arrays to be immutable. This is very important. If arrays are not immutable, code like this could fail:

    y = a*x + b
    x[0] = 1235512371235

With lazy evaluation, the memory overhead would be much smaller. The GPGPU would also get more complex expressions to use as kernels. There should be an option of running this on the CPU, possibly using OpenMP for multi-threading. We could either depend on a compiler (C or Fortran) being installed, or use opcodes for a dedicated virtual machine (cf. what numexpr does).

To reduce the inconvenience of immutable arrays, we could introduce a context manager. Inside the with statement, all arrays would be immutable, and the __exit__ method could trigger the code generator and do all the evaluation. So we would get something like this:

    # normal numpy here

    with numpy.accelerator():
        # arrays become immutable
        # lazy evaluation
        # code generation and evaluation on exit

    # normal numpy continues here

Thus, here is my plan:

1. a special context-manager class
2. immutable arrays inside the with statement
3. lazy evaluation: expressions build up a parse tree
4. dynamic code generation
5. evaluation on exit

I guess it is possible to find ways to speed this up as well. If a context manager always generated the same OpenCL code, the with statement would only need to execute once (we could raise an exception on enter to jump directly to exit).

It is possible to create a superfast NumPy. But just plugging GPGPUs into the current design would be premature. In NumPy's current state, with mutable ndarrays and operators generating temporary arrays, there is not much to gain from introducing GPGPUs. It would only be beneficial in computationally demanding parts like FFTs and solvers for linear algebra and differential equations. Ufuncs with transcendental functions might also benefit. SciPy would certainly benefit more from GPGPUs than NumPy.

Just my five cents :-)

Regards,
Sturla Molden

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
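The lazy-evaluation step of the plan can be sketched in a few lines of pure Python. The LazyArray class and its evaluate method below are hypothetical names invented for illustration, not an existing NumPy API; the sketch only shows how operators can record a parse tree like add(multiply(a,x), b) instead of producing temporary arrays:

```python
import numpy as np

class LazyArray:
    """Hypothetical lazy wrapper: operators build a parse tree
    instead of evaluating into temporary arrays."""

    def __init__(self, op, *args):
        self.op = op        # 'leaf', 'add' or 'multiply'
        self.args = args

    @classmethod
    def leaf(cls, data):
        # Wrap concrete data as a leaf node of the tree.
        return cls('leaf', np.asarray(data, dtype=np.float64))

    def __add__(self, other):
        return LazyArray('add', self, other)

    def __mul__(self, other):
        return LazyArray('multiply', self, other)

    def evaluate(self):
        # One walk over the tree; a real implementation would
        # instead generate a single fused kernel from this tree.
        if self.op == 'leaf':
            return self.args[0]
        lhs = self.args[0].evaluate()
        rhs = self.args[1].evaluate()
        return lhs + rhs if self.op == 'add' else lhs * rhs

a = LazyArray.leaf([1.0, 2.0])
x = LazyArray.leaf([3.0, 4.0])
b = LazyArray.leaf([0.5, 0.5])

y = a * x + b          # builds add(multiply(a, x), b); nothing computed yet
print(y.evaluate())    # -> [3.5 8.5], the whole expression in one pass
```

Note that this is exactly why the arrays must be immutable inside the with block: y is not computed until evaluate() runs, so mutating x in between would silently change the result.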
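The dynamic code generation step can be made concrete with a toy sketch: walking a parse tree for y = a*x + b and emitting OpenCL C source for one fused element-wise kernel. The tuple tree encoding and the names codegen, make_kernel and fused are invented for illustration; the resulting string is what would be handed to clCreateProgramWithSource in a real system. No OpenCL runtime is needed to run the sketch itself:

```python
def codegen(tree):
    """Turn a parse tree like ('add', ('multiply', 'a', 'x'), 'b')
    into an OpenCL C expression over element i."""
    if isinstance(tree, str):
        return f"{tree}[i]"            # leaf: an input array name
    op, lhs, rhs = tree
    sym = {'add': '+', 'multiply': '*'}[op]
    return f"({codegen(lhs)} {sym} {codegen(rhs)})"

def make_kernel(tree, out, args):
    """Emit OpenCL C source for one fused kernel computing the tree."""
    params = ", ".join(f"__global const float *{a}" for a in args)
    return (f"__kernel void fused({params}, __global float *{out})\n"
            "{\n"
            "    int i = get_global_id(0);\n"
            f"    {out}[i] = {codegen(tree)};\n"
            "}\n")

# y = a*x + b as a parse tree, compiled to a single fused kernel:
print(make_kernel(('add', ('multiply', 'a', 'x'), 'b'), 'y', ['a', 'x', 'b']))
```

In a real system the tree would come from the recorded operator calls, and each distinct expression would be compiled and cached once, so the with statement pays the compilation cost only on first execution.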