So, it took me all of 20 minutes to pull dgemm into J for a matrix
multiplication speedup.
I put it in the repository linked below, along with an Emacs org-mode
TODO list for making this actually happen.
It's all "busy work" as far as I can tell, though it would be my first
time writing code that links to CUDA.
Either way, the dgemm wrapper should eventually make its way into the
API stuff, as it's a pretty good speedup over +/ .* for bigger array
problems.
https://github.com/locklin/jCUDA
Feel free to pitch in on the busy work if anyone has problems that would
benefit from this.
-Scott
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm