Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
You do realize that the throughput from onboard (video) RAM is going to be much higher, right? It's not just the parallelization but the memory bandwidth. And as James pointed out, if you can keep most of your intermediate computation on-card, you stand to benefit immensely, even if doing

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 09:45:29, Rohit Garg wrote: You do realize that the throughput from onboard (video) RAM is going to be much higher, right? It's not just the parallelization but the memory bandwidth. And as James pointed out, if you can keep most of your intermediate

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
Where are you getting this info from? IMO the technology of memory in graphics boards cannot be so different than in commercial motherboards. It could be a *bit* faster (at the expense of packing less of it), but I'd say not as much as 4x faster (100 GB/s vs 25 GB/s of Intel i7 in sequential

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Citi, Luca
Hi Sturla, The proper way to speed up dot(a*b+c*sqrt(d), e) is to get rid of temporary intermediates. I implemented a patch http://projects.scipy.org/numpy/ticket/1153 that reduces the number of temporary intermediates; in your example, from 4 to 2. There is a big improvement in terms of
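The patch works inside NumPy's expression evaluation, but the same saving can be imitated by hand with in-place ufuncs (the array names and sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d, e = (rng.random(1000) for _ in range(5))

# Naive evaluation allocates a temporary for a*b, one for sqrt(d),
# one for c*sqrt(d), and one for the sum: four in total.
naive = np.dot(a * b + c * np.sqrt(d), e)

# Reusing one scratch buffer with in-place ufuncs cuts that down:
tmp = np.sqrt(d)   # temporary 1
tmp *= c           # in place: tmp = c*sqrt(d)
tmp += a * b       # temporary 2 for a*b, then added in place
fewer = np.dot(tmp, e)

assert np.allclose(naive, fewer)
```

The result is identical; only the number of full-size intermediate arrays changes.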

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Citi, Luca wrote: That is exactly why numexpr is faster in these cases. I hope one day numpy will be able to perform such optimizations. I think it is going to require lazy evaluation. Whenever possible, an operator would just return a symbolic representation of the operation. This would
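The lazy-evaluation idea Sturla sketches, operators returning a symbolic representation instead of computing, can be illustrated in a few lines of pure Python (a toy sketch, not NumPy's actual machinery; all names are hypothetical):

```python
class Lazy:
    """Builds an expression tree instead of computing immediately."""
    def __init__(self, value=None, op=None, args=()):
        self.value, self.op, self.args = value, op, args

    def __add__(self, other):
        return Lazy(op=lambda x, y: x + y, args=(self, other))

    def __mul__(self, other):
        return Lazy(op=lambda x, y: x * y, args=(self, other))

    def evaluate(self):
        # A real implementation would fuse the whole tree into one
        # compiled loop here; this toy just walks the tree.
        if self.op is None:
            return self.value
        return self.op(*(arg.evaluate() for arg in self.args))

expr = Lazy(2) * Lazy(3) + Lazy(4)   # nothing computed yet
assert expr.evaluate() == 10
```

The payoff is that `evaluate()` sees the whole expression at once and can generate a single fused loop instead of one loop (and one temporary) per operator.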

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Rohit Garg wrote: GTX 280: 141 GB/s, 1 GB. ATI 4870: 115 GB/s, 1 GB. ATI 5870: 153 GB/s (launches Sept 22, 2009); 2 GB models will be there too. That is going to help if buffers are kept in graphics memory. But the problem is that graphics memory is a scarce resource. S.M.

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 11:11:22, Sturla Molden wrote: Citi, Luca wrote: That is exactly why numexpr is faster in these cases. I hope one day numpy will be able to perform such optimizations. I think it is going to require lazy evaluation. Whenever possible, an operator would just

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Gael Varoquaux
On Thu, Sep 10, 2009 at 10:36:27AM +0200, Francesc Alted wrote: Where are you getting this info from? IMO the technology of memory in graphics boards cannot be so different than in commercial motherboards. It could be a *bit* faster (at the expense of packing less of it), but I'd

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 10:58:13, Rohit Garg wrote: Where are you getting this info from? IMO the technology of memory in graphics boards cannot be so different than in commercial motherboards. It could be a *bit* faster (at the expense of packing less of it), but I'd say not as

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 11:20:21, Gael Varoquaux wrote: On Thu, Sep 10, 2009 at 10:36:27AM +0200, Francesc Alted wrote: Where are you getting this info from? IMO the technology of memory in graphics boards cannot be so different than in commercial motherboards. It could be a

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Gael Varoquaux
On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote: The point is: are GPUs prepared to compete with general-purpose CPUs in all-round operations, like evaluating transcendental functions, conditionals, all of this with a rich set of data types? I would like to believe

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Matthieu Brucher
Sure. Especially because NumPy is all about embarrassingly parallel problems (after all, this is how a ufunc works, doing operations element-by-element). The point is: are GPUs prepared to compete with general-purpose CPUs in all-round operations, like evaluating transcendental functions,

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
The point is: are GPUs prepared to compete with general-purpose CPUs in all-round operations, like evaluating transcendental functions, conditionals, all of this with a rich set of data types? Yup. -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 11:40:48, Sturla Molden wrote: Francesc Alted wrote: Numexpr already uses the Python parser, instead of building a new one. However the bytecode emitted after the compilation process is different, of course. Also, I don't see the point in requiring immutable

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 11:37:24, Gael Varoquaux wrote: On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote: The point is: are GPUs prepared to compete with general-purpose CPUs in all-round operations, like evaluating transcendental functions, conditionals, all of this

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
a = np.cos(b) where b is a 1x1 matrix is *very* embarrassing (in the parallel meaning of the term ;-) On this operation, GPUs will eat up CPUs like a pack of piranhas. :) -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
That's nice to see. I think I'll change my mind if someone could perform a vector-vector multiplication (an operation that is typically memory-bound) You mean a dot product? -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technology

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 14:36:16, Rohit Garg wrote: That's nice to see. I think I'll change my mind if someone could perform a vector-vector multiplication (an operation that is typically memory-bound) You mean a dot product? Whatever, dot product or element-wise product. Both

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Bruce Southey
On 09/10/2009 07:40 AM, Francesc Alted wrote: On Thursday 10 September 2009 14:36:16, Rohit Garg wrote: That's nice to see. I think I'll change my mind if someone could perform a vector-vector multiplication (an operation that is typically memory-bound) You mean a dot product?

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
Apart from float and double, which floating point formats are supported by numpy? On Thu, Sep 10, 2009 at 7:09 PM, Bruce Southey bsout...@gmail.com wrote: On 09/10/2009 07:40 AM, Francesc Alted wrote: On Thursday 10 September 2009 14:36:16, Rohit Garg wrote: That's nice to see. I think

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 15:51:15, Rohit Garg wrote: Apart from float and double, which floating point formats are supported by numpy? I think whatever is supported by the underlying CPU, whether it is extended double precision (12 bytes) or quad precision (16 bytes). -- Francesc Alted
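A quick way to see what a given NumPy build actually provides (the longdouble size is platform-dependent, as the reply notes):

```python
import numpy as np

# float32/float64 map to C float/double and are always 4 and 8 bytes.
# longdouble is whatever the platform's C long double is: often an 80-bit
# extended type padded to 12 or 16 bytes on x86, or just a 64-bit double
# on some platforms, so its itemsize is not portable.
for dt in (np.float32, np.float64, np.longdouble):
    d = np.dtype(dt)
    print(d.name, d.itemsize, "bytes,", np.finfo(dt).nmant, "mantissa bits")
```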

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
I think whatever is supported by the underlying CPU, whether it is extended double precision (12 bytes) or quad precision (16 bytes). Classic 64-bit CPUs support neither. -- Francesc Alted

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Robert Kern
On Thu, Sep 10, 2009 at 07:28, Francesc Alted fal...@pytables.org wrote: On Thursday 10 September 2009 11:37:24, Gael Varoquaux wrote: On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote: The point is: are GPUs prepared to compete with general-purpose CPUs in all-round

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
Yes. However, it is worth making the distinction between embarrassingly parallel problems and SIMD problems. Not all embarrassingly parallel problems are SIMD-capable. GPUs do SIMD, not generally embarrassing problems. GPUs exploit both dimensions of parallelism, both SIMD (aka vectorization)

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Francesc Alted
On Tuesday 08 September 2009 21:19:05, George Dahl wrote: Sturla Molden sturla at molden.no writes: Erik Tollerud wrote: NumPy arrays on the GPU memory is an easy task. But then I would have to write the computation in OpenCL's dialect of C99? This is true to some extent, but also

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Francesc Alted
On Tuesday 08 September 2009 23:21:53, Christopher Barker wrote: Also, perhaps a GPU-aware numexpr could be helpful, which I think is the kind of thing that Sturla was referring to when she wrote: Incidentally, this will also make it easier to leverage modern GPUs. Numexpr mainly supports

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Lev Givon
Received from Francesc Alted on Wed, Sep 09, 2009 at 05:18:48AM EDT: (snip) The point here is that matrix-matrix multiplications (or, in general, functions with a large operation/element ratio) are a *tiny* part of all the possible operations between arrays that NumPy supports. This is why

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Francesc Alted
On Wednesday 09 September 2009 11:26:06, Francesc Alted wrote: On Tuesday 08 September 2009 23:21:53, Christopher Barker wrote: Also, perhaps a GPU-aware numexpr could be helpful, which I think is the kind of thing that Sturla was referring to when she wrote: Incidentally, this will

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread James Bergstra
On Wed, Sep 9, 2009 at 10:41 AM, Francesc Alted fal...@pytables.org wrote: Numexpr mainly supports functions that are meant to be used element-wise, so the operation/element ratio is normally 1 (or close to 1). These scenarios are where improved memory access is much more important than CPU

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Dag Sverre Seljebotn
Christopher Barker wrote: George Dahl wrote: Sturla Molden sturla at molden.no writes: Teraflops peak performance of modern GPUs is impressive. But NumPy cannot easily benefit from that. I know that for my work, I can get around a 50-fold speedup over numpy using a python

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Sturla Molden
George Dahl wrote: I know that for my work, I can get around a 50-fold speedup over numpy using a python wrapper for a simple GPU matrix class. So I might be dealing with a lot of matrix products where I multiply a fixed 512 by 784 matrix by a 784 by 256 matrix that changes

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Sturla Molden
James Bergstra wrote: Suppose you want to evaluate dot(a*b+c*sqrt(d), e). The GPU is great for doing dot(). The CPU is equally great (or better?) for doing dot(). In both cases: - memory access scales O(n) for dot products - computation scales O(n) for dot products - memory use is low - computation
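Sturla's O(n) point is an arithmetic-intensity argument: a dot product does about 2n flops while reading about 16n bytes, so the memory bus, not the ALU, is the bottleneck on either device. A back-of-the-envelope sketch (the bandwidth figure is an assumed round number, not a benchmark):

```python
n = 1_000_000
flops = 2 * n               # n multiplies + n adds
bytes_read = 2 * n * 8      # two float64 input vectors, read once each
intensity = flops / bytes_read
assert intensity == 0.125   # flops per byte

# At an assumed 100 GB/s of memory bandwidth, the achievable rate is
# capped near intensity * bandwidth, regardless of peak compute:
bandwidth = 100e9
print(intensity * bandwidth / 1e9, "GFLOP/s ceiling")  # 12.5
```

At 0.125 flop/byte, even a teraflop-class GPU runs a dot product at a small fraction of its peak, which is why the thread keeps returning to bandwidth rather than flops.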

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread David Warde-Farley
On 10-Sep-09, at 12:47 AM, Sturla Molden wrote: The CPU is equally great (or better?) for doing dot(). In both cases: - memory access scales O(n) for dot products - computation scales O(n) for dot products - memory use is low - computation is fast (faster for GPU) You do realize that the

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Fernando Perez
On Wed, Sep 9, 2009 at 9:47 PM, Sturla Molden stu...@molden.no wrote: James Bergstra wrote: Suppose you want to evaluate dot(a*b+c*sqrt(d), e). The GPU is great for doing dot(). The CPU is equally great (or better?) for doing dot(). In both cases: - memory access scales O(n) for dot products.

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-08 Thread Christopher Barker
George Dahl wrote: Sturla Molden sturla at molden.no writes: Teraflops peak performance of modern GPUs is impressive. But NumPy cannot easily benefit from that. I know that for my work, I can get around a 50-fold speedup over numpy using a python wrapper for a simple GPU matrix

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-08 Thread George Dahl
Sturla Molden sturla at molden.no writes: Erik Tollerud wrote: NumPy arrays on the GPU memory is an easy task. But then I would have to write the computation in OpenCL's dialect of C99? This is true to some extent, but also probably difficult to do given the fact that parallelizable

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-02 Thread Romain Brette
Hi everyone, In case anyone is interested, I just set up a Google group to discuss GPU-based simulation for our Python neural simulator Brian: http://groups.google.fr/group/brian-on-gpu Our simulator relies heavily on Numpy. I would be very happy if the GPU experts here would like to share their

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-21 Thread Sturla Molden
Erik Tollerud wrote: NumPy arrays on the GPU memory is an easy task. But then I would have to write the computation in OpenCL's dialect of C99? This is true to some extent, but also probably difficult to do given the fact that parallelizable algorithms are generally more difficult to

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-20 Thread Erik Tollerud
I realize this topic is a bit old, but I couldn't help but add something I forgot to mention earlier... I mean, once the computations are moved elsewhere numpy is basically a convenient way to address memory. That is how I mostly use NumPy, though. Computations I often do in Fortran 95 or C.

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-07 Thread Romain Brette
Sturla Molden wrote: Thus, here is my plan: 1. a special context-manager class 2. immutable arrays inside with statement 3. lazy evaluation: expressions build up a parse tree 4. dynamic code generation 5. evaluation on exit There seems to be some similarity with what we want to do to
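Steps 1-5 of the plan can be caricatured in pure Python: record operations inside the with block, then "generate code" and evaluate on exit (a toy sketch; every name here is hypothetical):

```python
class deferred:
    """Toy context manager: collects work, runs it all in __exit__."""
    def __init__(self):
        self.ops = []        # stand-in for the parse tree: named thunks
        self.results = {}

    def record(self, name, fn):
        self.ops.append((name, fn))

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # The "code generation" stage would fuse the recorded tree into
        # one compiled kernel; this toy just runs the thunks in order.
        for name, fn in self.ops:
            self.results[name] = fn()
        return False

with deferred() as ctx:
    ctx.record("x", lambda: 2 + 3)     # nothing evaluated yet
    ctx.record("y", lambda: 10 * 4)

assert ctx.results == {"x": 5, "y": 40}
```

The immutability requirement (step 2) exists so that the values recorded inside the block cannot change before __exit__ finally evaluates them.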

[Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread James Bergstra
David Warde-Farley dwf at cs.toronto.edu writes: It did inspire some of our colleagues in Montreal to create this, though: http://code.google.com/p/cuda-ndarray/ I gather it is VERY early in development, but I'm sure they'd love contributions! Hi David, That does look quite close to

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Charles R Harris
On Thu, Aug 6, 2009 at 11:12 AM, James Bergstra bergs...@iro.umontreal.ca wrote: David Warde-Farley dwf at cs.toronto.edu writes: It did inspire some of our colleagues in Montreal to create this, though: http://code.google.com/p/cuda-ndarray/ I gather it is VERY early in

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread James Bergstra
On Thu, Aug 6, 2009 at 1:19 PM, Charles R Harris charlesr.har...@gmail.com wrote: It almost looks like you are reimplementing numpy, in C++ no less. Is there any reason why you aren't working with a numpy branch and just adding ufuncs? I don't know how that would work. The Ufuncs need a

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Erik Tollerud
Note that this is from a user perspective, as I have no particular plan of developing the details of this implementation, but I've thought for a long time that GPU support could be great for numpy (I would also vote for OpenCL support over CUDA, although conceptually they seem quite similar)...

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Matthieu Brucher
2009/8/6 Erik Tollerud erik.tolle...@gmail.com: Note that this is from a user perspective, as I have no particular plan of developing the details of this implementation, but I've thought for a long time that GPU support could be great for numpy (I would also vote for OpenCL support over CUDA,

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread David Warde-Farley
On 6-Aug-09, at 2:54 PM, Erik Tollerud wrote: Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll admit - especially if it's in the form of a drop-in replacement for the numpy or scipy versions. The word I'm hearing from people in my direct acquaintance who are

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll admit - especially if it's in the form of a drop-in replacement for the numpy or scipy versions. NumPy generates temporary arrays for expressions involving ndarrays. This extra allocation and copying often takes more
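The temporary-array cost Sturla mentions is easy to see: evaluating (a + b) * c allocates one array for a + b and another for the product, while chaining ufuncs through a single out= buffer allocates once. A minimal sketch:

```python
import numpy as np

n = 1_000_000
a = np.ones(n)
b = np.ones(n)
c = np.full(n, 2.0)

# Expression form: (a + b) * c builds a temporary for (a + b),
# then a second array for the final product.
expected = (a + b) * c

# Chaining ufuncs through one scratch buffer allocates only once:
scratch = np.empty(n)
np.add(a, b, out=scratch)
np.multiply(scratch, c, out=scratch)

assert np.allclose(expected, scratch)
```

For large n the extra allocation and copying in the expression form is pure memory traffic, which is exactly the resource the thread identifies as the bottleneck.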

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Robert Kern
On Thu, Aug 6, 2009 at 15:57, Sturla Molden stu...@molden.no wrote: Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll admit - especially if it's in the form of a drop-in replacement for the numpy or scipy versions. NumPy generates temporary arrays for expressions

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Robert Kern wrote: I believe that is exactly the point that Erik is making. :-) I wasn't arguing against him, just suggesting a solution. :-) I have big hopes for lazy evaluation, if we can find a way to do it right. Sturla

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread James Bergstra
On Thu, Aug 6, 2009 at 4:57 PM, Sturla Molden stu...@molden.no wrote: Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll admit - especially if it's in the form of a drop-in replacement for the numpy or scipy versions. NumPy generates temporary arrays for expressions

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Charles R Harris
On Thu, Aug 6, 2009 at 3:29 PM, James Bergstra bergs...@iro.umontreal.ca wrote: On Thu, Aug 6, 2009 at 4:57 PM, Sturla Molden stu...@molden.no wrote: Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll admit - especially if it's in the form of a drop-in replacement for

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Charles R Harris wrote: Whether the code that gets compiled is written using lazy evaluation (ala Sturla), or is expressed some other way seems like an independent issue. It sounds like one important thing would be having arrays that reside on the GPU. Memory management is slow compared to

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Sturla Molden wrote: Memory management is slow compared to computation. Operations like malloc, free and memcpy are not faster for VRAM than for RAM. Actually it's not VRAM anymore, but whatever you call the memory dedicated to the GPU. It is cheap to put 8 GB of RAM into a computer, but

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Charles R Harris
On Thu, Aug 6, 2009 at 4:36 PM, Sturla Molden stu...@molden.no wrote: Charles R Harris wrote: Whether the code that gets compiled is written using lazy evaluation (ala Sturla), or is expressed some other way seems like an independent issue. It sounds like one important thing would be

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Charles R Harris wrote: I mean, once the computations are moved elsewhere numpy is basically a convenient way to address memory. That is how I mostly use NumPy, though. Computations I often do in Fortran 95 or C. NumPy arrays on the GPU memory is an easy task. But then I would have to

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
James Bergstra wrote: The plan you describe is a good one, and Theano (www.pylearn.org/theano) almost exactly implements it. You should check it out. It does not use 'with' syntax at the moment, but it could provide the backend machinery for your mechanism if you want to go forward with

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Charles R Harris
On Thu, Aug 6, 2009 at 5:10 PM, Sturla Molden stu...@molden.no wrote: Charles R Harris wrote: I mean, once the computations are moved elsewhere numpy is basically a convenient way to address memory. That is how I mostly use NumPy, though. Computations I often do in Fortran 95 or C.

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Fernando Perez
On Thu, Aug 6, 2009 at 1:57 PM, Sturla Molden stu...@molden.no wrote: In order to reduce the effect of immutable arrays, we could introduce a context-manager. Inside the with statement, all arrays would be immutable. Second, the __exit__ method could trigger the code generator and do all the

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Robert Kern
On Thu, Aug 6, 2009 at 19:00, Fernando Perez fperez@gmail.com wrote: On Thu, Aug 6, 2009 at 1:57 PM, Sturla Molden stu...@molden.no wrote: In order to reduce the effect of immutable arrays, we could introduce a context-manager. Inside the with statement, all arrays would be immutable.