That's 53.8 GB/s for a load of 33.6 MB? Is there a burst cache effect going on here or do you think that's sustainable for multiple seconds?
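Working the numbers as a quick sketch (variable names are mine, and I'm taking MB as 10^6 bytes, which is how the 53.8 GB/s figure falls out):

```python
# Back-of-the-envelope check on the reported device-to-device figure.
# Inputs are the benchmark numbers quoted below; MB assumed to be 10^6 bytes.

transfer_bytes = 33_554_432      # 32 MiB test buffer (~33.6 MB)
bandwidth_mb_s = 53768.0         # reported device-to-device bandwidth, MB/s

bandwidth_gb_s = bandwidth_mb_s / 1000.0                      # GB/s
transfer_time_ms = transfer_bytes / (bandwidth_mb_s * 1e6) * 1000

print(f"{bandwidth_gb_s:.1f} GB/s")           # ≈ 53.8 GB/s
print(f"{transfer_time_ms:.2f} ms per copy")  # ≈ 0.62 ms
```

At that rate each 33.6 MB copy finishes in well under a millisecond, so the benchmark only exercises the memory for a tiny window, which is what makes me ask about sustained rates.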
Bo

On Fri, 22 Jun 2007, J Storrs Hall, PhD wrote:

) BTW, the CUDA toolkit for programming the GPUs is developing rapidly (and is
) still in beta). Here are memory bandwidths actually measured on my machine:
)
) CUDA version 0.8:
)
) Host to Device Bandwidth for Pinned memory
) Transfer Size (Bytes)   Bandwidth(MB/s)
) 33554432                1647.6
)
) Device to Host Bandwidth for Pinned memory
) Transfer Size (Bytes)   Bandwidth(MB/s)
) 33554432                1654.7
)
) Device to Device Bandwidth
) Transfer Size (Bytes)   Bandwidth(MB/s)
) 33554432                3332.1
)
) CUDA version 0.9:
)
) Host to Device Bandwidth for Pinned memory
) Transfer Size (Bytes)   Bandwidth(MB/s)
) 33554432                2700.0
)
) Device to Host Bandwidth for Pinned memory
) Transfer Size (Bytes)   Bandwidth(MB/s)
) 33554432                2693.3
)
) Device to Device Bandwidth
) Transfer Size (Bytes)   Bandwidth(MB/s)
) 33554432                53768.0
)
) In other words, once uploaded to the GPU, you can afford to reorder your data
) any way you want, as often as you need to, to take advantage of parallel ops.
)
) Or to put it another way, the GPU has about the same bandwidth to its memory
) as your brain does to its, on a per-byte basis, in my ballpark estimates.
) Somewhere on the order of 100 GPUs is a brain-equivalent.
)
) We're getting damn close...
)
) Josh
)
) -----
) This list is sponsored by AGIRI: http://www.agiri.org/email
) To unsubscribe or change your options, please go to:
) http://v2.listbox.com/member/?&
