BTW, the CUDA toolkit for programming GPUs is developing rapidly (and is 
still in beta). Here are memory bandwidths actually measured on my machine:

CUDA version 0.8:

Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               1647.6

Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               1654.7

Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               3332.1

CUDA version 0.9:

Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               2700.0

Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               2693.3

Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               53768.0
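(For reference, these figures look like the output of a bandwidthTest-style benchmark: bandwidth is just bytes moved divided by elapsed time. A minimal sketch of that arithmetic, assuming the usual 1 MB = 10^6 bytes convention — the exact convention the tool uses is my assumption:)

```python
# Bandwidth math behind the tables above: MB/s = bytes / (1e6 * seconds).
# Assumes 1 MB = 10^6 bytes, which appears to match the reported figures.
size_bytes = 33554432  # 32 MiB, the transfer size in the tables

def bandwidth_mb_s(nbytes, seconds):
    """Bandwidth in MB/s from a transfer of nbytes taking `seconds`."""
    return nbytes / (1e6 * seconds)

# e.g. the 2693.3 MB/s device-to-host figure implies a 32 MiB copy
# takes roughly 12.5 ms:
print(round(bandwidth_mb_s(size_bytes, 0.01246), 1))
```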

In other words, once your data is uploaded to the GPU, you can afford to 
reorder it any way you want, as often as you need to, to take advantage of 
parallel ops.
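A quick back-of-envelope comparison using the CUDA 0.9 numbers above makes the point concrete — an on-GPU shuffle of one 32 MiB buffer costs well under a millisecond, while a round trip over the bus costs about 25 ms:

```python
# Back-of-envelope: why reordering on the GPU beats shipping data back.
# Figures are the CUDA 0.9 measurements quoted above (MB/s, 1 MB = 1e6 bytes).
SIZE_MB = 33554432 / 1e6        # ~33.55 MB per transfer

def copy_ms(bandwidth_mb_s):
    """Milliseconds to move one 33.55 MB buffer at the given bandwidth."""
    return SIZE_MB / bandwidth_mb_s * 1000

d2d = copy_ms(53768.0)                           # on-GPU copy: ~0.62 ms
round_trip = copy_ms(2693.3) + copy_ms(2700.0)   # download + re-upload: ~25 ms
print(round(round_trip / d2d))                   # roughly 40x cheaper on-GPU
```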

Or to put it another way, by my ballpark estimates the GPU has about the 
same per-byte bandwidth to its memory as your brain does to its own. 
Somewhere on the order of 100 GPUs is a brain-equivalent. 

We're getting damn close...

Josh

-----
This list is sponsored by AGIRI: http://www.agiri.org/email