BTW, the CUDA toolkit for programming GPUs is developing rapidly (and is still in beta). Here are memory bandwidths actually measured on my machine:
CUDA version 0.8 (transfer size 33554432 bytes):
  Host to Device Bandwidth, pinned memory:    1647.6 MB/s
  Device to Host Bandwidth, pinned memory:    1654.7 MB/s
  Device to Device Bandwidth:                 3332.1 MB/s

CUDA version 0.9 (transfer size 33554432 bytes):
  Host to Device Bandwidth, pinned memory:    2700.0 MB/s
  Device to Host Bandwidth, pinned memory:    2693.3 MB/s
  Device to Device Bandwidth:                53768.0 MB/s

In other words, once your data is uploaded to the GPU, you can afford to reorder it any way you want, as often as you need, to take advantage of parallel ops. Or to put it another way: by my ballpark estimates, the GPU has about the same per-byte bandwidth to its memory as your brain has to its. Somewhere on the order of 100 GPUs is a brain-equivalent. We're getting damn close...

Josh
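P.S. For anyone who wants to reproduce the numbers: below is a minimal sketch along the lines of the SDK's bandwidthTest sample. The 33554432-byte transfer size matches the runs above; the repetition count and the 2x byte accounting for device-to-device copies are my assumptions about the sample's bookkeeping, not its exact code.

    /* bw.cu -- minimal bandwidth measurement sketch (error checking omitted) */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        const size_t bytes = 33554432;   /* 32 MB, same transfer size as above */
        const int    reps  = 100;        /* assumed repetition count */
        float        ms;

        unsigned char *h_pinned, *d_src, *d_dst;
        cudaMallocHost((void **)&h_pinned, bytes);  /* pinned (page-locked) host memory */
        cudaMalloc((void **)&d_src, bytes);
        cudaMalloc((void **)&d_dst, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        /* Host -> Device, pinned */
        cudaEventRecord(start, 0);
        for (int i = 0; i < reps; i++)
            cudaMemcpy(d_src, h_pinned, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&ms, start, stop);
        printf("Host to Device (pinned): %.1f MB/s\n",
               (bytes / 1048576.0) * reps / (ms / 1000.0));

        /* Device -> Device: each copy both reads and writes device memory,
           so the conventional figure counts the bytes twice */
        cudaEventRecord(start, 0);
        for (int i = 0; i < reps; i++)
            cudaMemcpy(d_dst, d_src, bytes, cudaMemcpyDeviceToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&ms, start, stop);
        printf("Device to Device: %.1f MB/s\n",
               2.0 * (bytes / 1048576.0) * reps / (ms / 1000.0));

        cudaFreeHost(h_pinned);
        cudaFree(d_src);
        cudaFree(d_dst);
        return 0;
    }

Pinned host memory is what lets the DMA engine run at full speed; the same test against ordinary pageable memory would measure noticeably lower.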
