> The paper is now available online, "CPU-Assisted GPGPU on Fused
> CPU-GPU Architectures":
>
> http://people.engr.ncsu.edu/hzhou/hpca_12_final.pdf

thanks for the reference.

> (I have not read the whole paper yet) I think the core idea is that
> the CPU acts as a prefetch thread and pulls data into the shared L3
> for the GPU cores (this work is like other prefetch thread research

yes, though it's a bit puzzling, since the whole point of GPU design is
to have lots of runnable threads on hand, so that you simply switch from
stalled to non-stalled threads to hide latency. so in the context of
prefetching, I'd expect a bundle of threads to make a non-prefetched
reference and stall, while other bundles use the vector unit until the
reference is resolved.

gotta read the paper I guess!
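
to make the latency-hiding argument concrete, here is a toy model (my own sketch, not anything from the paper). it assumes a single vector unit that issues one instruction per cycle, and bundles of threads ("warps") that each compute for a few cycles and then stall on a memory reference. the function name, the fixed-priority scheduling, and the cycle counts are all assumptions chosen for illustration; real GPU schedulers and memory systems are far more complicated.

```python
def alu_utilization(num_warps, compute_cycles, mem_latency, total_cycles=10_000):
    """Fraction of cycles the vector unit is busy in a toy latency-hiding model.

    Hypothetical model: each warp runs `compute_cycles` instructions, then
    issues a memory reference and stalls for `mem_latency` cycles. Each cycle
    the scheduler issues from the first warp that is not stalled.
    """
    ready_at = [0] * num_warps                 # cycle when each warp may issue again
    remaining = [compute_cycles] * num_warps   # instructions left before the next stall
    busy = 0
    for cycle in range(total_cycles):
        for w in range(num_warps):
            if ready_at[w] <= cycle:
                busy += 1                      # vector unit does useful work this cycle
                remaining[w] -= 1
                if remaining[w] == 0:
                    # warp makes its memory reference and stalls
                    ready_at[w] = cycle + 1 + mem_latency
                    remaining[w] = compute_cycles
                break                          # only one warp issues per cycle
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, alu_utilization(n, compute_cycles=4, mem_latency=36))
```

with 4 compute cycles per 36-cycle stall, one warp keeps the unit busy 10% of the time, five warps 50%, and ten or more warps 100%. in this model, once there are enough runnable warps, shortening the stall (which is what prefetching does) buys nothing, which is why the result is puzzling; prefetching would only pay off when there are too few warps to cover the latency.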
