Hello,

This isn't directly nouveau-related, but since I was made welcome on the IRC channel, I assume I may post my question on the ML as well.
We are trying to use an NVIDIA Tesla card for highly parallel real-time computations. Our problem is getting the data from a custom PCIe device into the GPU fast enough (and back). We need to transfer ~200 bytes every ~10 µs, so the limiting factor is latency rather than bandwidth.

So far we have transferred the data via host RAM, but I am trying to find a way to do a direct peer-to-peer DMA transfer. The Tesla card supports peer-to-peer transfers to other GPUs, so in principle this should be possible. My idea (though I'm happy to hear others as well) is to determine the bus address of some GPU memory location and then let the other PCIe card push and pull the data as needed. However, I am not sure how to determine the bus address.

A brute-force method would be to write a marker string somewhere into GPU memory and then search through all PCIe BARs. However, there is a chance that the marker string does not end up in the mapped memory area. I'm currently trying to figure that out by searching through the BARs with the CPU (see my other posting).

In case that fails: does anyone have a suggestion for how else we might accomplish what we want? Is there a way to change the mapping from BARs to GPU memory? Could one of the nouveau debugging tools be used to figure out how the nvidia driver does the GPU-to-GPU transfer and tap off the bus address?

Hackish solutions are quite welcome; we are already working with a patched kernel anyway in order to share the pinned host memory between drivers.

Best,
-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
_______________________________________________
Nouveau mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/nouveau
