Hello, I recently found out that when running a multi-GPU MPI application (one MPI process per GPU) on a machine that uses Intel Omni-Path, the GPU binding has to be done before MPI initialization, according to the Intel documentation <https://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_PSM2_PG_H76473_v13_0.pdf>. If this seems correct to you, could you update your "Running CUDA-aware" web page accordingly? This would help people know the correct order of these two steps.
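To make the order concrete, here is a minimal sketch of what I mean, not a definitive implementation. It assumes the launcher exports the node-local rank in an environment variable (the names below are the ones I know from Open MPI, MVAPICH2, Intel MPI and Slurm; other launchers may use different ones), since no MPI call can be used before MPI_Init. The helpers parse_rank, local_rank_from_env and pick_device, and the USE_MPI_CUDA macro, are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef USE_MPI_CUDA /* hypothetical macro: define it when building with MPI and CUDA */
#include <cuda_runtime.h>
#include <mpi.h>
#endif

/* Parse a non-negative decimal integer; return -1 if s is NULL or malformed. */
int parse_rank(const char *s)
{
    if (s == NULL || *s == '\0') return -1;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v > INT_MAX) return -1;
    return (int)v;
}

/* Look up the node-local rank from the launcher's environment.
 * Assumption: one of these variables is set; returns -1 otherwise. */
int local_rank_from_env(void)
{
    static const char *const vars[] = {
        "OMPI_COMM_WORLD_LOCAL_RANK", /* Open MPI */
        "MV2_COMM_WORLD_LOCAL_RANK",  /* MVAPICH2 */
        "MPI_LOCALRANKID",            /* Intel MPI */
        "SLURM_LOCALID",              /* Slurm srun */
    };
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; ++i) {
        int r = parse_rank(getenv(vars[i]));
        if (r >= 0) return r;
    }
    return -1;
}

/* Map a local rank to a device index (round-robin if ranks exceed devices). */
int pick_device(int local_rank, int device_count)
{
    assert(local_rank >= 0 && device_count > 0);
    return local_rank % device_count;
}

#ifdef USE_MPI_CUDA
int main(int argc, char **argv)
{
    int local_rank = local_rank_from_env();
    if (local_rank < 0) {
        fprintf(stderr, "could not determine the local rank from the environment\n");
        return EXIT_FAILURE;
    }

    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count < 1) {
        fprintf(stderr, "no CUDA device available\n");
        return EXIT_FAILURE;
    }

    /* Step 1: bind this process to its GPU. */
    if (cudaSetDevice(pick_device(local_rank, device_count)) != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed\n");
        return EXIT_FAILURE;
    }

    /* Step 2: only now initialize MPI. */
    MPI_Init(&argc, &argv);

    /* ... CUDA-aware MPI communication goes here ... */

    MPI_Finalize();
    return EXIT_SUCCESS;
}
#endif
```

Building with -DUSE_MPI_CUDA (plus the usual MPI and CUDA include and link flags) enables main; without it only the helper functions are compiled. The point is simply that cudaSetDevice comes before MPI_Init.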
Sincerely,
Thomas