[OMPI devel] Memory performance with Bcast

marcin.krotkiewski Wed, 20 Mar 2019 01:55:04 -0700

Hi!

I'm wondering about the details of Bcast implementation in OpenMPI. I'm specifically interested in IB interconnects, but information about other architectures (and OpenMPI in general) would also be very useful.

I am working with a code that sends the same (large) message to a bunch of 'neighboring' processes. Somewhat like a ghost-zone exchange, but the message is the same for all neighbors. Since memory bandwidth is a scarce resource, I'd like to make sure we send the message with the fewest possible memory accesses.

Hence the question: what does OpenMPI (and, specifically for the IB case, the HPCX) do in such a case? Does it get the buffer from memory O(1) times to send it to n peers, with the broadcast orchestrated by the hardware? Or does it have to read the memory O(n) times? Is it more efficient to use Bcast, or is it the same as implementing the operation by n distinct send / put operations? Finally, is there any way to use the RMA put method with multiple targets, so that I only have to read the host memory once, and the switches / HCA take care of the rest?
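
To make the O(1) versus O(n) question concrete, here is a minimal Python model of the two software-only alternatives. It is only a sketch under stated assumptions: it counts how many times the root's buffer is read for n distinct sends versus a binomial-tree broadcast. The binomial tree is an assumed textbook schedule, not a statement about which algorithm OpenMPI or HPCX actually selects, and the function names are made up for illustration. Hardware multicast, which is what I am really asking about, is not modeled.

```python
def root_reads_flat(n_peers: int) -> int:
    """n distinct sends from the root: its buffer is read once per peer."""
    return n_peers


def binomial_bcast_schedule(n_procs: int) -> list[tuple[int, int, int]]:
    """(round, sender, receiver) triples for an assumed binomial-tree
    broadcast rooted at rank 0. In each round, every rank that already
    holds the message forwards it to one rank that does not."""
    schedule = []
    have = 1  # ranks 0 .. have-1 currently hold the message
    rnd = 0
    while have < n_procs:
        for sender in range(have):
            receiver = sender + have
            if receiver < n_procs:
                schedule.append((rnd, sender, receiver))
        have *= 2
        rnd += 1
    return schedule


def root_reads_binomial(n_procs: int) -> int:
    """Number of sends issued by rank 0 in the binomial schedule."""
    return sum(1 for _, sender, _ in binomial_bcast_schedule(n_procs) if sender == 0)


if __name__ == "__main__":
    for peers in (1, 7, 8, 63):
        procs = peers + 1  # root plus its peers
        print(peers, root_reads_flat(peers), root_reads_binomial(procs))
```

In this model the root's buffer is read about log2(n) times instead of n times, but the total number of sends across all ranks is still n: the memory traffic is spread over the peers rather than eliminated. Only a hardware-orchestrated multicast would bring the total down to O(1) reads.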


Thanks a lot for any insights!

Marcin


_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
