Marcin, I am not sure I understand your question. A bcast is a collective operation that must be posted by all participants. Independently of the level at which the bcast is serviced, if some of the participants have not posted their participation in the collective, only partial progress can be made.
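To make the "matching call" question from below concrete, here is a minimal sketch in plain MPI-3 (nothing HCOLL-specific, and the buffer size is made up for illustration): every rank posts the same broadcast, and with the nonblocking variant each rank can drive its own progress by polling MPI_Test. Whether a given Open MPI build makes progress without such calls depends on how it was configured, so the polling loop is the portable assumption.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int root = 0;
    const int n = 1 << 20;                 /* ~8 MB of doubles: a "large" message */
    double *buf = malloc(n * sizeof(double));
    if (rank == root)
        for (int i = 0; i < n; i++) buf[i] = (double)i;

    /* Every rank must post the matching collective; none of them can
       complete on the strength of the others' calls alone. */
    MPI_Request req;
    MPI_Ibcast(buf, n, MPI_DOUBLE, root, MPI_COMM_WORLD, &req);

    int done = 0;
    while (!done) {
        /* ... overlap with local work here ... */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* drives progress */
    }

    free(buf);
    MPI_Finalize();
    return 0;
}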
George.

On Thu, Mar 21, 2019 at 12:24 PM Joshua Ladd <jladd.m...@gmail.com> wrote:

> Marcin,
>
> HPC-X implements the MPI BCAST operation by leveraging hardware multicast
> capabilities. Starting with HPC-X v2.3 we introduced a new multicast-based
> algorithm for large messages as well. Hardware multicast scales as O(1)
> modulo switch hops. It is the most efficient way to broadcast a message in
> an IB network.
>
> Hope this helps.
>
> Best,
>
> Josh
>
> On Thu, Mar 21, 2019 at 5:01 AM marcin.krotkiewski <
> marcin.krotkiew...@gmail.com> wrote:
>
>> Thanks, George! So, the function you mentioned is used when I turn off
>> HCOLL and use OpenMPI's tuned coll instead. That helps a lot. Another
>> thing that makes me wonder is that in my case the data is sent to the
>> targets asynchronously, or rather - it is a 'put' operation by nature,
>> and the targets don't know when the data is ready. I guess the tree
>> algorithms you mentioned require active participation of all nodes,
>> otherwise the algorithm will not progress? Is it enough to call any MPI
>> routine to assure progression, or do I have to call the matching Bcast?
>>
>> Anyone from Mellanox here, who knows how HCOLL does this internally?
>> Especially on the EDR architecture. Is there any hardware aid?
>>
>> Thanks!
>>
>> Marcin
>>
>> On 3/20/19 5:10 PM, George Bosilca wrote:
>>
>> If you have support for FCA, then it might happen that the collective
>> will use the hardware support. In any case, most of the bcast algorithms
>> have a logarithmic behavior, so there will be at most O(log(P)) memory
>> accesses on the root.
>>
>> If you want to take a look at the code in OMPI to understand what
>> function is called in your specific case, head to ompi/mca/coll/tuned/
>> and search for the ompi_coll_tuned_bcast_intra_dec_fixed function in
>> coll_tuned_decision_fixed.c.
>>
>> George.
>>
>> On Wed, Mar 20, 2019 at 4:53 AM marcin.krotkiewski <
>> marcin.krotkiew...@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I'm wondering about the details of the Bcast implementation in OpenMPI.
>>> I'm specifically interested in IB interconnects, but information about
>>> other architectures (and OpenMPI in general) would also be very useful.
>>>
>>> I am working with a code which sends the same (large) message to a
>>> bunch of 'neighboring' processes. Somewhat like a ghost-zone exchange,
>>> but the message is the same for all neighbors. Since memory bandwidth
>>> is a scarce resource, I'd like to make sure we send the message with
>>> the fewest possible memory accesses.
>>>
>>> Hence the question: what does OpenMPI (and specifically for the IB case
>>> - the HPCX) do in such a case? Does it get the buffer from memory O(1)
>>> times to send it to n peers, with the broadcast orchestrated by the
>>> hardware? Or does it have to read the memory O(n) times? Is it more
>>> efficient to use Bcast, or is it the same as implementing the operation
>>> by n distinct send / put operations? Finally, is there any way to use
>>> the RMA put method with multiple targets, so that I only have to read
>>> the host memory once, and the switches / HCA take care of the rest?
>>>
>>> Thanks a lot for any insights!
>>>
>>> Marcin
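A practical footnote on George's pointer to the tuned component: the decision logic in coll_tuned_decision_fixed.c can be bypassed from the command line, which makes it easy to compare the algorithms on your fabric. The parameter values below are what I believe current releases accept - please verify them against your install with "ompi_info --param coll tuned --level 9", and the application name is of course a placeholder:

mpirun --mca coll_hcoll_enable 0 \
       --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_bcast_algorithm 6 \
       ./your_app

Here disabling hcoll ensures the tuned component is actually used, and algorithm 6 should select a binomial tree (a value of 0 restores the fixed decision function).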
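On Marcin's last question: plain MPI has no single put that fans out to multiple targets, so the "n distinct put operations" variant looks like the hypothetical sketch below (the function name and window setup are mine, for illustration only). The origin buffer is handed to the transport once per neighbor, so without hardware multicast underneath you should expect O(n) traversals of host memory, and the targets still need a separate notification that the data has landed.

#include <mpi.h>

void put_to_neighbors(const double *src, int count,
                      const int *neighbors, int n_neigh,
                      MPI_Win win)
{
    MPI_Win_lock_all(0, win);          /* passive-target epoch on all ranks */
    for (int i = 0; i < n_neigh; i++) {
        /* same origin buffer, n distinct transfers */
        MPI_Put(src, count, MPI_DOUBLE,
                neighbors[i], 0 /* target displacement */,
                count, MPI_DOUBLE, win);
    }
    MPI_Win_flush_all(win);            /* complete all puts at the origin */
    MPI_Win_unlock_all(win);
    /* NOTE: completion here says nothing to the targets; they need a
       separate flag, MPI_Barrier, or similar to learn the data is ready. */
}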
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel