On Oct 6, 2015, at 10:19 AM, Dahai Guo <dahaiguo2...@yahoo.com> wrote:
>
> Thanks, Gilles. Some more questions:
>
> 1. how does Open MPI define the priorities of the different collective
> components? what criteria is it based on?
The priorities are in the range [0, 100] (100 = highest).  The priorities tend to be fairly coarse-grained; they're mainly based on relative knowledge of how good / bad a particular algorithm is going to be.

> 2. how does an MPI collective function (MPI_Barrier, for example) choose the
> exact algorithm it uses? based on message size and communicator size? any
> other factors?

Yes (all of the above).  Meaning: each component is responsible for a) determining whether it will provide a function pointer for each operation, and b) what that function pointer's priority should be (same disclaimer as in my last mail: I don't remember offhand whether there's a single priority for the whole component, or one per function pointer / operation).  Hence, the component can use whatever criteria it wants to decide whether to provide a function pointer.  E.g., if it only has algorithms that work with communicators whose size is a power of 2, then it can use that information to decide whether to provide a function pointer for a new communicator.

> 3. when does MPI_Barrier choose the algorithm? in ompi_mpi_init? or every
> time the API program calls MPI_Barrier?

A combination of both: when the communicator is constructed and when the barrier is run.  I already described the communicator-constructor scenario.  But in addition to that, it's certainly possible to have a collective operation dispatch to a function that makes a further run-time decision (the tuned collective component does a lot of this).  For barrier, that wouldn't really be necessary, because you can set everything up at communicator constructor time: the MPI_BARRIER API doesn't have any variation in its parameters -- i.e., you know everything at communicator constructor time.  But for other operations, you might choose different algorithms depending on the number of local peers, the size of the message, etc.
Hence, you might want to make the final algorithm dispatch decision when MPI_GATHER is invoked with the final set of parameters, etc.

> 4. do all the MPI collective functions follow the same procedure to choose
> algorithms in the API program?

I'm not sure how to parse this question.  In general, all MPI collective operations follow the same procedure to select which component is used, at communicator constructor time.  When the collective operation is dispatched off to the module at run time (e.g., when MPI_BCAST is invoked), it's then up to the module to decide what to do next (i.e., how to actually effect that collective operation).

> It would be great if you can point out some main OMPI files and functions
> that are involved in the process.

You might want to step through the selection process with a debugger to see what happens.  Set a breakpoint on mca_coll_base_comm_select() and step through from there.

> Dahai
>
> On Tuesday, October 6, 2015 1:08 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
> at first, you can check the priorities of the various coll modules
> with ompi_info:
>
> $ ompi_info --all | grep \"coll_ | grep priority
>     MCA coll: parameter "coll_basic_priority" (current value: "10",
>         data source: default, level: 9 dev/all, type: int)
>     MCA coll: parameter "coll_inter_priority" (current value: "40",
>         data source: default, level: 9 dev/all, type: int)
>     MCA coll: parameter "coll_libnbc_priority" (current value: "10",
>         data source: default, level: 9 dev/all, type: int)
>     MCA coll: parameter "coll_ml_priority" (current value: "0",
>         data source: default, level: 9 dev/all, type: int)
>     MCA coll: parameter "coll_self_priority" (current value: "75",
>         data source: default, level: 9 dev/all, type: int)
>     MCA coll: parameter "coll_sm_priority" (current value: "0",
>         data source: default, level: 9 dev/all, type: int)
>     MCA coll: parameter "coll_tuned_priority" (current value: "30",
>         data source: default, level: 6 tuner/all, type: int)
>
> coll_tuned is likely the collective module you will be using.
> Then you can check the various ompi_coll_tuned_*_intra_dec_fixed functions
> in ompi/mca/coll/tuned/coll_tuned_decision_fixed.c -- this is how the tuned
> collective module selects algorithms based on communicator size and
> message size.
>
> Cheers,
>
> Gilles
>
> On Sun, Oct 4, 2015 at 11:12 AM, Dahai Guo <dahaiguo2...@yahoo.com> wrote:
> > Thanks, Jeff. I am trying to understand in detail how Open MPI works at
> > run time. What main functions does it call to select and initialize the
> > coll components? Using the "helloworld" program as an example, how does
> > it select and initialize the MPI_Barrier algorithm? Which C functions
> > are involved in the process?
> >
> > Dahai
> >
> > On Friday, October 2, 2015 7:50 PM, Jeff Squyres (jsquyres)
> > <jsquy...@cisco.com> wrote:
> >
> > On Oct 2, 2015, at 2:21 PM, Dahai Guo <dahaiguo2...@yahoo.com> wrote:
> >>
> >> Is there any way to trace Open MPI internal function calls in an MPI
> >> user program?
> >
> > Unfortunately, not easily -- other than using a debugger, for example.
> >
> >> If so, can anyone explain it with an example, such as helloworld? I
> >> built Open MPI with the VampirTrace options, and compiled the following
> >> program with mpicc-vt, but I didn't get any tracing info.
> >
> > Open MPI is a giant state machine -- MPI_INIT, for example, invokes
> > slightly fewer than a bazillion functions (e.g., it initializes every
> > framework and many components/plugins).
> >
> > Is there something in particular that you're looking for / want to know
> > about?
> >
> >> Thanks
> >>
> >> D. G.
> >>
> >> #include <stdio.h>
> >> #include <mpi.h>
> >>
> >> int main (int argc, char **argv)
> >> {
> >>     int rank, size;
> >>
> >>     MPI_Init (&argc, &argv);
> >>     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
> >>     MPI_Comm_size (MPI_COMM_WORLD, &size);
> >>     printf( "Hello world from process %d of %d\n", rank, size );
> >>     MPI_Barrier(MPI_COMM_WORLD);
> >>     MPI_Finalize();
> >>     return 0;
> >> }
> >>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2015/10/18125.php
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2015/10/18138.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/10/18140.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/