Hello Gilles

Setting -mca mpi_add_procs_cutoff 1024 indeed makes a difference to the output, as follows:

With -mca mpi_add_procs_cutoff 1024:

reachable = 0x1

(Note that add_procs() was called once and the value of 'reachable' is correct.)

Without -mca mpi_add_procs_cutoff 1024:

reachable = 0x0
reachable = NULL
reachable = NULL

(Note that add_procs() was called three times and the value of 'reachable' seems wrong.)

The program does run correctly in either case. The listings are below (note that I have removed the output of the program itself from the report above).

The code that prints 'reachable' is as follows:

    if (reachable == NULL)
        printf("reachable = NULL\n");
    else {
        int i;
        printf("reachable = ");
        /* print each word in hex so the digits match the 0x prefix */
        for (i = 0; i < reachable->array_size; i++)
            printf("\t0x%llx", (unsigned long long) reachable->bitmap[i]);
        printf("\n\n");
    }
    return OPAL_SUCCESS;

And the code for the test program is as follows:

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int world_size, world_rank, name_len;
        char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Get_processor_name(hostname, &name_len);
        printf("Hello world from processor %s, rank %d out of %d processors\n",
               hostname, world_rank, world_size);

        if (world_rank == 1) {
            /* rank 1 receives the 6-byte string (including the NUL) from rank 0 */
            MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s received %s, rank %d\n", hostname, buf, world_rank);
        } else {
            /* the sender (rank 0 when run with -np 2) sends to rank 1 */
            strcpy(buf, "haha!");
            MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
            printf("%s sent %s, rank %d\n", hostname, buf, world_rank);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }

The surgeon general advises you to eat right, exercise regularly and quit ageing.

On Sun, May 15, 2016 at 10:49 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> At first glance, that seems a bit odd...
> are you sure you correctly print the reachable bitmap ?
> I would suggest you add some instrumentation to understand what happens
> (e.g., printf before opal_bitmap_set_bit() and other places that prevent
> this from happening)
>
> one more thing ...
> now, master default behavior is
> mpirun --mca mpi_add_procs_cutoff 0 ...
> you might want to try
> mpirun --mca mpi_add_procs_cutoff 1024 ...
> and see if things make more sense.
> if it helps, and iirc, there is a parameter so a btl can report it does
> not support cutoff.
>
>
> Cheers,
>
> Gilles
>
> On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
>
>> Hello Gilles
>>
>> Thanks for jumping in to help again. Actually, I had already tried some
>> of your suggestions before asking for help.
>>
>> I have several interconnects that can run both openib and tcp BTL. To
>> simplify things, I explicitly mentioned TCP:
>>
>> mpirun -np 2 -hostfile ~/hostfile -mca pml ob1 -mca btl self,tcp ./mpitest
>>
>> where mpitest is a small program that does MPI_Send()/MPI_Recv() on a
>> small string, and then does an MPI_Barrier(). The program does work as
>> expected.
>>
>> I put a printf on the last line of mca_btl_tcp_add_procs() to print the value
>> of 'reachable'. What I saw was that the value was always 0 when it was
>> invoked for Send()/Recv() and the pointer itself was NULL when invoked for
>> Barrier().
>>
>> Next I looked at pml_ob1_add_procs(), where the call chain starts, and
>> found that it initializes and passes an opal_bitmap_t reachable down the
>> call chain, but the resulting value is not used later in the code (the
>> memory is simply freed later).
>>
>> That, coupled with the fact that I am trying to imitate what the other
>> BTL implementations are doing, yet in mca_bml_r2_endpoint_add_btl() my BTL
>> is not being picked up, left me puzzled. Please note that the interconnect
>> that I am developing for is on a different cluster (than where I ran the
>> above test for TCP BTL.)
>>
>> Thanks again
>> Durga
>>
>> The surgeon general advises you to eat right, exercise regularly and quit
>> ageing.
>>
>> On Sun, May 15, 2016 at 10:20 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> did you check the add_procs callbacks ?
>>> (e.g. mca_btl_tcp_add_procs() for the tcp btl)
>>> this is where the reachable bitmap is set, and I guess this is what you
>>> are looking for.
>>>
>>> keep in mind that if several btl can be used, the one with the higher
>>> exclusivity is used
>>> (e.g. tcp is never used if openib is available)
>>> you can simply force your btl and self, and the ob1 pml, so you do not
>>> have to worry about other btl exclusivity.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
>>>
>>>> Hello all
>>>>
>>>> I have been struggling with this issue for a while and figured it might
>>>> be a good idea to ask for help.
>>>>
>>>> Where (in the code path) is the connectivity map created?
>>>>
>>>> I can see that it is *used* in mca_bml_r2_endpoint_add_btl(), but
>>>> obviously I am not setting it up right, because this routine is not finding
>>>> the BTL corresponding to my interconnect.
>>>>
>>>> Thanks in advance
>>>> Durga
>>>>
>>>> The surgeon general advises you to eat right, exercise regularly and
>>>> quit ageing.
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/05/18975.php
>>>
>>
>>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/18977.php
>