Hello Gilles

Setting -mca mpi_add_procs_cutoff 1024 indeed makes a difference to the
output, as follows:

With -mca mpi_add_procs_cutoff 1024:
reachable =     0x1
(Note that add_procs() was called once and the value of 'reachable' is
correct.)

Without -mca mpi_add_procs_cutoff 1024:
reachable =     0x0
reachable = NULL
reachable = NULL
(Note that add_procs() was called three times and the value of 'reachable'
seems wrong.)

The program runs correctly in either case. The program listing is below
(note that I have omitted the program's own output from the results
above).

The code that prints 'reachable' is as follows:

if (reachable == NULL)
    printf("reachable = NULL\n");
else
{
    int i;
    printf("reachable = ");
    /* bitmap[] holds 64-bit words: print them in hex, not decimal */
    for (i = 0; i < reachable->array_size; i++)
        printf("\t0x%llx", (unsigned long long) reachable->bitmap[i]);
    printf("\n\n");
}
return OPAL_SUCCESS;
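
The snippet as first posted used %llu, which prints each word in decimal behind a "0x" prefix. For the values 0 and 1 reported above the output is identical either way, but a bitmap with more peers would be misleading. Below is a minimal standalone sketch of hex formatting with PRIx64 from <inttypes.h>. demo_bitmap_t is a hypothetical stand-in that mirrors only the two fields the snippet reads (bitmap, array_size); it is not the real opal_bitmap_t.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in mirroring only the two fields the snippet reads. */
typedef struct {
    uint64_t *bitmap;   /* array of 64-bit words */
    int array_size;     /* number of words in bitmap */
} demo_bitmap_t;

/* Write "reachable = 0x.. 0x.." (or "reachable = NULL") into buf.
 * The output is truncated, never overflowed, if buf is too small. */
static void format_bitmap(const demo_bitmap_t *b, char *buf, size_t len)
{
    if (len == 0)
        return;
    if (b == NULL) {
        snprintf(buf, len, "reachable = NULL");
        return;
    }
    snprintf(buf, len, "reachable =");
    for (int i = 0; i < b->array_size; i++) {
        size_t used = strlen(buf);      /* always <= len - 1 */
        /* PRIx64 expands to the correct hex conversion for uint64_t */
        snprintf(buf + used, len - used, " 0x%" PRIx64, b->bitmap[i]);
    }
}
```

Formatting into a buffer rather than calling printf directly keeps the logic testable; inside a BTL you would print the buffer once per call.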

And the code for the test program is as follows:

#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);
    if (world_rank == 1)
    {
        /* 6 chars: "haha!" plus the terminating NUL */
        MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s received %s, rank %d\n", hostname, buf, world_rank);
    }
    else if (world_rank == 0)
    {
        /* only rank 0 sends, so the send/recv pair still matches with -np > 2 */
        strcpy(buf, "haha!");
        MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
        printf("%s sent %s, rank %d\n", hostname, buf, world_rank);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}



The surgeon general advises you to eat right, exercise regularly and quit
ageing.

On Sun, May 15, 2016 at 10:49 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> At first glance, that seems a bit odd...
> Are you sure you are printing the reachable bitmap correctly?
> I would suggest you add some instrumentation to understand what happens
> (e.g., a printf before opal_bitmap_set_bit() and in the other places that
> could prevent the bit from being set).
>
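One way to follow the instrumentation suggestion is to wrap every set-bit call in a macro that logs the call site first, so a peer whose bit never gets set shows up as a missing log line. This is a hypothetical sketch: demo_bitmap_t, demo_set_bit() and TRACE_SET_BIT are made-up stand-ins, not Open MPI's opal_bitmap_set_bit().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a bitmap: one 64-bit word is enough here. */
typedef struct {
    uint64_t word;
} demo_bitmap_t;

/* Returns 0 on success, -1 on a NULL bitmap or out-of-range bit,
 * so a failed set is visible instead of silently ignored. */
static int demo_set_bit(demo_bitmap_t *bm, int bit)
{
    if (bm == NULL || bit < 0 || bit >= 64)
        return -1;
    bm->word |= UINT64_C(1) << bit;
    return 0;
}

/* Log file/line/function before setting the bit, then forward the call.
 * Note: both arguments are evaluated twice, so avoid side effects. */
#define TRACE_SET_BIT(bm, bit)                                          \
    (fprintf(stderr, "%s:%d %s(): set bit %d on %p\n",                 \
             __FILE__, __LINE__, __func__, (bit), (void *) (bm)),       \
     demo_set_bit((bm), (bit)))
```
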
> One more thing...
> The default behavior on master is now equivalent to
> mpirun --mca mpi_add_procs_cutoff 0 ...
> You might want to try
> mpirun --mca mpi_add_procs_cutoff 1024 ...
> and see if things make more sense.
> If that helps, then IIRC there is a parameter that lets a BTL report that
> it does not support the cutoff.
>
>
> Cheers,
>
> Gilles
>
> On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
>
>> Hello Gilles
>>
>> Thanks for jumping in to help again. Actually, I had already tried some
>> of your suggestions before asking for help.
>>
>> I have several interconnects that can run both the openib and tcp BTLs. To
>> simplify things, I explicitly selected TCP:
>>
>> mpirun -np 2 -hostfile ~/hostfile -mca pml ob1 -mca btl self,tcp ./mpitest
>>
>> where mpitest is a small program that does MPI_Send()/MPI_Recv() on a
>> small string and then an MPI_Barrier(). The program works as expected.
>>
>> I put a printf on the last line of mca_btl_tcp_add_procs() to print the
>> value of 'reachable'. What I saw was that the value was always 0 when it
>> was invoked for Send()/Recv(), and the pointer itself was NULL when invoked
>> for Barrier().
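
A simplified model of what a BTL's add_procs callback is expected to do with the bitmap, as described in this thread: for every peer it can reach, it fills in an endpoint and sets the matching bit. All names and the signature here are hypothetical, not the real btl_add_procs interface. In this sketch bit i refers to procs[i] (its position in the array passed in), and because the trace above shows 'reachable' arriving as NULL in some calls, the bitmap is treated as optional.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PEERS 64   /* one 64-bit bitmap word in this sketch */

typedef struct { uint64_t word; } demo_bitmap_t;

typedef struct {
    int rank;
    int on_my_fabric;   /* hypothetical: 1 if this BTL can reach the peer */
} demo_proc_t;

typedef struct {
    int peer_rank;
    int in_use;         /* 1 if an endpoint was created for this peer */
} demo_endpoint_t;

/* For each reachable peer, fill endpoints[i] and, if a bitmap was
 * supplied, set bit i. Returns the number of peers reached. */
static int demo_btl_add_procs(size_t nprocs, const demo_proc_t *procs,
                              demo_endpoint_t *endpoints,
                              demo_bitmap_t *reachable)
{
    int nreached = 0;
    for (size_t i = 0; i < nprocs && i < DEMO_MAX_PEERS; i++) {
        endpoints[i].in_use = 0;
        if (!procs[i].on_my_fabric)
            continue;                      /* leave bit i clear */
        endpoints[i].peer_rank = procs[i].rank;
        endpoints[i].in_use = 1;
        if (reachable != NULL)             /* bitmap may be absent */
            reachable->word |= UINT64_C(1) << i;
        nreached++;
    }
    return nreached;
}
```

If a callback like this returns without setting any bit, every peer looks unreachable through that BTL, which would be consistent with a 0x0 reading.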
>>
>> Next I looked at mca_pml_ob1_add_procs(), where the call chain starts, and
>> found that it initializes an opal_bitmap_t reachable and passes it down the
>> call chain, but the resulting value is not used later in the code (the
>> memory is simply freed).
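
One plausible reading of why the PML never looks at the bitmap afterwards: the layer below it (the BML) can consume it per BTL, clearing it, calling each BTL's add_procs, and attaching that BTL to every peer whose bit came back set. The sketch below models that loop with hypothetical types (demo_btl_t, demo_peer_t, reach_mask); it is an assumed shape for illustration, not a transcription of the bml r2 code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PEERS 64
#define MAX_BTLS_PER_PEER 4

typedef struct { uint64_t word; } demo_bitmap_t;

/* Hypothetical BTL: a name, a reachability callback, and a fake mask. */
typedef struct demo_btl {
    const char *name;
    void (*add_procs)(const struct demo_btl *btl, size_t nprocs,
                      demo_bitmap_t *reachable);
    uint64_t reach_mask;   /* hypothetical: which peers this BTL can reach */
} demo_btl_t;

/* Per-peer list of BTLs that reported the peer as reachable. */
typedef struct {
    const demo_btl_t *btls[MAX_BTLS_PER_PEER];
    int nbtls;
} demo_peer_t;

/* Callback that sets bit i for each peer i present in reach_mask. */
static void mask_add_procs(const demo_btl_t *btl, size_t nprocs,
                           demo_bitmap_t *reachable)
{
    for (size_t i = 0; i < nprocs && i < MAX_PEERS; i++)
        if (btl->reach_mask & (UINT64_C(1) << i))
            reachable->word |= UINT64_C(1) << i;
}

/* BML-style loop: one shared bitmap, cleared before each BTL. */
static void demo_add_procs(const demo_btl_t *btls, size_t nbtls,
                           demo_peer_t *peers, size_t nprocs)
{
    demo_bitmap_t reachable;
    for (size_t b = 0; b < nbtls; b++) {
        reachable.word = 0;
        btls[b].add_procs(&btls[b], nprocs, &reachable);
        for (size_t p = 0; p < nprocs && p < MAX_PEERS; p++) {
            if (!(reachable.word & (UINT64_C(1) << p)))
                continue;              /* bit clear: BTL not used for p */
            if (peers[p].nbtls < MAX_BTLS_PER_PEER)
                peers[p].btls[peers[p].nbtls++] = &btls[b];
        }
    }
    /* The bitmap has done its job here; the caller never reads it. */
}
```

In this model a BTL whose callback never sets a bit is simply never attached to any peer, which is the symptom described in the original question.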
>>
>> That, coupled with the fact that I am imitating what the other BTL
>> implementations do, yet my BTL is not being picked up in
>> mca_bml_r2_endpoint_add_btl(), left me puzzled. Please note that the
>> interconnect I am developing for is on a different cluster from the one
>> where I ran the above test with the TCP BTL.
>>
>> Thanks again
>> Durga
>>
>> On Sun, May 15, 2016 at 10:20 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> Did you check the add_procs callbacks?
>>> (e.g. mca_btl_tcp_add_procs() for the tcp btl)
>>> This is where the reachable bitmap is set, and I guess this is what you
>>> are looking for.
>>>
>>> Keep in mind that if several BTLs can be used, the one with the highest
>>> exclusivity is used
>>> (e.g. tcp is never used if openib is available).
>>> You can simply force your BTL and self, and the ob1 PML, so you do not
>>> have to worry about the other BTLs' exclusivity.
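
The exclusivity rule above, reduced to a sketch: among the BTLs that can reach a given peer, keep the one with the highest exclusivity. The type and function names are hypothetical, and keeping the earlier entry on a tie is an assumption of this sketch, not a statement about Open MPI's behavior.

```c
#include <stddef.h>

/* Hypothetical per-peer candidate: a BTL name and its exclusivity. */
typedef struct {
    const char *name;
    unsigned exclusivity;
} demo_candidate_t;

/* Index of the candidate with the highest exclusivity, or -1 if n == 0.
 * On a tie the earlier entry wins (an assumption of this sketch). */
static int pick_most_exclusive(const demo_candidate_t *c, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || c[i].exclusivity > c[best].exclusivity)
            best = (int) i;
    }
    return best;
}
```

For example, with candidates tcp (exclusivity 100) and openib (1024), using placeholder values rather than Open MPI's real defaults, the function returns the openib index, so tcp is never chosen while openib is available.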
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
>>>
>>>> Hello all
>>>>
>>>> I have been struggling with this issue for a while and figured it might
>>>> be a good idea to ask for help.
>>>>
>>>> Where (in the code path) is the connectivity map created?
>>>>
>>>> I can see that it is *used* in mca_bml_r2_endpoint_add_btl(), but
>>>> obviously I am not setting it up right, because this routine is not finding
>>>> the BTL corresponding to my interconnect.
>>>>
>>>> Thanks in advance
>>>> Durga
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/05/18975.php
>>>
>>
>>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/18977.php
>
