Hello Gilles
Setting -mca mpi_add_procs_cutoff 1024 indeed makes a difference to the
output, as follows:
With -mca mpi_add_procs_cutoff 1024:
reachable = 0x1
(Note that add_procs() was called once and the value of 'reachable' is
correct.)
Without -mca mpi_add_procs_cutoff 1024
reachable = 0x0
reachable = NULL
reachable = NULL
(Note that add_procs() was called three times and the value of 'reachable'
seems wrong.)
The program runs correctly in either case. The program listing is below
(note that I have removed the program's own output from the report
above).
The code that prints 'reachable' is as follows:
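    if (reachable == NULL)
        printf("reachable = NULL\n");
    else
    {
        int i;
        printf("reachable = ");
        for (i = 0; i < reachable->array_size; i++)
            /* bitmap words are uint64_t: print them in hex so the digits
               match the 0x prefix ("%llu" would print decimal) */
            printf("\t0x%llx", (unsigned long long) reachable->bitmap[i]);
        printf("\n\n");
    }
    return OPAL_SUCCESS;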
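For checking the printing logic outside Open MPI, here is a minimal, self-contained sketch. It assumes a hypothetical stand-in type, demo_bitmap_t, that keeps only the two opal_bitmap_t fields used above (bitmap and array_size); it is not the real Open MPI type. It also shows why the hex conversion matters: a word holding 10 prints as 0xa, whereas "0x%llu" would print the misleading 0x10.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for opal_bitmap_t: only the fields the debug
 * code reads (the 64-bit words and how many there are). */
typedef struct {
    uint64_t *bitmap;
    int array_size;
} demo_bitmap_t;

/* Writes every word as "\t0x<hex>" into out, or "NULL" for a NULL
 * bitmap. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if out is too small. */
static int demo_format_bitmap(const demo_bitmap_t *b, char *out, size_t len)
{
    assert(out != NULL && len > 0);
    if (b == NULL) {
        int n = snprintf(out, len, "NULL");
        return (n < 0 || (size_t)n >= len) ? -1 : n;
    }
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < b->array_size; i++) {
        /* PRIx64 prints hexadecimal, consistent with the 0x prefix */
        int n = snprintf(out + used, len - used, "\t0x%" PRIx64, b->bitmap[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Formatting into a caller-supplied buffer instead of printing directly keeps the function easy to test; the same format string can be dropped into the printf inside add_procs.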
And the code for the test program is as follows:
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);

    /* The test needs a sender (rank 0) and a receiver (rank 1). */
    if (world_size < 2)
    {
        fprintf(stderr, "This test needs at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (world_rank == 1)
    {
        /* 6 chars = "haha!" plus the terminating NUL */
        MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s received %s, rank %d\n", hostname, buf, world_rank);
    }
    else if (world_rank == 0)
    {
        /* Only rank 0 sends; with a plain 'else', every extra rank would
           also send to rank 1, which posts a single receive. */
        strcpy(buf, "haha!");
        MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
        printf("%s sent %s, rank %d\n", hostname, buf, world_rank);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
The surgeon general advises you to eat right, exercise regularly and quit
ageing.
On Sun, May 15, 2016 at 10:49 AM, Gilles Gouaillardet <
[email protected]> wrote:
> At first glance, that seems a bit odd...
> Are you sure you are printing the reachable bitmap correctly?
> I would suggest you add some instrumentation to understand what happens
> (e.g., a printf before opal_bitmap_set_bit() and in other places that
> might prevent this from happening).
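The instrumentation suggested here could look roughly like the sketch below: a wrapper that logs every attempt to set a bit before touching the bitmap, so a missing log line shows that the code path was never reached. demo_bitmap_t and demo_set_bit_traced() are hypothetical stand-ins, not Open MPI functions; in real code the trace would sit right before the opal_bitmap_set_bit() call in the BTL's add_procs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for opal_bitmap_t (not the real type). */
typedef struct {
    uint64_t *bitmap;
    int array_size;           /* number of 64-bit words */
} demo_bitmap_t;

static int demo_set_bit_calls = 0;   /* how many times the setter ran */

/* Logs the attempt first, then sets bit 'bit'. Returns 0 on success,
 * -1 if the bitmap is NULL or the index is out of range. Logging
 * before the checks means even rejected calls leave a trace. */
static int demo_set_bit_traced(demo_bitmap_t *b, int bit, const char *caller)
{
    assert(caller != NULL);
    fprintf(stderr, "[trace] %s: set_bit(%d) on %p\n",
            caller, bit, (void *)b);
    demo_set_bit_calls++;
    if (b == NULL || bit < 0 || bit >= b->array_size * 64)
        return -1;
    b->bitmap[bit / 64] |= (uint64_t)1 << (bit % 64);
    return 0;
}
```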
>
> One more thing...
> the default behavior on master is now equivalent to
> mpirun --mca mpi_add_procs_cutoff 0 ...
> You might want to try
> mpirun --mca mpi_add_procs_cutoff 1024 ...
> and see if things make more sense.
> If it helps, and IIRC, there is a parameter that lets a btl report that
> it does not support the cutoff.
>
>
> Cheers,
>
> Gilles
>
> On Sunday, May 15, 2016, dpchoudh . <[email protected]> wrote:
>
>> Hello Gilles
>>
>> Thanks for jumping in to help again. Actually, I had already tried some
>> of your suggestions before asking for help.
>>
>> I have several interconnects that can run both the openib and tcp BTLs.
>> To simplify things, I explicitly selected TCP:
>>
>> mpirun -np 2 -hostfile ~/hostfile -mca pml ob1 -mca btl self,tcp ./mpitest
>>
>> where mpitest is a small program that does MPI_Send()/MPI_Recv() on a
>> small string, and then does an MPI_Barrier(). The program does work as
>> expected.
>>
>> I put a printf on the last line of mca_btl_tcp_add_procs() to print the
>> value of 'reachable'. What I saw was that the value was always 0 when it
>> was invoked for Send()/Recv(), and the pointer itself was NULL when
>> invoked for Barrier().
>>
>> Next I looked at mca_pml_ob1_add_procs(), where the call chain starts,
>> and found that it initializes an opal_bitmap_t 'reachable' and passes it
>> down the call chain, but the resulting value is not used later in the
>> code (the memory is simply freed later).
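One plausible reading, not verified against the source here, is that the bitmap is consumed one layer down, in the r2 BML, rather than in ob1: the BML clears it before each BTL's add_procs call and, right after that call, attaches the BTL only to the peers whose bit is set (the step done by mca_bml_r2_endpoint_add_btl()); the bitmap is then freed, which would explain why ob1 never reads it. The sketch below models that flow with hypothetical stand-ins (demo_bitmap_t, demo_bml_add_procs(), two toy BTL callbacks); it is not the actual Open MPI code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for opal_bitmap_t (not the real type). */
typedef struct {
    uint64_t *bitmap;
    int array_size;               /* number of 64-bit words */
} demo_bitmap_t;

static int demo_bitmap_init(demo_bitmap_t *b, int nbits)
{
    b->array_size = (nbits + 63) / 64;
    /* allocate at least one word so calloc never sees a zero count */
    b->bitmap = calloc(b->array_size > 0 ? (size_t)b->array_size : 1,
                       sizeof(uint64_t));
    return b->bitmap != NULL ? 0 : -1;
}

static void demo_bitmap_set(demo_bitmap_t *b, int bit)
{
    b->bitmap[bit / 64] |= (uint64_t)1 << (bit % 64);
}

static int demo_bitmap_is_set(const demo_bitmap_t *b, int bit)
{
    return (int)((b->bitmap[bit / 64] >> (bit % 64)) & 1u);
}

/* Hypothetical BTL hook: set bit p for every peer p it can reach. */
typedef int (*demo_btl_add_procs_fn)(int nprocs, demo_bitmap_t *reachable);

/* Toy BTLs: one reaches every peer, the other reaches none. */
static int demo_btl_reach_all(int nprocs, demo_bitmap_t *reachable)
{
    for (int p = 0; p < nprocs; p++)
        demo_bitmap_set(reachable, p);
    return 0;
}

static int demo_btl_reach_none(int nprocs, demo_bitmap_t *reachable)
{
    (void)nprocs;
    (void)reachable;              /* never sets a bit */
    return 0;
}

/* Model of the BML step: one bitmap, cleared before each BTL callback
 * and read right after it. btl_mask[p] gets bit b when BTL b reached
 * peer p (standing in for the endpoint_add_btl step). The bitmap is
 * freed before returning, so the caller never sees it. */
static int demo_bml_add_procs(int nprocs, const demo_btl_add_procs_fn *btls,
                              int nbtls, uint32_t *btl_mask)
{
    demo_bitmap_t reachable;
    assert(nbtls >= 0 && nbtls <= 32);       /* one mask bit per BTL */
    if (demo_bitmap_init(&reachable, nprocs) != 0)
        return -1;
    for (int p = 0; p < nprocs; p++)
        btl_mask[p] = 0;
    for (int b = 0; b < nbtls; b++) {
        for (int w = 0; w < reachable.array_size; w++)
            reachable.bitmap[w] = 0;         /* clear all bits */
        if (btls[b](nprocs, &reachable) != 0)
            continue;                        /* failed BTL is skipped */
        for (int p = 0; p < nprocs; p++)
            if (demo_bitmap_is_set(&reachable, p))
                btl_mask[p] |= (uint32_t)1 << b;
    }
    free(reachable.bitmap);
    return 0;
}
```

In this model a BTL that never sets a bit is silently dropped for every peer, which matches the symptom of a BTL not being picked up.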
>>
>> That, coupled with the fact that I am imitating what the other BTL
>> implementations do, yet my BTL is not being picked up in
>> mca_bml_r2_endpoint_add_btl(), left me puzzled. Please note that the
>> interconnect I am developing for is on a different cluster from the one
>> where I ran the above TCP BTL test.
>>
>> Thanks again
>> Durga
>>
>> On Sun, May 15, 2016 at 10:20 AM, Gilles Gouaillardet <
>> [email protected]> wrote:
>>
>>> Did you check the add_procs callbacks?
>>> (e.g. mca_btl_tcp_add_procs() for the tcp btl)
>>> This is where the reachable bitmap is set, and I guess this is what you
>>> are looking for.
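A BTL add_procs callback along these lines might look like the following sketch: create an endpoint for each peer the BTL can reach and set that peer's bit. Since the traces at the top of the thread show 'reachable' arriving as NULL on some calls, the sketch checks the pointer before every use. All types and names here (demo_bitmap_t, demo_proc_t, demo_endpoint_t, demo_btl_add_procs(), the has_my_interconnect flag) are hypothetical; a real BTL has a different signature and would decide reachability from the peer's published addressing information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not Open MPI types. */
typedef struct {
    uint64_t *bitmap;
    int array_size;                 /* number of 64-bit words */
} demo_bitmap_t;

typedef struct {
    int has_my_interconnect;        /* assumed reachability flag */
} demo_proc_t;

typedef struct {
    size_t peer_index;
} demo_endpoint_t;

static void demo_bitmap_set(demo_bitmap_t *b, size_t bit)
{
    assert(bit / 64 < (size_t)b->array_size);
    b->bitmap[bit / 64] |= (uint64_t)1 << (bit % 64);
}

/* For each reachable peer: create an endpoint and set the peer's bit.
 * Unreachable peers keep a NULL endpoint and a clear bit. 'reachable'
 * may be NULL, so it is checked before use. Returns 0 on success; on
 * allocation failure frees the endpoints it created and returns -1. */
static int demo_btl_add_procs(size_t nprocs, const demo_proc_t *procs,
                              demo_endpoint_t **endpoints,
                              demo_bitmap_t *reachable)
{
    for (size_t i = 0; i < nprocs; i++)
        endpoints[i] = NULL;
    for (size_t i = 0; i < nprocs; i++) {
        if (!procs[i].has_my_interconnect)
            continue;                       /* bit stays clear */
        demo_endpoint_t *ep = malloc(sizeof *ep);
        if (ep == NULL) {
            for (size_t j = 0; j < i; j++) {
                free(endpoints[j]);         /* free(NULL) is a no-op */
                endpoints[j] = NULL;
            }
            return -1;
        }
        ep->peer_index = i;
        endpoints[i] = ep;
        if (reachable != NULL)
            demo_bitmap_set(reachable, i);
    }
    return 0;
}
```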
>>>
>>> Keep in mind that if several btls can be used, the one with the highest
>>> exclusivity is used
>>> (e.g. tcp is never used if openib is available).
>>> You can simply force your btl and self, and the ob1 pml, so you do not
>>> have to worry about the other btls' exclusivity.
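The exclusivity rule can be illustrated with a simplified model: among the BTLs that marked a peer as reachable, pick the one with the highest exclusivity. The struct, the function, and the exclusivity numbers below are made up for illustration, and the real BML selection logic may be more involved. Note that in this model a BTL that never sets the peer's reachable bit is ignored no matter how high its exclusivity is.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-BTL record (not an Open MPI struct). */
typedef struct {
    const char *name;
    uint32_t exclusivity;     /* higher wins */
    int reaches_peer;         /* did this BTL set the peer's bit? */
} demo_btl_info_t;

/* Returns the index of the reachable BTL with the highest exclusivity,
 * or -1 if no BTL reaches the peer. On a tie the first one is kept. */
static int demo_pick_btl(const demo_btl_info_t *btls, int n)
{
    assert(n >= 0);
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!btls[i].reaches_peer)
            continue;         /* unreachable: never a candidate */
        if (best < 0 || btls[i].exclusivity > btls[best].exclusivity)
            best = i;
    }
    return best;
}
```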
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Sunday, May 15, 2016, dpchoudh . <[email protected]> wrote:
>>>
>>>> Hello all
>>>>
>>>> I have been struggling with this issue for a while and figured it might
>>>> be a good idea to ask for help.
>>>>
>>>> Where (in the code path) is the connectivity map created?
>>>>
>>>> I can see that it is *used* in mca_bml_r2_endpoint_add_btl(), but
>>>> obviously I am not setting it up right, because this routine is not finding
>>>> the BTL corresponding to my interconnect.
>>>>
>>>> Thanks in advance
>>>> Durga
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> [email protected]
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/05/18975.php
>>>
>>
>>
> _______________________________________________
> devel mailing list
> [email protected]
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/18977.php
>