I cannot reproduce this behavior.

Note that mca_btl_tcp_add_procs() is invoked once per TCP component (e.g. once per physical NIC),

so you might want to explicitly select a single NIC:

mpirun --mca btl_tcp_if_include xxx ...

My printf output is the same regardless of the mpi_add_procs_cutoff value.
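
If it helps to tell the invocations apart, here is a minimal sketch of a trace helper that labels each add_procs call with a call number and a tag (for example the interface name) and prints the bitmap words in hex. Everything named demo_* is hypothetical: demo_bitmap_t is only a stand-in that models the two fields used by the printf in this thread (bitmap and array_size), and I am assuming the words are 64 bits wide. It is not Open MPI code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for opal_bitmap_t: only the two fields used by the
 * printf discussed in this thread are modelled (assumed 64-bit words). */
typedef struct {
    uint64_t *bitmap;
    int array_size;
} demo_bitmap_t;

/* Format one trace line into buf: the call number, a caller-supplied tag
 * (for example the interface name), and the bitmap words in hex.
 * Returns the number of characters written, or -1 if buf is too small. */
static int demo_format_reachable(char *buf, size_t len, int call_no,
                                 const char *tag,
                                 const demo_bitmap_t *reachable)
{
    int n, m, i;

    assert(buf != NULL && tag != NULL);

    n = snprintf(buf, len, "add_procs call %d [%s]: reachable =", call_no, tag);
    if (n < 0 || (size_t) n >= len)
        return -1;

    /* A NULL pointer is reported explicitly instead of being dereferenced. */
    if (reachable == NULL) {
        m = snprintf(buf + n, len - (size_t) n, " NULL");
        if (m < 0 || (size_t) m >= len - (size_t) n)
            return -1;
        return n + m;
    }

    /* One hex word per array entry, so the 0x prefix matches the digits. */
    for (i = 0; i < reachable->array_size; i++) {
        m = snprintf(buf + n, len - (size_t) n, " 0x%" PRIx64,
                     reachable->bitmap[i]);
        if (m < 0 || (size_t) m >= len - (size_t) n)
            return -1;
        n += m;
    }
    return n;
}
```

The idea would be to call this at the end of add_procs with a static counter and then puts() the buffer, so that three lines of output can be matched to three separate invocations rather than read as one inconsistent result.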


Cheers,


Gilles

On 5/16/2016 12:22 AM, dpchoudh . wrote:
Sorry, I accidentally pressed 'Send' before I was done writing the last mail. What I wanted to ask was: what is the parameter mpi_add_procs_cutoff, and why does adding it seem to make a difference in the code path but not in the end result of the program? How would it help me debug my problem?

Thank you
Durga

The surgeon general advises you to eat right, exercise regularly and quit ageing.

On Sun, May 15, 2016 at 11:17 AM, dpchoudh . <dpcho...@gmail.com <mailto:dpcho...@gmail.com>> wrote:

    Hello Gilles

    Setting -mca mpi_add_procs_cutoff 1024 indeed makes a difference
    to the output, as follows:

    With -mca mpi_add_procs_cutoff 1024:
    reachable =     0x1
    (Note that add_procs() was called once and the value of
    'reachable' is correct.)

    Without -mca mpi_add_procs_cutoff 1024
    reachable =     0x0
    reachable = NULL
    reachable = NULL
    (Note that add_procs() was called three times and the value of
    'reachable' seems wrong.)

    The program does run correctly in either case. The program listing
    is as below (note that I have removed output from the program
    itself in the above reporting.)

    The code that prints 'reachable' is as follows:

    if (reachable == NULL)
        printf("reachable = NULL\n");
    else
    {
        int i;
        printf("reachable = ");
        for (i = 0; i < reachable->array_size; i++)
            /* print each word in hex so that the 0x prefix is accurate */
            printf("\t0x%llx", (unsigned long long) reachable->bitmap[i]);
        printf("\n\n");
    }
    return OPAL_SUCCESS;

    And the code for the test program is as follows:

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int world_size, world_rank, name_len;
        char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Get_processor_name(hostname, &name_len);
        printf("Hello world from processor %s, rank %d out of %d processors\n",
               hostname, world_rank, world_size);
        if (world_rank == 1)
        {
            MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("%s received %s, rank %d\n", hostname, buf, world_rank);
        }
        else
        {
            strcpy(buf, "haha!");
            MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
            printf("%s sent %s, rank %d\n", hostname, buf, world_rank);
        }
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }




    On Sun, May 15, 2016 at 10:49 AM, Gilles Gouaillardet
    <gilles.gouaillar...@gmail.com
    <mailto:gilles.gouaillar...@gmail.com>> wrote:

        At first glance, that seems a bit odd...
        are you sure you correctly print the reachable bitmap ?
        I would suggest you add some instrumentation to understand
        what happens
        (e.g., printf before opal_bitmap_set_bit() and other places
        that prevent this from happening)

        one more thing ...
        now, master default behavior is
        mpirun --mca mpi_add_procs_cutoff 0 ...
        you might want to try
        mpirun --mca mpi_add_procs_cutoff 1024 ...
        and see if things make more sense.
        if it helps, and iirc, there is a parameter so a btl can
        report it does not support cutoff.


        Cheers,

        Gilles

        On Sunday, May 15, 2016, dpchoudh . <dpcho...@gmail.com
        <mailto:dpcho...@gmail.com>> wrote:

            Hello Gilles

            Thanks for jumping in to help again. Actually, I had
            already tried some of your suggestions before asking for help.

            I have several interconnects that can run both openib and
            tcp BTL. To simplify things, I explicitly mentioned TCP:

            mpirun -np 2 -hostfile ~/hostfile -mca pml ob1 -mca btl
            self,tcp ./mpitest

            where mpitest is a small program that does
            MPI_Send()/MPI_Recv() on a small string, and then does an
            MPI_Barrier(). The program does work as expected.

            I put a printf on the last line of mca_btl_tcp_add_procs() to
            print the value of 'reachable'. What I saw was that the
            value was always 0 when it was invoked for Send()/Recv()
            and the pointer itself was NULL when invoked for Barrier().

            Next I looked at pml_ob1_add_procs(), where the call chain
            starts, and found that it initializes and passes an
            opal_bitmap_t reachable down the call chain, but the
            resulting value is not used later in the code (the memory
            is simply freed later).

            That, coupled with the fact that I am trying to imitate
            what the other BTL implementations are doing, yet in
            mca_bml_r2_endpoint_add_btl() my BTL is not being picked
            up, left me puzzled. Please note that the interconnect
            that I am developing for is on a different cluster (than
            the one where I ran the above test for the TCP BTL).

            Thanks again
            Durga


            On Sun, May 15, 2016 at 10:20 AM, Gilles Gouaillardet
            <gilles.gouaillar...@gmail.com> wrote:

                did you check the add_procs callbacks ?
                (e.g. mca_btl_tcp_add_procs() for the tcp btl)
                this is where the reachable bitmap is set, and I guess
                this is what you are looking for.

                keep in mind that if several btl can be used, the one
                with the higher exclusivity is used
                (e.g. tcp is never used if openib is available)
                you can simply force your btl and self, and the ob1
                pml, so you do not have to worry about other btl
                exclusivity.
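
As a toy illustration of the two points above (the bitmap decides whether a BTL is a candidate at all, and the higher exclusivity wins among candidates), here is a minimal sketch. It is a simplified model and not the actual bml/r2 selection code: demo_btl_t and demo_pick_btl() are hypothetical names, and the exclusivity numbers used with them are only illustrative.

```c
#include <stddef.h>

/* Hypothetical, simplified description of one candidate BTL for a peer. */
typedef struct {
    const char *name;
    unsigned exclusivity;  /* higher value wins */
    int reachable;         /* non-zero if add_procs marked the peer reachable */
} demo_btl_t;

/* Return the index of the reachable candidate with the highest exclusivity,
 * or -1 if no candidate reported the peer as reachable. */
static int demo_pick_btl(const demo_btl_t *btls, size_t count)
{
    int best = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        /* A BTL that did not set the peer's bit is never considered. */
        if (!btls[i].reachable)
            continue;
        if (best < 0 || btls[i].exclusivity > btls[best].exclusivity)
            best = (int) i;
    }
    return best;
}
```

In this model a new BTL whose add_procs never sets the peer's bit is skipped no matter what exclusivity it advertises, which is why checking the bitmap is the first thing to do.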

                Cheers,

                Gilles


                On Sunday, May 15, 2016, dpchoudh .
                <dpcho...@gmail.com> wrote:

                    Hello all

                    I have been struggling with this issue for a while
                    and figured it might be a good idea to ask for help.

                    Where (in the code path) is the connectivity map
                    created?

                    I can see that it is *used* in
                    mca_bml_r2_endpoint_add_btl(), but obviously I am
                    not setting it up right, because this routine is
                    not finding the BTL corresponding to my interconnect.

                    Thanks in advance
                    Durga



                _______________________________________________
                devel mailing list
                de...@open-mpi.org
                Subscription:
                https://www.open-mpi.org/mailman/listinfo.cgi/devel
                Link to this post:
                http://www.open-mpi.org/community/lists/devel/2016/05/18975.php



        _______________________________________________
        devel mailing list
        de...@open-mpi.org <mailto:de...@open-mpi.org>
        Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
        Link to this post:
        http://www.open-mpi.org/community/lists/devel/2016/05/18977.php





_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/05/18979.php
