Yes, the delay seems to be coming from here:

        err = hpmp_rdmacm->rdma_connect(id, NULL);
        if (err) {
                hpmp_printf("rdma_connect() failed");
                return NULL;
        }
        t1 = MPI_Wtime();

retry3:
        err = hpmp_rdmacm->rdma_get_cm_event(hpmp_rdmacm->connect_cm_channel,
                                             &event);
        if (err) {
                if (errno == EINTR)
                        goto retry3;
                hpmp_printf("rdma_get_cm_event() failed");
                return NULL;
        }

        if (event->event != RDMA_CM_EVENT_ESTABLISHED) {
                hpmp_printf("rdma_get_cm_event() unexpected event (%d vs %d) "
                                "while connecting to %d\n",
                                event->event, RDMA_CM_EVENT_ESTABLISHED,
                                port);
                return NULL;
        }

        t2 = MPI_Wtime();
        fprintf(stderr, "CONNECTION ESTABLISHED ON CONNECT %lf\n", t2 - t1);
        hpmp_rdmacm->rdma_ack_cm_event(event);



I get output such as:

[ 1] CONNECTION ESTABLISHED ON CONNECT 0.001447
[ 9] CONNECTION ESTABLISHED ON CONNECT 6.145778
[ 6] CONNECTION ESTABLISHED ON CONNECT 5.233660
[ 0] CONNECTION ESTABLISHED ON CONNECT 0.001343
[ 6] CONNECTION ESTABLISHED ON CONNECT 0.001155
[ 7] CONNECTION ESTABLISHED ON CONNECT 4.517944
[11] CONNECTION ESTABLISHED ON CONNECT 0.001445
[ 3] CONNECTION ESTABLISHED ON CONNECT 0.001558
[ 7] CONNECTION ESTABLISHED ON CONNECT 0.001627
[ 5] CONNECTION ESTABLISHED ON CONNECT 6.145470
[ 2] CONNECTION ESTABLISHED ON CONNECT 5.657639
[ 9] CONNECTION ESTABLISHED ON CONNECT 0.001602
[10] CONNECTION ESTABLISHED ON CONNECT 6.188743
[ 1] CONNECTION ESTABLISHED ON CONNECT 0.001500
[ 6] CONNECTION ESTABLISHED ON CONNECT 0.001061
[ 1] CONNECTION ESTABLISHED ON CONNECT 0.001183
[11] CONNECTION ESTABLISHED ON CONNECT 0.001213
[ 5] CONNECTION ESTABLISHED ON CONNECT 0.210666

From:   "Hefty, Sean" <[email protected]>
To:     David Solt/Dallas/IBM@IBMUS, 
Cc:     "[email protected]" <[email protected]>
Date:   05/06/2014 03:48 PM
Subject:        RE: Announcing IBM Platform MPI 9.1.2.1 FixPack



> I am trying to add rdmacm support to Platform MPI.  I noticed that the
> performance on our test cluster was very poor for creating connections.
> For 12 processes on 12 hosts to create n^2 connections takes about 12
> seconds.  I also discovered that if I create some TCP sockets and use
> those to ensure that only one process at a time is calling
> rdmacm_connect to any target, the performance changes dramatically and
> I can then connect the 12 processes very quickly (didn't measure
> exactly, but similar to our old rdma code).  The order in which I am
> connecting processes avoids flooding a single target with many
> rdmacm_connects at once, but it is difficult to avoid the case where 2
> processes call rdmacm_connect to the same target at roughly the same
> time except when using my extra TCP socket connections.  I haven't
> played with MPICH code yet to see if they have the same issue, but
> will try that next.
>
>
> Our test cluster is a bit old:
>
> 09:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0
> 5GT/s - IB QDR / 10GigE] (rev b0)
>
> Is this a known problem?  Are you aware of any issues that would shed
> some light on this?

This is the first I've heard of slow connect times.  Are you sure that the 
time is coming from rdma_connect, versus route or address resolution?



