We ran into this problem when testing rping over Intel/QLogic hardware:

[root@rdmaperf3 ~]# rping -s -a 172.31.2.103 -v
wait for CONNECTED state 10
connect error -1
cma event RDMA_CM_EVENT_REJECTED, error 28
[root@rdmaperf3 ~]#

[root@rdmaperf8 ~]# rping -c -a 172.31.2.103 -v -C 5
cma event RDMA_CM_EVENT_CONNECT_ERROR, error -1
wait for CONNECTED state 4
connect error -1
[root@rdmaperf8 ~]#

It turns out this is caused by a couple of things:

1) rping, on the client side, clears the conn_param for the connection
it is about to attempt, then sets:
conn_param.responder_resources = 1;
conn_param.initiator_depth = 1;
conn_param.retry_count = 10;
On the accept side, rping clears the conn_param and then sets just the
responder_resources and initiator_depth, without even checking the
conn_param values requested on the incoming cm_id. So, OK, you can get
away with that since this is a simple test program, but it's still not
"best programming practices". However, the important part here is the
retry_count of 10. That won't work on Intel/QLogic hardware.
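
For illustration, here is a minimal sketch of what checking the incoming values on the accept side could look like. This is not rping's code: the struct is a hypothetical stand-in for the three conn_param fields mentioned above (so it builds without the librdmacm headers), and build_accept_param and local_max are names made up for the example. It assumes the peer's requested values have already been pulled out of the connect request event.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the conn_param fields discussed above. */
struct conn_param_sketch {
    uint8_t responder_resources;
    uint8_t initiator_depth;
    uint8_t retry_count;
};

static uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/*
 * Build accept-side parameters from what the peer requested instead of
 * hard-coding them. local_max is an assumed local limit supplied by the
 * caller. retry_count is left at zero, mirroring the description above
 * that the accept path only sets the other two fields.
 */
static void build_accept_param(const struct conn_param_sketch *requested,
                               uint8_t local_max,
                               struct conn_param_sketch *out)
{
    memset(out, 0, sizeof(*out));
    out->responder_resources = min_u8(requested->responder_resources, local_max);
    out->initiator_depth = min_u8(requested->initiator_depth, local_max);
}
```

With a local limit of 1 this ends up with the same values rping hard-codes today, but it would follow the peer's request if the limit were raised.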

2) The qib driver enforces a maximum of 7 for retry_count. I don't see
anything in the spec that specifies a maximum for this field, and in
particular I know it doesn't call out 7 as meaning infinite retries
like it does for rnr_retry_count.
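
To make the failure mode concrete, the kind of check described in 2) can be sketched as below. This is not the qib source: check_retry_count and SKETCH_MAX_RETRY_COUNT are hypothetical names, and returning -EINVAL is an assumption about how such a check would report the problem.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical cap matching the limit of 7 described above. */
#define SKETCH_MAX_RETRY_COUNT 7

/* Returns 0 if the value is acceptable, -EINVAL if it exceeds the cap. */
static int check_retry_count(uint8_t retry_count)
{
    return retry_count > SKETCH_MAX_RETRY_COUNT ? -EINVAL : 0;
}
```

Under this sketch rping's retry_count of 10 is refused, while any value from 0 to 7 passes.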

I don't think the spec really cares how we solve this, and I don't think
it sets a hard limit of 7 on retry_count the way the qib driver does. On
the other hand, since the spec doesn't call out a limit, I would assume
each driver is free to implement its own "reasonable, implementation
defined" limit in a case like this.

So, do we make qib more liberal in the retry_count it accepts, or do we
fix rping to use a smaller number? Matters not to me...
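
If we go the rping route, the change could be as small as capping the value before connecting. A rough sketch, again with a hypothetical stand-in struct rather than the real conn_param type, and with fill_client_param and SKETCH_RETRY_LIMIT invented for the example:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the conn_param fields rping sets. */
struct conn_param_sketch {
    uint8_t responder_resources;
    uint8_t initiator_depth;
    uint8_t retry_count;
};

/* Assumed cap; 7 is the limit qib is described as enforcing. */
#define SKETCH_RETRY_LIMIT 7

/* Clear the parameters, then set the same fields as the rping client,
 * capping the requested retry count instead of passing it through. */
static void fill_client_param(struct conn_param_sketch *p, uint8_t wanted_retries)
{
    memset(p, 0, sizeof(*p));
    p->responder_resources = 1;
    p->initiator_depth = 1;
    p->retry_count = wanted_retries > SKETCH_RETRY_LIMIT
                         ? SKETCH_RETRY_LIMIT
                         : wanted_retries;
}
```

Asking for 10 retries then yields 7, which the qib check described above would accept.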
--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband