> still, I am not sure to be with you, the mads used by the CM aren't
> reliable, correct?
> so I don't see why/how a mad containing e.g junk DLID completes with
> error...

CM mads aren't reliable, but they are retried.  If a CM REQ does not 
receive a response after a set number of retries (usually 15), the REQ fails 
with a timeout status.  The mad layer reports the timeout to the cm module.  
With snooping in place, a user will also be notified that a mad send has 
failed and be given a copy of the mad.
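
To make the retry-then-report behavior concrete, here is a minimal sketch in 
plain C.  It is only a model: every name in it (sketch_mad, 
sketch_send_with_retries, the two hooks) is hypothetical and is not the ib_mad 
or ib_cm interface, and it assumes the retry count means "resends after the 
first attempt".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mad; only the field this sketch needs. */
struct sketch_mad {
    uint16_t dlid;
};

enum sketch_status { SKETCH_OK, SKETCH_TIMEOUT };

/* Hypothetical send hook: returns true if a response came back. */
typedef bool (*sketch_send_fn)(const struct sketch_mad *mad, void *ctx);

/* Hypothetical snoop hook: called with a copy of a mad that timed out. */
typedef void (*sketch_snoop_fn)(struct sketch_mad failed_copy, void *ctx);

/*
 * Send once, then resend up to `retries` more times (an assumption about
 * how the retry count is applied).  If nothing answers, pass a copy of the
 * mad to the snoop hook and report a timeout to the caller.
 */
static enum sketch_status
sketch_send_with_retries(const struct sketch_mad *mad, int retries,
                         sketch_send_fn send, void *send_ctx,
                         sketch_snoop_fn snoop, void *snoop_ctx)
{
    assert(mad != NULL && send != NULL);

    for (int attempt = 0; attempt <= retries; attempt++) {
        if (send(mad, send_ctx))
            return SKETCH_OK;
    }
    if (snoop != NULL)
        snoop(*mad, snoop_ctx);
    return SKETCH_TIMEOUT;
}

/* Example send hook: the fabric drops everything; ctx counts the sends. */
static bool sketch_drop_all(const struct sketch_mad *mad, void *ctx)
{
    (void)mad;
    (*(int *)ctx)++;
    return false;
}

/* Example snoop hook: remember the failed mad in *ctx. */
static void sketch_record(struct sketch_mad failed_copy, void *ctx)
{
    *(struct sketch_mad *)ctx = failed_copy;
}
```

With retries set to 15 and sketch_drop_all as the send hook, the mad is sent 
16 times in this model before sketch_record is handed the copy and the caller 
sees SKETCH_TIMEOUT.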

At a higher level, this would be one usage model:

1. App calls rdma_getaddrinfo()
2. The librdmacm contacts the ibacm for path record data.
3. ibacm returns a path record.  The path record _may_ have come from cached 
data.
4. The librdmacm tries to establish a connection.
5. The kernel ib_cm module issues REQ.
6. The ib_mad module retries the REQ until it times out.
7. The mad timeout is reported to any users wishing to capture errors.

In this example, the ibacm service would be registered and receive a copy of 
the failed REQ.  The ibacm can look at the data in the REQ, see if it has 
cached path record data which matches, and remove the cached data if so.  If 
the REQ data cannot be found (for example, someone sent a REQ with a junk 
DLID), it simply discards the captured mad.

8. The librdmacm will see a connection failure.
9. The librdmacm can request a new path from the ibacm and retry.
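
The ibacm side of step 7 could be sketched as below.  This is a toy cache 
keyed on the SLID/DLID pair, not ibacm's actual data structures; all of the 
names (sketch_cache, sketch_failed_req, sketch_on_req_timeout) are made up for 
illustration, and a real match would likely compare more of the path record 
than two LIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_SLOTS 8

/* Hypothetical cached path record: just the LID pair and an in-use flag. */
struct sketch_path {
    uint16_t slid;
    uint16_t dlid;
    bool     valid;
};

struct sketch_cache {
    struct sketch_path slot[SKETCH_CACHE_SLOTS];
};

/* Hypothetical view of the path fields carried in a captured REQ. */
struct sketch_failed_req {
    uint16_t slid;
    uint16_t dlid;
};

/* Store a path in the first free slot; returns false if the cache is full. */
static bool sketch_cache_add(struct sketch_cache *c, uint16_t slid, uint16_t dlid)
{
    assert(c != NULL);
    for (size_t i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        if (!c->slot[i].valid) {
            c->slot[i] = (struct sketch_path){ slid, dlid, true };
            return true;
        }
    }
    return false;
}

/* Returns true if a path with this LID pair is currently cached. */
static bool sketch_cache_has(const struct sketch_cache *c, uint16_t slid, uint16_t dlid)
{
    assert(c != NULL);
    for (size_t i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        const struct sketch_path *p = &c->slot[i];
        if (p->valid && p->slid == slid && p->dlid == dlid)
            return true;
    }
    return false;
}

/*
 * Handle a captured REQ that timed out.  If it matches a cached path, drop
 * that entry and return true.  If nothing matches (for example a junk DLID),
 * the captured mad is simply discarded and the cache is left alone.
 */
static bool sketch_on_req_timeout(struct sketch_cache *c, const struct sketch_failed_req *req)
{
    assert(c != NULL && req != NULL);
    for (size_t i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        struct sketch_path *p = &c->slot[i];
        if (p->valid && p->slid == req->slid && p->dlid == req->dlid) {
            p->valid = false;
            return true;
        }
    }
    return false;
}
```

A failed REQ carrying a DLID that was never cached leaves the cache unchanged, 
while one that matches a cached entry removes it, so the next path request 
from the librdmacm in step 9 cannot be answered from that stale entry.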

- Sean