Re: how to re-use a QP for a new connection

Steve Wise Mon, 23 Jun 2014 14:13:07 -0700

On 6/23/2014 12:31 PM, Chuck Lever wrote:

On Jun 23, 2014, at 1:25 PM, Hefty, Sean <[email protected]> wrote:

For the record, with both mlx4 and cxgb4, we see FRMRs left valid
after a FAST_REG_MR is flushed during a connection loss. More study
needed, obviously.

Is the bug that this type of WR completes in error, but actually exposed the 
memory region?

We haven’t checked if the MR is exposed; hadn’t thought of that!

I don't think this is a bug. It is a race where HW is in the process of
fast-registering the memory at the time the QP is moved out of RTS,
causing all pending work requests to get FLUSHED. I looked at both the
IBTA IB and IETF iWARP Verbs specs, and neither states explicitly what
FLUSHED status means. They both say "at the time the QP was moved
to ERROR the work request was not complete". That doesn't indicate
that the work request was canceled or didn't actually complete. At
least that's how I read it. Regardless, the Chelsio hardware behaves
this way. And apparently the mlx4 hardware does too.

Anyway, for cxgb4 at least, the FRMR can be left in the valid state.
The correct procedure, in the case of a fast-reg WR completing as
FLUSHED, is to dereg the MR if you want to ensure the region is
invalidated.
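
To make the procedure above concrete, here is a minimal sketch of the
decision a consumer could make when a fast-reg WR completes. It is a
standalone model, not kernel code: every type and function name
prefixed with model_ is hypothetical and exists only for this example.
In a real kernel consumer the completion status would come from the
work completion and the teardown would go through ib_dereg_mr(). The
handling of error statuses other than FLUSHED is my assumption; the
discussion above only covers the FLUSHED case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; NOT the kernel verbs types. */
enum model_wc_status {
    MODEL_WC_SUCCESS,
    MODEL_WC_FLUSHED,      /* QP left RTS while the WR was pending */
    MODEL_WC_OTHER_ERR
};

enum model_frmr_state {
    MODEL_FRMR_INVALID,    /* known not to be registered */
    MODEL_FRMR_VALID,      /* known to be registered */
    MODEL_FRMR_UNKNOWN     /* HW may or may not have registered it */
};

enum model_action {
    MODEL_KEEP_MR,         /* MR state is known, keep using it */
    MODEL_DEREG_MR         /* deregister, then allocate a fresh FRMR */
};

struct model_frmr {
    enum model_frmr_state state;
};

/*
 * Decide what to do with an FRMR after its fast-reg WR completes.
 * A FLUSHED completion does not prove the registration never happened,
 * so the MR is marked UNKNOWN and the caller is told to deregister it.
 * Assumption: any other error is treated the same conservative way.
 */
enum model_action model_on_fastreg_done(struct model_frmr *mr,
                                        enum model_wc_status status)
{
    assert(mr != NULL);

    if (status == MODEL_WC_SUCCESS) {
        mr->state = MODEL_FRMR_VALID;
        return MODEL_KEEP_MR;
    }

    mr->state = MODEL_FRMR_UNKNOWN;
    return MODEL_DEREG_MR;
}

/*
 * A LOCAL_INVALIDATE only makes sense when the MR is known to be valid.
 * In the UNKNOWN state the LINV may fail, so the MR should not be
 * recycled that way.
 */
bool model_may_post_linv(const struct model_frmr *mr)
{
    assert(mr != NULL);
    return mr->state == MODEL_FRMR_VALID;
}
```

The point of the UNKNOWN state is that the consumer stops guessing:
after a flush it neither reuses the rkey nor tries to LINV it, it just
deregisters the MR and starts over with a new one.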

What we do know is that a subsequent LOCAL_INVALIDATE using that rkey, which
should work (if FAST_REG_MR had indeed never been done), fails in some cases.
With mlx4, the LINV completes with IB_WC_MW_BIND_ERR. Steve can provide
more detail about the exact failure mode with cxgb4.


cxgb4 completes with IB_WC_LOC_ACCESS_ERR.

Steve.