> On Apr 16, 2015, at 10:32 AM, Erez Shitrit <[email protected]> wrote:
>
> On Thu, Apr 16, 2015 at 4:29 PM, Doug Ledford <[email protected]> wrote:
>>
>>> On Apr 15, 2015, at 9:29 AM, Erez Shitrit <[email protected]> wrote:
>>>
>>> Hi Doug
>>>
>>> (now with "reply to all" :) )
>>>
>>> On Wed, Apr 15, 2015 at 3:08 PM, Doug Ledford <[email protected]> wrote:
>>>>
>>>>> On Apr 2, 2015, at 6:39 AM, Or Gerlitz <[email protected]> wrote:
>>>>>
>>>>> From: Erez Shitrit <[email protected]>
>>>>>
>>>>> As a result of a completion error, the QP can be moved to the SQE state by
>>>>> the hardware. Since it's not the Error state, there are no flushes
>>>>> and hence the driver doesn't know about it.
>>>>>
>>>>> The fix creates a task that, after a completion error which is not a
>>>>> flush, checks the QP state and, if it is in the SQE state, moves it back to RTS.
>>>>
>>>> I assume you found this particular issue while tracking down the cause of
>>>> the problem that you fixed in patch #6? Otherwise, this is a "never
>>>> should happen" situation, yes?
>>>>
>>>
>>> Not only. It happened in the case described in patch #6, and also
>>> with some other errors that caused the FW/HW to decide to move the UD QP to
>>> the SQE state (in the category of "bad thing happened...", like a damaged
>>> WQE, a FW problem, etc.)
>>> The patch was originally written to solve such cases, which we really
>>> saw in testing, at customers, etc.,
and not for patch #6.
>>>
>>>>> Signed-off-by: Erez Shitrit <[email protected]>
>>>>> Signed-off-by: Or Gerlitz <[email protected]>
>>>>> ---
>>>>> drivers/infiniband/ulp/ipoib/ipoib.h    |  5 +++
>>>>> drivers/infiniband/ulp/ipoib/ipoib_ib.c | 59 ++++++++++++++++++++++++++++++-
>>>>> 2 files changed, 63 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
>>>>> index 769044c..2703d9a 100644
>>>>> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
>>>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
>>>>> @@ -299,6 +299,11 @@ struct ipoib_neigh_table {
>>>>>  struct completion deleted;
>>>>> };
>>>>>
>>>>> +struct ipoib_qp_state_validate {
>>>>> + struct work_struct work;
>>>>> + struct ipoib_dev_priv *priv;
>>>>> +};
>>>>
>>>> Why did you choose to do an allocated work struct? All of the other tasks
>>>> we have use a static work struct that's part of the ipoib private dev, so
>>>> this is a bit out of the norm. In its current form, you only use this on
>>>> the single UD queue pair that the dev uses, and not ever on any connected
>>>> mode queue pairs, so it could be static. Or were you looking at the
>>>> possibility of making the task work on the CM queue pairs and so left it
>>>> allocated? And if that, why not include a reference to the broken queue
>>>> pair in the work struct instead of always using priv->qp?
>>>>
>>>
>>> It looks (at least to me) more elegant to create the work "on the fly",
>>> queue it, and delete it at the end (BTW, I have seen this pattern many
>>> times outside of ipoib as well).
>>> There is no need for that in CM mode; there the connection and the
>>> QP are destroyed and a new one is created. Only in UD mode does the QP
>>> live forever.
>>
>> I knew it wasn't needed in CM mode currently, but I didn't know if there was
>> a plan in your mind to change the way CM worked so that it tried to rescue a
>> queue pair instead of destroying and starting over.
>>
>> However, I really think we need to be consistent in this driver. If we
>> allocate our work struct in one and only one place, then we run into the
>> problem where someone working on the driver in the future has to try and
>> remember "Gee, is this the one function that allocates our work struct and
>> we must then free it to avoid leaking memory, or was that a different
>> function?" I would overlook that if we had a plan that involved this
>> function ever operating on anything other than the UD QP, in which case you
>> would have to allocate a struct to designate the QP in question, and you
>> would have to allocate a struct per instance because it would technically be
>> possible to have more than one QP in an SQE state at once. But if that's
>> not the plan, then please rework this patch to add the work struct to the
>> ipoib private dev like we do with all the other work structs. I'd like to
>> swap this one out of my tree for your new one ASAP.
>>
>
> There is a plan to have more than one UD QP per device, in order to
> have multi-queue interfaces (not like the CM QPs, but along the same
> lines you mentioned: many objects, each of which can use a different
> work item).
> So I think that this way of allocating the work object is still reasonable.
OK, then yes, that changes things. The patch as it stands is good then.
>
>>>
>>>>> /*
>>>>> * Device private locking: network stack tx_lock protects members used
>>>>> * in TX fast path, lock protects everything else. lock nests inside
>>>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>>> index 29b376d..63b92cb 100644
>>>>> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>>> @@ -327,6 +327,51 @@ static void ipoib_dma_unmap_tx(struct ib_device *ca,
>>>>> }
>>>>> }
>>>>>
>>>>> +/*
>>>>> + * As a result of a completion error the QP can be transferred to the SQE state.
>>>>> + * This function checks if the (send) QP is in the SQE state and, if so,
>>>>> + * moves it back to the RTS state in order to have it functional again.
>>>>> + */
>>>>> +static void ipoib_qp_state_validate_work(struct work_struct *work)
>>>>> +{
>>>>> + struct ipoib_qp_state_validate *qp_work =
>>>>> + container_of(work, struct ipoib_qp_state_validate, work);
>>>>> +
>>>>> + struct ipoib_dev_priv *priv = qp_work->priv;
>>>>> + struct ib_qp_attr qp_attr;
>>>>> + struct ib_qp_init_attr query_init_attr;
>>>>> + int ret;
>>>>> +
>>>>> + ret = ib_query_qp(priv->qp, &qp_attr, IB_QP_STATE, &query_init_attr);
>>>>> + if (ret) {
>>>>> + ipoib_warn(priv, "%s: Failed to query QP ret: %d\n",
>>>>> + __func__, ret);
>>>>> + goto free_res;
>>>>> + }
>>>>> + pr_info("%s: QP: 0x%x is in state: %d\n",
>>>>> + __func__, priv->qp->qp_num, qp_attr.qp_state);
>>>>> +
>>>>> + /* Currently we support only the SQE->RTS transition */
>>>>> + if (qp_attr.qp_state == IB_QPS_SQE) {
>>>>> + qp_attr.qp_state = IB_QPS_RTS;
>>>>> +
>>>>> + ret = ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE);
>>>>> + if (ret) {
>>>>> + pr_warn("failed(%d) modify QP:0x%x SQE->RTS\n",
>>>>> + ret, priv->qp->qp_num);
>>>>> + goto free_res;
>>>>> + }
>>>>> + pr_info("%s: QP: 0x%x moved from IB_QPS_SQE to IB_QPS_RTS\n",
>>>>> + __func__, priv->qp->qp_num);
>>>>> + } else {
>>>>> + pr_warn("QP (%d) will stay in state: %d\n",
>>>>> + priv->qp->qp_num, qp_attr.qp_state);
>>>>> + }
>>>>> +
>>>>> +free_res:
>>>>> + kfree(qp_work);
>>>>> +}
>>>>> +
>>>>> static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
>>>>> {
>>>>> struct ipoib_dev_priv *priv = netdev_priv(dev);
>>>>> @@ -358,10 +403,22 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
>>>>> netif_wake_queue(dev);
>>>>>
>>>>> if (wc->status != IB_WC_SUCCESS &&
>>>>> - wc->status != IB_WC_WR_FLUSH_ERR)
>>>>> + wc->status != IB_WC_WR_FLUSH_ERR) {
>>>>> + struct ipoib_qp_state_validate *qp_work;
>>>>> ipoib_warn(priv, "failed send event "
>>>>> "(status=%d, wrid=%d vend_err %x)\n",
>>>>> wc->status, wr_id, wc->vendor_err);
>>>>> + qp_work = kzalloc(sizeof(*qp_work), GFP_ATOMIC);
>>>>> + if (!qp_work) {
>>>>> + ipoib_warn(priv, "%s Failed alloc ipoib_qp_state_validate for qp: 0x%x\n",
>>>>> + __func__, priv->qp->qp_num);
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + INIT_WORK(&qp_work->work, ipoib_qp_state_validate_work);
>>>>> + qp_work->priv = priv;
>>>>> + queue_work(priv->wq, &qp_work->work);
>>>>> + }
>>>>> }
>>>>>
>>>>> static int poll_tx(struct ipoib_dev_priv *priv)
>>>>> --
>>>>> 1.7.1
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to [email protected]
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>> --
>>>> Doug Ledford <[email protected]>
>>>> GPG Key ID: 0E572FDD
>>>>
>>
--
Doug Ledford <[email protected]>
GPG Key ID: 0E572FDD