On Thu, Apr 16, 2015 at 4:29 PM, Doug Ledford <[email protected]> wrote:
>
>> On Apr 15, 2015, at 9:29 AM, Erez Shitrit <[email protected]> wrote:
>>
>> Hi Doug
>>
>> (now with "reply to all" :) )
>>
>> On Wed, Apr 15, 2015 at 3:08 PM, Doug Ledford <[email protected]> wrote:
>>>
>>>> On Apr 2, 2015, at 6:39 AM, Or Gerlitz <[email protected]> wrote:
>>>>
>>>> From: Erez Shitrit <[email protected]>
>>>>
>>>> As the result of a completion error the QP can moved to SQE state by
>>>> the hardware. Since it's not the Error state, there are no flushes
>>>> and hence the driver doesn't know about that.
>>>>
>>>> The fix creates a task that after completion with error which is not a
>>>> flush tracks the QP state and if it is in SQE state moves it back to RTS.
>>>
>>> I assume you found this particular issue while tracking down the cause of
>>> the problem that you fixed in patch #6? Otherwise, this is a “never should
>>> happen” situation, yes?
>>>
>>
>> Not only. it happened in the case that described in patch#6 and also
>> in some other errors that caused the FW/HW to decide move the UD QO to
>> SQE state (in the category of "bad thing happened.." like damaged
>> WQE, FW problem, etc.)
>> The patch originally was written to solve such cases which we really
>> saw, at testing, customers etc. and not for patch#6
>>
>>>> Signed-off-by: Erez Shitrit <[email protected]>
>>>> Signed-off-by: Or Gerlitz <[email protected]>
>>>> ---
>>>> drivers/infiniband/ulp/ipoib/ipoib.h | 5 +++
>>>> drivers/infiniband/ulp/ipoib/ipoib_ib.c | 59
>>>> ++++++++++++++++++++++++++++++-
>>>> 2 files changed, 63 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
>>>> b/drivers/infiniband/ulp/ipoib/ipoib.h
>>>> index 769044c..2703d9a 100644
>>>> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
>>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
>>>> @@ -299,6 +299,11 @@ struct ipoib_neigh_table {
>>>> struct completion deleted;
>>>> };
>>>>
>>>> +struct ipoib_qp_state_validate {
>>>> + struct work_struct work;
>>>> + struct ipoib_dev_priv *priv;
>>>> +};
>>>
>>> Why did you choose to do an allocated work struct? All of the other tasks
>>> we have use a static work struct that’s part of the ipoib private dev, so
>>> this is a bit out of the norm. In its current form, you only use this on
>>> the single UD queue pair that the dev uses, and not ever on any connected
>>> mode queue pairs, so it could be static. Or were you looking at the
>>> possibility of making the task work on the CM queue pairs and so left it
>>> allocated? And if that, why not include a reference to the broken queue
>>> pair in the work struct instead of always using priv->qp?
>>>
>>
>> It looks (at least for me) more elegant to create "on-the-fly" work
>> create/queue and delete at the end (BTW, I saw it many times, other
>> than ipoib).
>> There is no need for that in the CM mode, there the connection and the
>> QP are destroyed, and new one will be created, only in UD mode the QP
>> is forever.
>
> I knew it wasn’t needed in CM mode currently, but I didn’t know if there was
> a plan in your mind to change the way CM worked so that it tried to rescue a
> queue pair instead of destroying and starting over.
>
> However, I really think we need to be consistent in this driver. If we
> allocate our work struct in one and only one place, then we run into the
> problem where someone working on the driver in the future has to try and
> remember “Gee, is this the one function that allocates our work struct and we
> must then free it to avoid leaking memory, or was that a different function?”
> I would overlook that if we had a plan that involved this function ever
> operating on anything other than the UD QP, in which case you would have to
> allocate a struct to designate the QP in question, and you would have to
> allocate a struct per instance because it would technically be possible to
> have more than one QP in an SQE state at once. But if that’s not the plan,
> then please rework this patch to add the work struct to the ipoib private dev
> like we do with all the other work structs. I’d like to swap this one out of
> my tree for your new one ASAP.
>
There is a plan to have more than one UD QP per device, in order to
have multi-queue interfaces (not according to the CM QP but with the
same idea that you said, many objects that can use different work
each).
So, i think that this way to allocate work object is still reasonable.
>>
>>>> /*
>>>> * Device private locking: network stack tx_lock protects members used
>>>> * in TX fast path, lock protects everything else. lock nests inside
>>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>> b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>> index 29b376d..63b92cb 100644
>>>> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>> @@ -327,6 +327,51 @@ static void ipoib_dma_unmap_tx(struct ib_device *ca,
>>>> }
>>>> }
>>>>
>>>> +/*
>>>> + * As the result of a completion error the QP Can be transferred to SQE
>>>> states.
>>>> + * The function checks if the (send)QP is in SQE state and
>>>> + * moves it back to RTS state, that in order to have it functional again.
>>>> + */
>>>> +static void ipoib_qp_state_validate_work(struct work_struct *work)
>>>> +{
>>>> + struct ipoib_qp_state_validate *qp_work =
>>>> + container_of(work, struct ipoib_qp_state_validate, work);
>>>> +
>>>> + struct ipoib_dev_priv *priv = qp_work->priv;
>>>> + struct ib_qp_attr qp_attr;
>>>> + struct ib_qp_init_attr query_init_attr;
>>>> + int ret;
>>>> +
>>>> + ret = ib_query_qp(priv->qp, &qp_attr, IB_QP_STATE, &query_init_attr);
>>>> + if (ret) {
>>>> + ipoib_warn(priv, "%s: Failed to query QP ret: %d\n",
>>>> + __func__, ret);
>>>> + goto free_res;
>>>> + }
>>>> + pr_info("%s: QP: 0x%x is in state: %d\n",
>>>> + __func__, priv->qp->qp_num, qp_attr.qp_state);
>>>> +
>>>> + /* currently support only in SQE->RTS transition*/
>>>> + if (qp_attr.qp_state == IB_QPS_SQE) {
>>>> + qp_attr.qp_state = IB_QPS_RTS;
>>>> +
>>>> + ret = ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE);
>>>> + if (ret) {
>>>> + pr_warn("failed(%d) modify QP:0x%x SQE->RTS\n",
>>>> + ret, priv->qp->qp_num);
>>>> + goto free_res;
>>>> + }
>>>> + pr_info("%s: QP: 0x%x moved from IB_QPS_SQE to IB_QPS_RTS\n",
>>>> + __func__, priv->qp->qp_num);
>>>> + } else {
>>>> + pr_warn("QP (%d) will stay in state: %d\n",
>>>> + priv->qp->qp_num, qp_attr.qp_state);
>>>> + }
>>>> +
>>>> +free_res:
>>>> + kfree(qp_work);
>>>> +}
>>>> +
>>>> static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
>>>> {
>>>> struct ipoib_dev_priv *priv = netdev_priv(dev);
>>>> @@ -358,10 +403,22 @@ static void ipoib_ib_handle_tx_wc(struct net_device
>>>> *dev, struct ib_wc *wc)
>>>> netif_wake_queue(dev);
>>>>
>>>> if (wc->status != IB_WC_SUCCESS &&
>>>> - wc->status != IB_WC_WR_FLUSH_ERR)
>>>> + wc->status != IB_WC_WR_FLUSH_ERR) {
>>>> + struct ipoib_qp_state_validate *qp_work;
>>>> ipoib_warn(priv, "failed send event "
>>>> "(status=%d, wrid=%d vend_err %x)\n",
>>>> wc->status, wr_id, wc->vendor_err);
>>>> + qp_work = kzalloc(sizeof(*qp_work), GFP_ATOMIC);
>>>> + if (!qp_work) {
>>>> + ipoib_warn(priv, "%s Failed alloc
>>>> ipoib_qp_state_validate for qp: 0x%x\n",
>>>> + __func__, priv->qp->qp_num);
>>>> + return;
>>>> + }
>>>> +
>>>> + INIT_WORK(&qp_work->work, ipoib_qp_state_validate_work);
>>>> + qp_work->priv = priv;
>>>> + queue_work(priv->wq, &qp_work->work);
>>>> + }
>>>> }
>>>>
>>>> static int poll_tx(struct ipoib_dev_priv *priv)
>>>> --
>>>> 1.7.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> —
>>> Doug Ledford <[email protected]>
>>> GPG Key ID: 0E572FDD
>>>
>>>
>>>
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> —
> Doug Ledford <[email protected]>
> GPG Key ID: 0E572FDD
>
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html