On Thu, Apr 16, 2015 at 4:29 PM, Doug Ledford <[email protected]> wrote:
>
>> On Apr 15, 2015, at 9:29 AM, Erez Shitrit <[email protected]> wrote:
>>
>> Hi Doug
>>
>> (now with "reply to all" :) )
>>
>> On Wed, Apr 15, 2015 at 3:08 PM, Doug Ledford <[email protected]> wrote:
>>>
>>>> On Apr 2, 2015, at 6:39 AM, Or Gerlitz <[email protected]> wrote:
>>>>
>>>> From: Erez Shitrit <[email protected]>
>>>>
>>>> As the result of a completion error, the QP can be moved to the SQE
>>>> state by the hardware. Since that is not the Error state, there are no
>>>> flushes and hence the driver doesn't know about it.
>>>>
>>>> The fix creates a task that, after a completion error which is not a
>>>> flush, checks the QP state and, if it is in the SQE state, moves it back to RTS.
>>>
>>> I assume you found this particular issue while tracking down the cause of 
>>> the problem that you fixed in patch #6?  Otherwise, this is a “never should 
>>> happen” situation, yes?
>>>
>>
>> Not only. It happened in the case described in patch #6, and also with
>> some other errors that caused the FW/HW to decide to move the UD QP to
>> the SQE state (in the category of "bad thing happened..", like a damaged
>> WQE, a FW problem, etc.).
>> The patch was originally written to solve such cases, which we really
>> saw in testing, at customers, etc., and not for patch #6.
>>
>>>> Signed-off-by: Erez Shitrit <[email protected]>
>>>> Signed-off-by: Or Gerlitz <[email protected]>
>>>> ---
>>>> drivers/infiniband/ulp/ipoib/ipoib.h    |    5 +++
>>>> drivers/infiniband/ulp/ipoib/ipoib_ib.c |   59 ++++++++++++++++++++++++++++++-
>>>> 2 files changed, 63 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
>>>> index 769044c..2703d9a 100644
>>>> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
>>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
>>>> @@ -299,6 +299,11 @@ struct ipoib_neigh_table {
>>>>      struct completion               deleted;
>>>> };
>>>>
>>>> +struct ipoib_qp_state_validate {
>>>> +     struct work_struct work;
>>>> +     struct ipoib_dev_priv   *priv;
>>>> +};
>>>
>>> Why did you choose to do an allocated work struct?  All of the other tasks 
>>> we have use a static work struct that’s part of the ipoib private dev, so 
>>> this is a bit out of the norm.  In its current form, you only use this on 
>>> the single UD queue pair that the dev uses, and not ever on any connected 
>>> mode queue pairs, so it could be static.  Or were you looking at the 
>>> possibility of making the task work on the CM queue pairs and so left it 
>>> allocated?  And if that, why not include a reference to the broken queue 
>>> pair in the work struct instead of always using priv->qp?
>>>
>>
>> It looks (at least to me) more elegant to create and queue the work
>> "on the fly" and free it at the end (BTW, I have seen this pattern many
>> times outside of ipoib).
>> There is no need for that in CM mode; there the connection and the QP
>> are destroyed and a new one is created. Only in UD mode does the QP
>> live forever.
>
> I knew it wasn’t needed in CM mode currently, but I didn’t know if there was 
> a plan in your mind to change the way CM worked so that it tried to rescue a 
> queue pair instead of destroying and starting over.
>
> However, I really think we need to be consistent in this driver.  If we 
> allocate our work struct in one and only one place, then we run into the 
> problem where someone working on the driver in the future has to try and 
> remember “Gee, is this the one function that allocates our work struct and we 
> must then free it to avoid leaking memory, or was that a different function?” 
>  I would overlook that if we had a plan that involved this function ever 
> operating on anything other than the UD QP, in which case you would have to 
> allocate a struct to designate the QP in question, and you would have to 
> allocate a struct per instance because it would technically be possible to 
> have more than one QP in an SQE state at once.  But if that’s not the plan, 
> then please rework this patch to add the work struct to the ipoib private dev 
> like we do with all the other work structs.  I’d like to swap this one out of 
> my tree for your new one ASAP.
>

There is a plan to have more than one UD QP per device, in order to
support multi-queue interfaces (not via the CM QPs, but along the lines
you described: many objects, each of which can use its own work item).
So I think that allocating the work object this way is still reasonable.
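
Just for illustration (not part of this patch; the helper name and the
extra field are hypothetical), the allocated work item could later carry
the specific QP that hit the error, along the lines you suggested:

	/* Sketch only: extend the work item with the broken QP */
	struct ipoib_qp_state_validate {
		struct work_struct	work;
		struct ipoib_dev_priv	*priv;
		struct ib_qp		*qp;	/* QP that got the completion error */
	};

	/* Hypothetical helper that queues a validate task for one QP */
	static void ipoib_schedule_qp_validate(struct ipoib_dev_priv *priv,
					       struct ib_qp *qp)
	{
		struct ipoib_qp_state_validate *qp_work;

		qp_work = kzalloc(sizeof(*qp_work), GFP_ATOMIC);
		if (!qp_work)
			return;

		INIT_WORK(&qp_work->work, ipoib_qp_state_validate_work);
		qp_work->priv = priv;
		qp_work->qp   = qp;	/* handler would query/modify qp_work->qp */
		queue_work(priv->wq, &qp_work->work);
	}

With that, ipoib_qp_state_validate_work() would operate on qp_work->qp
instead of always on priv->qp.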

>>
>>>> /*
>>>> * Device private locking: network stack tx_lock protects members used
>>>> * in TX fast path, lock protects everything else.  lock nests inside
>>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>> index 29b376d..63b92cb 100644
>>>> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
>>>> @@ -327,6 +327,51 @@ static void ipoib_dma_unmap_tx(struct ib_device *ca,
>>>>      }
>>>> }
>>>>
>>>> +/*
>>>> + * As the result of a completion error the QP can be transferred to the SQE state.
>>>> + * The function checks if the (send) QP is in SQE state and
>>>> + * moves it back to RTS state, in order to have it functional again.
>>>> + */
>>>> +static void ipoib_qp_state_validate_work(struct work_struct *work)
>>>> +{
>>>> +     struct ipoib_qp_state_validate *qp_work =
>>>> +             container_of(work, struct ipoib_qp_state_validate, work);
>>>> +
>>>> +     struct ipoib_dev_priv *priv = qp_work->priv;
>>>> +     struct ib_qp_attr qp_attr;
>>>> +     struct ib_qp_init_attr query_init_attr;
>>>> +     int ret;
>>>> +
>>>> +     ret = ib_query_qp(priv->qp, &qp_attr, IB_QP_STATE, &query_init_attr);
>>>> +     if (ret) {
>>>> +             ipoib_warn(priv, "%s: Failed to query QP ret: %d\n",
>>>> +                        __func__, ret);
>>>> +             goto free_res;
>>>> +     }
>>>> +     pr_info("%s: QP: 0x%x is in state: %d\n",
>>>> +             __func__, priv->qp->qp_num, qp_attr.qp_state);
>>>> +
>>>> +     /* currently support only the SQE->RTS transition */
>>>> +     if (qp_attr.qp_state == IB_QPS_SQE) {
>>>> +             qp_attr.qp_state = IB_QPS_RTS;
>>>> +
>>>> +             ret = ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE);
>>>> +             if (ret) {
>>>> +                     pr_warn("failed(%d) modify QP:0x%x SQE->RTS\n",
>>>> +                             ret, priv->qp->qp_num);
>>>> +                     goto free_res;
>>>> +             }
>>>> +             pr_info("%s: QP: 0x%x moved from IB_QPS_SQE to IB_QPS_RTS\n",
>>>> +                     __func__, priv->qp->qp_num);
>>>> +     } else {
>>>> +             pr_warn("QP (%d) will stay in state: %d\n",
>>>> +                     priv->qp->qp_num, qp_attr.qp_state);
>>>> +     }
>>>> +
>>>> +free_res:
>>>> +     kfree(qp_work);
>>>> +}
>>>> +
>>>> static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
>>>> {
>>>>      struct ipoib_dev_priv *priv = netdev_priv(dev);
>>>> @@ -358,10 +403,22 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
>>>>              netif_wake_queue(dev);
>>>>
>>>>      if (wc->status != IB_WC_SUCCESS &&
>>>> -         wc->status != IB_WC_WR_FLUSH_ERR)
>>>> +         wc->status != IB_WC_WR_FLUSH_ERR) {
>>>> +             struct ipoib_qp_state_validate *qp_work;
>>>>              ipoib_warn(priv, "failed send event "
>>>>                         "(status=%d, wrid=%d vend_err %x)\n",
>>>>                         wc->status, wr_id, wc->vendor_err);
>>>> +             qp_work = kzalloc(sizeof(*qp_work), GFP_ATOMIC);
>>>> +             if (!qp_work) {
>>>> +                     ipoib_warn(priv, "%s Failed alloc ipoib_qp_state_validate for qp: 0x%x\n",
>>>> +                                __func__, priv->qp->qp_num);
>>>> +                     return;
>>>> +             }
>>>> +
>>>> +             INIT_WORK(&qp_work->work, ipoib_qp_state_validate_work);
>>>> +             qp_work->priv = priv;
>>>> +             queue_work(priv->wq, &qp_work->work);
>>>> +     }
>>>> }
>>>>
>>>> static int poll_tx(struct ipoib_dev_priv *priv)
>>>> --
>>>> 1.7.1
>>>>
>>>
>>> —
>>> Doug Ledford <[email protected]>
>>>        GPG Key ID: 0E572FDD
>
> —
> Doug Ledford <[email protected]>
>         GPG Key ID: 0E572FDD