[PATCH v1] IB/cma: Use inner P_Key to determine netdev

2015-10-20 Thread Haggai Eran
When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
was decided that it is best to use the P_Key value in the packet headers.
However, the mlx5 and ipath drivers are currently unable to send correct
P_Key values in GMP headers. They always send using a single P_Key that is
set during the GSI QP initialization.

Change the rdma_cm code to look at the P_Key value that is part of the
packet payload as a workaround. Once the drivers are fixed this patch can
be reverted.

Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Haggai Eran 
---
Changes from v0:
- improve commit message

 drivers/infiniband/core/cma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 59a2dafc8c57..e8324543e085 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1067,14 +1067,14 @@ static int cma_save_req_info(const struct ib_cm_event 
*ib_event,
   sizeof(req->local_gid));
req->has_gid= true;
req->service_id = req_param->primary_path->service_id;
-   req->pkey   = req_param->bth_pkey;
+   req->pkey   = be16_to_cpu(req_param->primary_path->pkey);
break;
case IB_CM_SIDR_REQ_RECEIVED:
req->device = sidr_param->listen_id->device;
req->port   = sidr_param->port;
req->has_gid= false;
req->service_id = sidr_param->service_id;
-   req->pkey   = sidr_param->bth_pkey;
+   req->pkey   = sidr_param->pkey;
break;
default:
return -EINVAL;
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] IB/cma: Use inner P_Key to determine netdev

2015-10-20 Thread Doug Ledford
On 10/20/2015 02:53 AM, Haggai Eran wrote:
> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
> was decided that it is best to use the P_Key value in the packet headers.
> However, the mlx5 and ipath drivers are currently unable to send correct
> P_Key values in GMP headers. They always send using a single P_Key that is
> set during the GSI QP initialization.
> 
> Change the rdma_cm code to look at the P_Key value that is part of the
> packet payload as a workaround. Once the drivers are fixed this patch can
> be reverted.
> 
> Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to
> RDMA CM")
> Signed-off-by: Haggai Eran 
> ---
> Changes from v0:
> - improve commit message
> 
>  drivers/infiniband/core/cma.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 59a2dafc8c57..e8324543e085 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1067,14 +1067,14 @@ static int cma_save_req_info(const struct ib_cm_event 
> *ib_event,
>  sizeof(req->local_gid));
>   req->has_gid= true;
>   req->service_id = req_param->primary_path->service_id;
> - req->pkey   = req_param->bth_pkey;
> + req->pkey   = be16_to_cpu(req_param->primary_path->pkey);
>   break;
>   case IB_CM_SIDR_REQ_RECEIVED:
>   req->device = sidr_param->listen_id->device;
>   req->port   = sidr_param->port;
>   req->has_gid= false;
>   req->service_id = sidr_param->service_id;
> - req->pkey   = sidr_param->bth_pkey;
> + req->pkey   = sidr_param->pkey;
>   break;
>   default:
>   return -EINVAL;
> 

And, to be clear, you are looking for this to be part of 4.3-rc updates,
yes?

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


[PATCH for-next 0/5] RDMA/ocrdma: bug fixes

2015-10-20 Thread Selvin Xavier
This patch series includs fix in CQ door bell logic, check to prevent
driver crash and code cleanup.
Please apply.

Devesh Sharma (1):
  RDMA/ocrdma: Prevent CQ-Doorbell floods

Naga Irrinki (1):
  RDMA/ocrdma: Check resource ids received in Async CQE

Selvin Xavier (3):
  RDMA/ocrdma: Cleanup unused device list and rcu variables
  RDMA/ocrdma: Avoid a possible crash in ocrdma_rem_port_stats
  RDMA/ocrdma: Bump up ocrdma version number to 11.0.0.0

 drivers/infiniband/hw/ocrdma/ocrdma.h   |  3 +--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c| 30 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  | 16 ++-
 drivers/infiniband/hw/ocrdma/ocrdma_stats.c |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 11 +++
 5 files changed, 33 insertions(+), 29 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 1/5] RDMA/ocrdma: Cleanup unused device list and rcu variables

2015-10-20 Thread Selvin Xavier
ocrdma_dev_list is not used by the driver. So removing
the references of this variable. dev->rcu was introduced
for the ipv6 notifier for GID management. This is no longer
required as the GID management is outside the HW driver.

Signed-off-by: Padmanabh Ratnakar 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/ocrdma/ocrdma.h  |  1 -
 drivers/infiniband/hw/ocrdma/ocrdma_main.c | 16 ++--
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index b4091ab..8cd57c7 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -278,7 +278,6 @@ struct ocrdma_dev {
u32 hba_port_num;
 
struct list_head entry;
-   struct rcu_head rcu;
int id;
u64 *stag_arr;
u8 sl; /* service level */
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 87aa55d..cb1af0f 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -63,8 +63,6 @@ MODULE_DESCRIPTION(OCRDMA_ROCE_DRV_DESC " " 
OCRDMA_ROCE_DRV_VERSION);
 MODULE_AUTHOR("Emulex Corporation");
 MODULE_LICENSE("Dual BSD/GPL");
 
-static LIST_HEAD(ocrdma_dev_list);
-static DEFINE_SPINLOCK(ocrdma_devlist_lock);
 static DEFINE_IDR(ocrdma_dev_id);
 
 void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
@@ -325,9 +323,6 @@ static struct ocrdma_dev *ocrdma_add(struct be_dev_info 
*dev_info)
for (i = 0; i < ARRAY_SIZE(ocrdma_attributes); i++)
if (device_create_file(>ibdev.dev, ocrdma_attributes[i]))
goto sysfs_err;
-   spin_lock(_devlist_lock);
-   list_add_tail_rcu(>entry, _dev_list);
-   spin_unlock(_devlist_lock);
/* Init stats */
ocrdma_add_port_stats(dev);
/* Interrupt Moderation */
@@ -356,9 +351,8 @@ idr_err:
return NULL;
 }
 
-static void ocrdma_remove_free(struct rcu_head *rcu)
+static void ocrdma_remove_free(struct ocrdma_dev *dev)
 {
-   struct ocrdma_dev *dev = container_of(rcu, struct ocrdma_dev, rcu);
 
idr_remove(_dev_id, dev->id);
kfree(dev->mbx_cmd);
@@ -375,15 +369,9 @@ static void ocrdma_remove(struct ocrdma_dev *dev)
ib_unregister_device(>ibdev);
 
ocrdma_rem_port_stats(dev);
-
-   spin_lock(_devlist_lock);
-   list_del_rcu(>entry);
-   spin_unlock(_devlist_lock);
-
ocrdma_free_resources(dev);
ocrdma_cleanup_hw(dev);
-
-   call_rcu(>rcu, ocrdma_remove_free);
+   ocrdma_remove_free(dev);
 }
 
 static int ocrdma_open(struct ocrdma_dev *dev)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 5/5] RDMA/ocrdma: Bump up ocrdma version number to 11.0.0.0

2015-10-20 Thread Selvin Xavier
Updating the version number to 11.0.0.0

Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/ocrdma/ocrdma.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 8cd57c7..8f047f9 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -55,7 +55,7 @@
 #include 
 #include "ocrdma_sli.h"
 
-#define OCRDMA_ROCE_DRV_VERSION "10.6.0.0"
+#define OCRDMA_ROCE_DRV_VERSION "11.0.0.0"
 
 #define OCRDMA_ROCE_DRV_DESC "Emulex OneConnect RoCE Driver"
 #define OCRDMA_NODE_DESC "Emulex OneConnect RoCE HCA"
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 4/5] RDMA/ocrdma: Prevent CQ-Doorbell floods

2015-10-20 Thread Selvin Xavier
From: Devesh Sharma 

Changing CQ-Doorbell(DB) logic to prevent DB floods, it is supposed to be
pressed only if any hw CQE is polled. If cq-arm was requested
previously then don't bother about number of hw CQEs polled and
arm the CQ.

Signed-off-by: Devesh Sharma 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 1f3affb..1c4e83d 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -2933,16 +2933,11 @@ expand_cqe:
}
 stop_cqe:
cq->getp = cur_getp;
-   if (cq->deferred_arm) {
-   ocrdma_ring_cq_db(dev, cq->id, true, cq->deferred_sol,
- polled_hw_cqes);
+   if (cq->deferred_arm || polled_hw_cqes) {
+   ocrdma_ring_cq_db(dev, cq->id, cq->deferred_arm,
+ cq->deferred_sol, polled_hw_cqes);
cq->deferred_arm = false;
cq->deferred_sol = false;
-   } else {
-   /* We need to pop the CQE. No need to arm */
-   ocrdma_ring_cq_db(dev, cq->id, false, cq->deferred_sol,
- polled_hw_cqes);
-   cq->deferred_sol = false;
}
 
return i;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 3/5] RDMA/ocrdma: Check resource ids received in Async CQE

2015-10-20 Thread Selvin Xavier
From: Naga Irrinki 

Some versions of the FW sends wrong QP or CQ IDs in the
Async CQE. Adding a check to see whether qp or cq structures
associated with the CQE is valid.

Signed-off-by: Devesh Sharma 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index aab391a..9d99142 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -678,11 +678,33 @@ static void ocrdma_dispatch_ibevent(struct ocrdma_dev 
*dev,
int dev_event = 0;
int type = (cqe->valid_ae_event & OCRDMA_AE_MCQE_EVENT_TYPE_MASK) >>
OCRDMA_AE_MCQE_EVENT_TYPE_SHIFT;
+   u16 qpid = cqe->qpvalid_qpid & OCRDMA_AE_MCQE_QPID_MASK;
+   u16 cqid = cqe->cqvalid_cqid & OCRDMA_AE_MCQE_CQID_MASK;
 
-   if (cqe->qpvalid_qpid & OCRDMA_AE_MCQE_QPVALID)
-   qp = dev->qp_tbl[cqe->qpvalid_qpid & OCRDMA_AE_MCQE_QPID_MASK];
-   if (cqe->cqvalid_cqid & OCRDMA_AE_MCQE_CQVALID)
-   cq = dev->cq_tbl[cqe->cqvalid_cqid & OCRDMA_AE_MCQE_CQID_MASK];
+   /*
+* Some FW version returns wrong qp or cq ids in CQEs.
+* Checking whether the IDs are valid
+*/
+
+   if (cqe->qpvalid_qpid & OCRDMA_AE_MCQE_QPVALID) {
+   if (qpid < dev->attr.max_qp)
+   qp = dev->qp_tbl[qpid];
+   if (qp == NULL) {
+   pr_err("ocrdma%d:Async event - qpid %u is not valid\n",
+  dev->id, qpid);
+   return;
+   }
+   }
+
+   if (cqe->cqvalid_cqid & OCRDMA_AE_MCQE_CQVALID) {
+   if (cqid < dev->attr.max_cq)
+   cq = dev->cq_tbl[cqid];
+   if (cq == NULL) {
+   pr_err("ocrdma%d:Async event - cqid %u is not valid\n",
+  dev->id, cqid);
+   return;
+   }
+   }
 
memset(_evt, 0, sizeof(ib_evt));
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 2/5] RDMA/ocrdma: Avoid a possible crash in ocrdma_rem_port_stats

2015-10-20 Thread Selvin Xavier
debugfs_remove should be called before freeing the driver
stats resources to avoid any crash during ocrdma_remove.

Signed-off-by: Devesh Sharma 
Signed-off-by: Selvin Xavier 
---
 drivers/infiniband/hw/ocrdma/ocrdma_stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_stats.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_stats.c
index 69334e2..86c303a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_stats.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_stats.c
@@ -855,9 +855,9 @@ void ocrdma_rem_port_stats(struct ocrdma_dev *dev)
 {
if (!dev->dir)
return;
+   debugfs_remove(dev->dir);
mutex_destroy(>stats_lock);
ocrdma_release_stats_mem(dev);
-   debugfs_remove(dev->dir);
 }
 
 void ocrdma_init_debugfs(void)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] IB/cma: Use inner P_Key to determine netdev

2015-10-20 Thread Haggai Eran
On 20/10/2015 10:20, Doug Ledford wrote:
> On 10/20/2015 02:53 AM, Haggai Eran wrote:
>> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
>> was decided that it is best to use the P_Key value in the packet headers.
>> However, the mlx5 and ipath drivers are currently unable to send correct
>> P_Key values in GMP headers. They always send using a single P_Key that is
>> set during the GSI QP initialization.
>>
>> Change the rdma_cm code to look at the P_Key value that is part of the
>> packet payload as a workaround. Once the drivers are fixed this patch can
>> be reverted.
>>
>> Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to
>> RDMA CM")
>> Signed-off-by: Haggai Eran 
>> ---
>> Changes from v0:
>> - improve commit message
>>
>>  drivers/infiniband/core/cma.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>> index 59a2dafc8c57..e8324543e085 100644
>> --- a/drivers/infiniband/core/cma.c
>> +++ b/drivers/infiniband/core/cma.c
>> @@ -1067,14 +1067,14 @@ static int cma_save_req_info(const struct 
>> ib_cm_event *ib_event,
>> sizeof(req->local_gid));
>>  req->has_gid= true;
>>  req->service_id = req_param->primary_path->service_id;
>> -req->pkey   = req_param->bth_pkey;
>> +req->pkey   = be16_to_cpu(req_param->primary_path->pkey);
>>  break;
>>  case IB_CM_SIDR_REQ_RECEIVED:
>>  req->device = sidr_param->listen_id->device;
>>  req->port   = sidr_param->port;
>>  req->has_gid= false;
>>  req->service_id = sidr_param->service_id;
>> -req->pkey   = sidr_param->bth_pkey;
>> +req->pkey   = sidr_param->pkey;
>>  break;
>>  default:
>>  return -EINVAL;
>>
> 
> And, to be clear, you are looking for this to be part of 4.3-rc updates,
> yes?

Yes, the issue was introduced in 4.3 in my cma demux patch.

Haggai

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 00/26] New fast registration API

2015-10-20 Thread Sagi Grimberg

Doug, are you planning on taking this for 4.4?

I think this set has converged towards inclusion.

Reminder, this series goes on top of Christoph's
wr_cleanup patches and iser bounce buffering cleanup.

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Or Gerlitz
On Tue, Oct 20, 2015 at 4:08 PM, Sagi Grimberg  wrote:
>>> I think this is very useful to have and I'd love having it in 4.4,
>>> does anyone have any other comments on this patch?

>> Were we ever presented with performance gains achieved using the patch?

> Can you explain what you mean by performance gains? My understanding is
> that this patch is not performance critical. It just replaces each and
> every ULP device attributes caching.

oops, sorry, I was referring to the patch that deals with
re-structuring of struct ib_wc, so...

(1) did we even got performance gains achieved using **that** patch?

(2) re this one, as I wrote in the past, I am in favor of simple
caching of struct
ib_device_attr on struct ib_device (best with pointer) and not adding
333 fields
to struct ib_device, I don't see the benefit from this patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Sagi Grimberg

So this patch seems to work well for me. I've ran some kernel
applications (ipoib, iser, srp, nfs) and a user-space application
(ibsrpdm) over mlx4/mlx5 and didn't spot any issues.

I think this is very useful to have and I'd love having it in 4.4,
does anyone have any other comments on this patch?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/23] Update driver to 0.9-294

2015-10-20 Thread Moni Shoua
>
> Perhaps I did not chose my words carefully enough.
>
> The largest issue on the TODO list is the refactoring of the code to be 
> shared between the hfi1 and qib driver.  While the hardware between hfi1 and 
> qib is similar and thus the initial code looked similar, our performance 
> tuning on hfi1 has revealed that some changes will be required to the hfi1 
> code to fully utilize the hardware.  We would like to get these updates in to 
> make sure that the refactoring work is done properly for the 2 hardware types.
Please keep in mind that there are 3 HW types (our SoftRoCE driver)
that needs to be part of the framework.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Or Gerlitz
On Tue, Oct 20, 2015 at 3:00 PM, Sagi Grimberg  wrote:
> So this patch seems to work well for me. I've ran some kernel
> applications (ipoib, iser, srp, nfs) and a user-space application
> (ibsrpdm) over mlx4/mlx5 and didn't spot any issues.
>
> I think this is very useful to have and I'd love having it in 4.4,
> does anyone have any other comments on this patch?

Were we ever presented with performance gains achieved using the patch?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/cma: Use inner P_Key to determine netdev

2015-10-20 Thread Haggai Eran
On 19/10/2015 21:19, Jason Gunthorpe wrote:
> On Mon, Oct 19, 2015 at 09:09:25PM +0300, Haggai Eran wrote:
>> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
>> was decided that it is best to use the P_Key value in the packet headers
>> [1]. However, some drivers are currently unable to send correct P_Key in
>> GMP headers.
> 
> You should explicitly describe the broken drivers in the commit text.
These are mlx5 and ipath. I'll add them to the commit message.

> I thought mlx5 was fixed for receive already? I'm confused why we need
> this.
mlx5 had two issues related to GSI pkeys. The issue that was fixed was
that it treated the pkey value returned by the hardware in receive
completions as a pkey_index. The remaining issue is that it doesn't
respect the ib_send_wr.ud.pkey_index field when sending.

With the current state of things, cma will try to look for an ipoib net
dev matching the BTH pkey of the request, but if the sender is mlx5 or
ipath, the BTH pkey would be the default pkey. If the request was
intended for a different pkey, cma won't find a matching netdev and will
throw away the request.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH rdma-cm] IB/core: Fix use after free of ifa

2015-10-20 Thread Matan Barak



On 10/19/2015 6:27 PM, Doug Ledford wrote:

On 10/19/2015 10:20 AM, Matan Barak wrote:

On 10/19/2015 3:23 PM, Doug Ledford wrote:



This is a major cause of the slowness.  Unless you have a specific need
of them, per-entry rwlocks are *NOT* a good idea.  I was going to bring
this up separately, so I'll just mention it here.  Per entry locks help
reduce contention when you have lots of multiple accessors.  Using
rwlocks help reduce contention when you have a read-mostly entry that is
only occasionally changed.  But every lock and every unlock (just like
every atomic access) still requires a locked memory cycle.  That means
every lock acquire and every lock release requires a synchronization
event between all CPUs.  Using per-entry locks means that every entry
triggers two of those synchronization events.  On modern Intel CPUs,
they cost about 32 cycles per event.  If you are going to do something,
and it can't be done without a lock, then grab a single lock and do it
as fast as you can.  Only in rare cases would you want per-entry locks.



I agree that every rwlock costs us locked access. However, lets look at
the common scenario here.


No, let's not.  Especially if your "common scenario" causes you to
ignore pathological bad behavior.  Optimizing for the common case is not
an excuse to enable pathological behavior.


I think that in production stable systems, the
IPs our rdma-cm stack uses (and thus GIDs) should be pretty stable.


Probably true.


Thus, most accesses should be read calls.


Probably true.


That's why IMHO read-write
access makes sense here.


Probably true.


Regarding single-lock vs per entry lock, it really depends on common an
entry could be updated while another entry is being used by an
application.


No, it depends on more than that.  In addition to the frequency of
updates versus lookups, there is also the issue of lookup cost and
update cost.  Using per entry locks makes both the lookup cost and the
update cost of anything other than the first gid in the table grow
exponentially.  A good way to demonstrate this would be to create 100
vlans on a RoCE device, then run the cmtime test between two hosts using
the 100th vlan IP at each end.  Using per-entry locks, the performance
of this test relative to using the first IP on the device will be
pathologically bad.



That's correct, we"waste" #entries * . But we 
have to note this is mainly during the connecting part.



In a (future) dynamic system, you might want to create
containers dynamically, which will add a net device and change the
hardware GID table while other application (maybe in another container)
uses other GIDs, but this might be rare scenario.


This is a non-issue.  Creating a container and an application inside
that container to use an RDMA interface is already a heavy weight
operation (minimum fork/exec/program startup).  Table lookups can be
needed thousands of times per second under certain conditions and must
be fast.  Per-entry locks are only fast for the first item in the list.
  After that, they progressively get slower and slower than usage with a
single lock.



We first used RCU and seqcount in order to protect each entry. When we 
changed that to rwlock, we added some per-entry work.



Ok, refactoring this code for a single lock shouldn't be problematic.
Regarding performance, I think the results here are vastly impacted by
the rate we'll add/remove IPs or upper net-devices in the background.


Not really.  You will never add/remove IPs fast enough to justify
per-entry locks, especially when you consider that ib_cache_gid_add
calls find_gid twice, once going over the entire table if the gid isn't
found, before ever calling add_gid.  The ocrdma table isn't bad because
it's only 16 entries.  But the mlx4 table is going to be 128 I suspect,
so you can do the test I mentioned above and use 100 vlans.  In that
scenario, the 101st gid will require that you take and release 128 locks
to scan for the entry in the list, it will come back empty, then you
will take and release 101 locks until you find an empty entry, and then
you will take a write lock as you write the gid.  That's 230 total
locks, for 460 total locked memory operations at 32 cycles each, so a
total of 14,720 cycles spent doing nothing but locking the memory bus.
And each of those operations slows down all of the CPUs in the system.

Where I've seen this sort of behavior totally wreck a production system
is when the GID you need to look up is on port 2.  Even if it is the
firstmost GID on port 2, if you have to search all of port 1's table
first, then by the time you get to port 2's table, you've already lost.

So, here's some suggestions:

1)  Use a single lock on the table per port.


Make sense as an enhancement.


2)  Change find_gid so that it will take an additional element, int
*empty_idx, and as it is scanning the gid table, when it runs across an
empty entry, if empty_idx != NULL, *empty_idx = idx;.  This prevents
needing to scan 

Re: [PATCH TRIVIAL] ib_pack.h: Fix commentary IBA reference for CNP in IB opcode enum

2015-10-20 Thread Doug Ledford
On 09/30/2015 03:04 PM, Hal Rosenstock wrote:
> 
> IBA spec is now 1.3 not 3.1 and vol 1 should be mentioned
> as there is also vol 2.
> 
> Signed-off-by: Hal Rosenstock 
> ---
> diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
> index 709a533..e99d8f9 100644
> --- a/include/rdma/ib_pack.h
> +++ b/include/rdma/ib_pack.h
> @@ -76,7 +76,7 @@ enum {
>   IB_OPCODE_UC= 0x20,
>   IB_OPCODE_RD= 0x40,
>   IB_OPCODE_UD= 0x60,
> - /* per IBTA 3.1 Table 38, A10.3.2 */
> + /* per IBTA 1.3 vol 1 Table 38, A10.3.2 */

Thanks Hal, I've added this to the 4.4 list.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] IB/iser: fix a comment typo

2015-10-20 Thread Doug Ledford
On 10/04/2015 05:00 AM, Geliang Tang wrote:
> Just fix a typo in the code comment.
> 
> Signed-off-by: Geliang Tang 

This has been picked up for the 4.4 merge window.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] IB/ucma: check workqueue allocation before usage

2015-10-20 Thread Doug Ledford
On 10/01/2015 10:54 AM, Sasha Levin wrote:
> On 09/17/2015 06:59 PM, Jason Gunthorpe wrote:
>> On Thu, Sep 17, 2015 at 04:04:19PM -0400, Sasha Levin wrote:
>>> Allocating a workqueue might fail, which wasn't checked so far and would
>>> lead to NULL ptr derefs when an attempt to use it was made.
>>
>> Indeed.
>>
>> Yishai, please check this and check the other patches you've sent to
>> see if they have a similar error.
> 
> Ping? It doesn't seem that the patch got anywhere.

Sorry, I took the "it applies to linux-next" to mean it wasn't against
my tree yet and was against upcoming patches.  That wasn't the case.
It's been applied.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: 4.4's rdma plate

2015-10-20 Thread Or Gerlitz
On Wed, Oct 14, 2015 at 3:56 PM, Doug Ledford  wrote:
> On 10/14/2015 12:51 AM, Or Gerlitz wrote:
>> On Sun, Oct 11, 2015 at 11:58 PM, Or Gerlitz  wrote:
>>> Hi Doug,
>>> As we're soon on 4.3-rc5, could you please update how things look for 4.4?
>>>
>>> The mutlicast loopback prevention patches from Eran were intially
>>> posted ~two months ago and went through testing by Christoph and Co.
>> [...]


>> Hi Doug,

>> So we're after rc5 and not a single word from you re the plans for the
>> next kernel, and your branch for that purpose contains exactly two 
>> patches... I do
>> see few chunks
>> of patches marked as under review in your patchworks, so there might
>> be sign for getting some hope, is that real?

> Yes Or, I'm working on it.  A new 4.3-rc pull request should come out
> today and an update on the for next area probably as well.

Doug,

So we're after rc6 and still pretty much nothing in your for-4.4
branch, are we getting there
(e.g picking something from the backlog such as the self multicast
loopback prevention and the
rdma-cm name space patches) or for 4.4 we'd be coming naked?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.4's rdma plate

2015-10-20 Thread Or Gerlitz
On Tue, Oct 20, 2015 at 7:31 PM, Doug Ledford  wrote:
> On 10/20/2015 12:25 PM, Or Gerlitz wrote:
>> On Wed, Oct 14, 2015 at 3:56 PM, Doug Ledford  wrote:
>>> On 10/14/2015 12:51 AM, Or Gerlitz wrote:
 On Sun, Oct 11, 2015 at 11:58 PM, Or Gerlitz  wrote:
> Hi Doug,
> As we're soon on 4.3-rc5, could you please update how things look for 4.4?
>
> The mutlicast loopback prevention patches from Eran were intially
> posted ~two months ago and went through testing by Christoph and Co.
 [...]
>>
>>
 Hi Doug,
>>
 So we're after rc5 and not a single word from you re the plans for the
 next kernel, and your branch for that purpose contains exactly two 
 patches... I do
 see few chunks
 of patches marked as under review in your patchworks, so there might
 be sign for getting some hope, is that real?
>>
>>> Yes Or, I'm working on it.  A new 4.3-rc pull request should come out
>>> today and an update on the for next area probably as well.

>> Doug,
>> So we're after rc6 and still pretty much nothing in your for-4.4
>> branch, are we getting there
>> (e.g picking something from the backlog such as the self multicast
>> loopback prevention and the
>> rdma-cm name space patches) or for 4.4 we'd be coming naked?

> Another pull request for the final 4.3 fixups will go in today.  I
> prioritized those over the for-next work.
> Next I'll be pulling simple for-next items, then I'll be reviewing the

Sounds good as a starter

> then I'll be reviewing the
> RoCE GID cadche usage in verbs/cma v2 patchset, then the Add RoCE v2
> support v1 patchset, then the New fast registration API v5 set.

Doug, the New fast registration API v5 set has E2E community consensus,
I guess there's not much left for you to dig there, so these are good news.

As for the RoCE patches, you mentioned reviewing them, this is going to
take time, which makes sense. However, the self-multicast loopback prevention
patches are (1) sitting in the mailing list for months (2) small in
volume and very large
in benefit (3) passed testing by Christoph and Co, any reason not to
pick them prior
to review of the RoCE code.

We need to encourage developers to take part in the upstream process,
For this series
I am clueless towards the developers where they ask me what's missing?
why this isn't
moving for months?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH rdma-RC] IB/cm: Fix sleeping while atomic when creating AH from WC

2015-10-20 Thread Hal Rosenstock


On 10/20/2015 11:57 AM, Hefty, Sean wrote:
>> Today, cm assumes paths are reversible primary_path->reversible = 1.
> 
> I can't quickly find a link, but I believe all CM MADs are reversible, per 
> the spec.

Spec citation for this is:

C12-5.1.3: All responses generated by the CM protocol shall follow the
rules for response generation that are enumerated in 13.5.4 Response
Generation and Reversible Paths.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/23] Update driver to 0.9-294

2015-10-20 Thread ira.weiny
On Tue, Oct 20, 2015 at 03:45:47PM +0300, Moni Shoua wrote:
> >
> > Perhaps I did not chose my words carefully enough.
> >
> > The largest issue on the TODO list is the refactoring of the code to be
> > shared between the hfi1 and qib driver.  While the hardware between hfi1
> > and qib is similar and thus the initial code looked similar, our
> > performance tuning on hfi1 has revealed that some changes will be required
> > to the hfi1 code to fully utilize the hardware.  We would like to get these
> > updates in to make sure that the refactoring work is done properly for the
> > 2 hardware types.
>
> Please keep in mind that there are 3 HW types (our SoftRoCE driver)
> that needs to be part of the framework.

Understood, however, unlike SoftRoCE, qib and hfi currently share a lot of code
to drive the hardware.

The underlying reason for the TODO item "Remove software processing of IB
protocol..." is because we have a large amount of duplicated code between these
drivers.  _Some_ of which, at a high level, is sharable with SoftRoCE.

These patches (and more to follow), further differentiate the 2 drivers along
hardware lines.  Accepting these patches will help us make sure that we don't
create some common code between qib and hfi which, because of our testing we
now know, needs to be separated out.

This is a separate issue from the higher level code sharing which will be done
with SoftRoCE.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/cma: Potential NULL dereference in cma_id_from_event

2015-10-20 Thread Doug Ledford
On 09/21/2015 09:02 AM, Haggai Eran wrote:
> If the lookup of a listening ID failed for an AF_IB request, the code
> would try to call dev_put() on a NULL net_dev.
> 
> Fixes: be688195bd08 ("IB/cma: Fix net_dev reference leak with failed
> requests")
> Reported-by: Dan Carpenter 
> Signed-off-by: Haggai Eran 
> ---
>  drivers/infiniband/core/cma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index b1ab13f3e182..b92a3c2c060b 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1302,7 +1302,7 @@ static struct rdma_id_private *cma_id_from_event(struct 
> ib_cm_id *cm_id,
>   bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
>   cma_port_from_service_id(req.service_id));
>   id_priv = cma_find_listener(bind_list, cm_id, ib_event, , *net_dev);
> - if (IS_ERR(id_priv)) {
> + if (IS_ERR(id_priv) && *net_dev) {
>   dev_put(*net_dev);
>   *net_dev = NULL;
>   }
> 

This one got lost back in the noise of all the for-next patches.  Sorry
to have missed it.  I've picked it up now for -rc.

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Sagi Grimberg



But this makes struct ib_device much much bigger, and this structure
is used in **fast** path,
e.g the ULP traverses through a pointer from struct ib_device to
post_send/recv, poll_cq and friends.


Umm, all the caps are appended to the end of the ib_device structure so
I don't see how it will affect the fast path - am I missing something?



The networking community will let you work for 10y before they add a
field to struct net_device exactly b/c
of the reason I brought. Why here we can come out of the blue and add
tens if not hundreds of fields to our device structure?


Keeping struct ib_device_attr embedded at the end of struct ib_device is
exactly the same isn't it?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Or Gerlitz
On Tue, Oct 20, 2015 at 6:00 PM, Sagi Grimberg  wrote:


>> (2) re this one, as I wrote in the past, I am in favor of simple
>> caching of struct ib_device_attr on struct ib_device (best with pointer) and 
>> not adding
>> 333 fields to struct ib_device, I don't see the benefit from this patch.

> Christoph commented on that:
>
> "I'm strongly against this.  As the reviews show the move is highly
> confusing.  The attributes don't change and there is no need to 'cache'
> or 'query' them.  Just merge them into the device, follow years of
> experience with how every other Linux subsystem does it and be done
> with it."
>
> If the attributes are automatically a part of the ib_device then it
> does seem a bit redundant keeping the structure around...

But this makes struct ib_device much much bigger, and this structure
is used in **fast** path,
e.g the ULP traverses through a pointer from struct ib_device to
post_send/recv, poll_cq and friends.

The networking community will let you work for 10y before they add a
field to struct net_device exactly b/c
of the reason I brought. Why here we can come out of the blue and add
tens if not hundreds of fields to our device structure?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Or Gerlitz
On Tue, Oct 20, 2015 at 6:13 PM, Sagi Grimberg  wrote:

>> The networking community will let you work for 10y before they add a
>> field to struct net_device exactly b/c
>> of the reason I brought. Why here we can come out of the blue and add
>> tens if not hundreds of fields to our device structure?

> Keeping struct ib_device_attr embedded at the end of struct ib_device is
> exactly the same isn't it?

I said " (best with pointer)"
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH rdma-RC] IB/cm: Fix sleeping while atomic when creating AH from WC

2015-10-20 Thread Hefty, Sean
> Today, cm assumes paths are reversible primary_path->reversible = 1.

I can't quickly find a link, but I believe all CM MADs are reversible, per the 
spec.


Re: 4.4's rdma plate

2015-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2015 at 07:42:25PM +0300, Or Gerlitz wrote:

> As for the RoCE patches, you mentioned reviewing them, this is going to
> take time, which makes sense. However, the self-multicast loopback prevention
> patches are

My issue with the latest round of UAPI changes is I don't see any
Reviewed-By tag from someone at Mellanox whom I think understands our
UAPI.

UAPI changes are *NOT* something first-time list contributors should
be doing unassisted.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH rdma-cm] IB/core: Fix use after free of ifa

2015-10-20 Thread Doug Ledford
On 10/20/2015 10:50 AM, Matan Barak wrote:
> 
> 
> On 10/19/2015 6:27 PM, Doug Ledford wrote:
>> On 10/19/2015 10:20 AM, Matan Barak wrote:
>>> On 10/19/2015 3:23 PM, Doug Ledford wrote:
>>
 This is a major cause of the slowness.  Unless you have a specific need
 of them, per-entry rwlocks are *NOT* a good idea.  I was going to bring
 this up separately, so I'll just mention it here.  Per entry locks help
 reduce contention when you have lots of multiple accessors.  Using
 rwlocks help reduce contention when you have a read-mostly entry
 that is
 only occasionally changed.  But every lock and every unlock (just like
 every atomic access) still requires a locked memory cycle.  That means
 every lock acquire and every lock release requires a synchronization
 event between all CPUs.  Using per-entry locks means that every entry
 triggers two of those synchronization events.  On modern Intel CPUs,
 they cost about 32 cycles per event.  If you are going to do something,
 and it can't be done without a lock, then grab a single lock and do it
 as fast as you can.  Only in rare cases would you want per-entry locks.

>>>
>>> I agree that every rwlock costs us locked access. However, lets look at
>>> the common scenario here.
>>
>> No, let's not.  Especially if your "common scenario" causes you to
>> ignore pathological bad behavior.  Optimizing for the common case is not
>> an excuse to enable pathological behavior.
>>
>>> I think that in production stable systems, the
>>> IPs our rdma-cm stack uses (and thus GIDs) should be pretty stable.
>>
>> Probably true.
>>
>>> Thus, most accesses should be read calls.
>>
>> Probably true.
>>
>>> That's why IMHO read-write
>>> access makes sense here.
>>
>> Probably true.
>>
>>> Regarding single-lock vs per entry lock, it really depends on common an
>>> entry could be updated while another entry is being used by an
>>> application.
>>
>> No, it depends on more than that.  In addition to the frequency of
>> updates versus lookups, there is also the issue of lookup cost and
>> update cost.  Using per entry locks makes both the lookup cost and the
>> update cost of anything other than the first gid in the table grow
>> exponentially.  A good way to demonstrate this would be to create 100
>> vlans on a RoCE device, then run the cmtime test between two hosts using
>> the 100th vlan IP at each end.  Using per-entry locks, the performance
>> of this test relative to using the first IP on the device will be
>> pathologically bad.
>>
> 
> That's correct, we"waste" #entries * . But we
> have to note this is mainly during the connecting part.

Yes, and I spent a *month* of my life tracking down why a customer's
real world application was failing in production (but not in their
staging environment) and it all boiled down to the fact that when their
app had to start up more than about 700-900 connections at app startup
and their setup included dual port controllers with both ports in use,
and some of those connections had to look up GIDs on the second port,
the abysmal per-gid lookup rate meant that the kernel had a hard ceiling
of how many GIDs it could look up per second.  Once that was hit, the
app went into a permanent cycle of dropping failed connections and
re-opening new ones and it would simply never catch up.

So it concerns me greatly to hear you say "But we have to note this is
mainly during the connecting part", implying that some wasted cycles on
connection startup are OK.  They aren't.  Not ever.  Production systems
fall over dead precisely because of things like this.

>> 3)  Consider optimizing the code by requiring that any time a GID is
>> added, it must take the first empty spot, and any time one is deleted,
>> any valid entries after it must be moved up to fill the empty spot, then
>> optimize find_gid to quit once it finds an empty slot because it knows
>> the rest of the table is empty.  Given that mlx4 keeps a huge 128 table
>> array, this can be very helpful.
> 
> I don't think we want to change existing GID entries. Entities sometimes
> want to refer to GIDs via GID indices. Changing the GIDs order under
> their feet could be problematic.

I had a feeling this one might be an issue.  There is, however, a
solution that would resolve the problem.  The issue here is that you
have an array of GIDs that you want to be able to both scan efficiently
and access directly.  The array is great for direct access, but the scan
efficiently part is a total looser because you scan so many empty
entries most of the time.  You could change it so that a gid port table
is actually a combination of 1) an array of gid_port_table_element
pointers and 2) a linked list of gid_port_table_element structs.  Then
you only allocate a struct when needed, link it into the list, assign it
an index, set the pointer in the array at point index to point to your
struct, and that way you can still direct access the elements, but can
also 

Re: 4.4's rdma plate

2015-10-20 Thread Doug Ledford
On 10/20/2015 12:25 PM, Or Gerlitz wrote:
> On Wed, Oct 14, 2015 at 3:56 PM, Doug Ledford  wrote:
>> On 10/14/2015 12:51 AM, Or Gerlitz wrote:
>>> On Sun, Oct 11, 2015 at 11:58 PM, Or Gerlitz  wrote:
 Hi Doug,
 As we're soon on 4.3-rc5, could you please update how things look for 4.4?

 The mutlicast loopback prevention patches from Eran were intially
 posted ~two months ago and went through testing by Christoph and Co.
>>> [...]
> 
> 
>>> Hi Doug,
> 
>>> So we're after rc5 and not a single word from you re the plans for the
>>> next kernel, and your branch for that purpose contains exactly two 
>>> patches... I do
>>> see few chunks
>>> of patches marked as under review in your patchworks, so there might
>>> be sign for getting some hope, is that real?
> 
>> Yes Or, I'm working on it.  A new 4.3-rc pull request should come out
>> today and an update on the for next area probably as well.
> 
> Doug,
> 
> So we're after rc6 and still pretty much nothing in your for-4.4
> branch, are we getting there
> (e.g picking something from the backlog such as the self multicast
> loopback prevention and the
> rdma-cm name space patches) or for 4.4 we'd be coming naked?
> 
> Or.
> 

Another pull request for the final 4.3 fixups will go in today.  I
prioritized those over the for-next work.

Next I'll be pulling simple for-next items, then I'll be reviewing the
RoCE GID cadche usage in verbs/cma v2 patchset, then the Add RoCE v2
support v1 patchset, then the New fast registration API v5 set.

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] IB/cma: Use inner P_Key to determine netdev

2015-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2015 at 09:45:27AM +0300, Haggai Eran wrote:
> On 19/10/2015 21:19, Jason Gunthorpe wrote:
> > On Mon, Oct 19, 2015 at 09:09:25PM +0300, Haggai Eran wrote:
> >> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
> >> was decided that it is best to use the P_Key value in the packet headers
> >> [1]. However, some drivers are currently unable to send correct P_Key in
> >> GMP headers.
> > 
> > You should explicitly describe the broken drivers in the commit text.
> These are mlx5 and ipath. I'll add them to the commit message.

And ipath is actually ipath, the obsolete driver being removed, not
qib? (ie we don't care about it?)

> The remaining issue is that it doesn't respect the
> ib_send_wr.ud.pkey_index field when sending.

But this is a very serious bug, to the point the mis-labeled packets
may not even be delivered in some cases - you really care about the
sub case where mis-labeled packets happen to be deliverable but don't
parse right?

Well, don't forget to undo this patch when the mlx5 driver is fixed..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH rdma-RC] IB/cm: Fix sleeping while atomic when creating AH from WC

2015-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2015 at 03:57:54PM +, Hefty, Sean wrote:
> > Today, cm assumes paths are reversible primary_path->reversible = 1.
> 
> I can't quickly find a link, but I believe all CM MADs are
> reversible, per the spec.

But the Linux CM code doesn't always create the reverse CM MAD from
the GMP headers, it sometimes will do it by looking into the data path
in the MAD, which means it still could need to sleep to do the MAC
resolution.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.4's rdma plate

2015-10-20 Thread Or Gerlitz
On Tue, Oct 20, 2015 at 7:52 PM, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2015 at 07:42:25PM +0300, Or Gerlitz wrote:

>> As for the RoCE patches, you mentioned reviewing them, this is going to
>> take time, which makes sense. However, the self-multicast loopback prevention
>> patches are
>
> My issue with the latest round of UAPI changes is I don't see any
> Reviewed-By tag from someone at Mellanox whom I think understands our
> UAPI.
>
> UAPI changes are *NOT* something first-time list contributors should
> be doing unassisted.

Jason, your point makes sense, when we did the internal review, I have
asked that someone with the relevant knowledge will review this part.

As I wrote below, we should encourage people to take part in upstream
development, and making such comment only after the patch sits on the
list whole two months (V1 was very minor change, V2 was re-spin) isn't
going in the encouragement direction.

There's one patch in the series that adds UAPI -- it's 1/7 "IB/core:
Extend ib_uverbs_create_qp"
https://patchwork.kernel.org/patch/7405241/

I will try to get Reviewed-by: from Haggai or Matan tomorrow, can you
also take a look today, so we don't miss 4.4 just for that?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH rdma-RC] IB/cm: Fix rb-tree duplicate free and use-after-free

2015-10-20 Thread Doug Ledford
On 10/15/2015 11:15 AM, Matan Barak wrote:
> 
> 
> On 10/12/2015 7:37 PM, Hefty, Sean wrote:
>>> ib_send_cm_sidr_rep could sometimes erase the node from the sidr
>>> (depending on errors in the process). Since ib_send_cm_sidr_rep is
>>> called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
>>
>> This should clarify that it is the app calling from the callback, and
>> not a direct call from the cm_sidr_req_handler.
>>
> 
> Consider the following error flows:
> 
> Double free:
> cm_sidr_req_handler:3156->cm_reject_sidr_req:663->ib_send_cm_sidr_rep:3233->erase_rb
> 
> cm_sidr_req_handler:3173->ib_destroy_cm_id->cm_destroy_id:846->ib_send_cm_sidr_rep:3233->erase_rb
> 
> 
> RB contains free node:
> cm_sidr_req_handler:3156->cm_reject_sidr_req:663->ib_send_cm_sidr_rep->returns
> error(for example cm_alloc_msg,3219)
> 
> cm_sidr_req_handler:3173->ib_destroy_cm_id->cm_destroy_id:846->cm_reject_sidr_req->cm_reject_sidr_req:663->returns
> 
> error(for example cm_alloc_msg,3219)->RB wasn't erased but memory is
> freed :910 kfree(cm_id_priv)
> 
> 
>>> could be either erased from the rb_tree twice or not erased at all.
>>
>> In an error case, I can see why it would be left in the rbtree, but I
>> don't see how it can be removed twice.
>>
>>
>>> Fixing that by making sure it's erased only once before freeing
>>> cm_id_priv.
>>>
>>> Fixes: a977049dacde ('[PATCH] IB: Add the kernel CM implementation')
>>> Signed-off-by: Doron Tsur 
>>> Signed-off-by: Matan Barak 
>>> ---
>>>
>>> Hi Doug,
>>> This patch fixes a bug in the CM. In some flow, rb-tree could be
>>> freed twice or used after it was freed. This bug was picked by
>>> our regression tests and this fix was verified.
>>>
>>> Thanks,
>>> Doron and Matan
>>>
>>>   drivers/infiniband/core/cm.c | 10 +-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index f5cf1c4..56ff0f3 100644
>>> --- a/drivers/infiniband/core/cm.c
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -844,6 +844,11 @@ retest:
>>>   case IB_CM_SIDR_REQ_RCVD:
>>>   spin_unlock_irq(_id_priv->lock);
>>>   cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
>>> +spin_lock_irq();
>>> +if (!RB_EMPTY_NODE(_id_priv->sidr_id_node))
>>> +rb_erase(_id_priv->sidr_id_node,
>>> + _sidr_table);
>>> +spin_unlock_irq();
> 
> This change seeks to remove the about to be freed node from the rb tree,
> while verifying it has not been freed already
> 
>>
>> We should be able to use a return value from cm_reject_sidr_req() --
>> passed through from ib_send_cm_sidr_rep() to determine if the id was
>> removed from the tree.
>>
> 
> But this won't protect from double free in ib_send_cm_sidr_rep, unless
> we pass this parameter to the cm destroy function, but this alternative
> is cumbersome.
> 
>>>   break;
>>>   case IB_CM_REQ_SENT:
>>>   case IB_CM_MRA_REQ_RCVD:
>>> @@ -3210,7 +3215,10 @@ int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
>>>   spin_unlock_irqrestore(_id_priv->lock, flags);
>>>
>>>   spin_lock_irqsave(, flags);
>>> -rb_erase(_id_priv->sidr_id_node, _sidr_table);
>>> +if (!RB_EMPTY_NODE(_id_priv->sidr_id_node)) {
>>> +rb_erase(_id_priv->sidr_id_node, _sidr_table);
>>> +RB_CLEAR_NODE(_id_priv->sidr_id_node);
>>> +}
> 
> This change protects against double free
> 
>>>   spin_unlock_irqrestore(, flags);
>>
>> Something is very wrong in this function if the id is not in the tree
>> at this point.
>>
> 
> We agree, but there's an error flow that triggers this behavior.

Sean, I need to close on this patch.  What is your position after
Matan's explanation?

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v1] IB/cma: Use inner P_Key to determine netdev

2015-10-20 Thread Doug Ledford
On 10/20/2015 04:35 AM, Haggai Eran wrote:
> On 20/10/2015 10:20, Doug Ledford wrote:
>> On 10/20/2015 02:53 AM, Haggai Eran wrote:
>>> When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
>>> was decided that it is best to use the P_Key value in the packet headers.
>>> However, the mlx5 and ipath drivers are currently unable to send correct
>>> P_Key values in GMP headers. They always send using a single P_Key that is
>>> set during the GSI QP initialization.
>>>
>>> Change the rdma_cm code to look at the P_Key value that is part of the
>>> packet payload as a workaround. Once the drivers are fixed this patch can
>>> be reverted.
>>>
>>> Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to
>>> RDMA CM")
>>> Signed-off-by: Haggai Eran 
>>> ---
>>> Changes from v0:
>>> - improve commit message
>>>
>>>  drivers/infiniband/core/cma.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>>> index 59a2dafc8c57..e8324543e085 100644
>>> --- a/drivers/infiniband/core/cma.c
>>> +++ b/drivers/infiniband/core/cma.c
>>> @@ -1067,14 +1067,14 @@ static int cma_save_req_info(const struct 
>>> ib_cm_event *ib_event,
>>>sizeof(req->local_gid));
>>> req->has_gid= true;
>>> req->service_id = req_param->primary_path->service_id;
>>> -   req->pkey   = req_param->bth_pkey;
>>> +   req->pkey   = be16_to_cpu(req_param->primary_path->pkey);
>>> break;
>>> case IB_CM_SIDR_REQ_RECEIVED:
>>> req->device = sidr_param->listen_id->device;
>>> req->port   = sidr_param->port;
>>> req->has_gid= false;
>>> req->service_id = sidr_param->service_id;
>>> -   req->pkey   = sidr_param->bth_pkey;
>>> +   req->pkey   = sidr_param->pkey;
>>> break;
>>> default:
>>> return -EINVAL;
>>>
>>
>> And, to be clear, you are looking for this to be part of 4.3-rc updates,
>> yes?
> 
> Yes, the issue was introduced in 4.3 in my cma demux patch.

Applied, thanks.


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH rdma-cm] IB/core: Fix use after free of ifa

2015-10-20 Thread Doug Ledford
On 10/15/2015 08:01 AM, Matan Barak wrote:
> When using ifup/ifdown while executing enum_netdev_ipv4_ips,
> ifa could become invalid and cause use after free error.
> Fixing it by protecting with RCU lock.
> 
> Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management')
> Signed-off-by: Matan Barak 

This is in my tree for -rc.  Thanks.

> ---
> 
> Hi Doug,
> 
> This patch fixes a bug in RoCE GID table implementation. Under stress 
> conditions
> where ifup/ifdown are used, the ifa pointer could become invalid. Using a
> RCU lock in order to avoid freeing the ifa node (as done in other inet 
> functions
> (for example, inet_addr_onlink).
> 
> Our QA team verified that this patch fixes this issue.
> 
> Thanks,
> Matan
> 
>  drivers/infiniband/core/roce_gid_mgmt.c | 35 
> +
>  1 file changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/core/roce_gid_mgmt.c 
> b/drivers/infiniband/core/roce_gid_mgmt.c
> index 6b24cba..178f984 100644
> --- a/drivers/infiniband/core/roce_gid_mgmt.c
> +++ b/drivers/infiniband/core/roce_gid_mgmt.c
> @@ -250,25 +250,44 @@ static void enum_netdev_ipv4_ips(struct ib_device 
> *ib_dev,
>u8 port, struct net_device *ndev)
>  {
>   struct in_device *in_dev;
> + struct sin_list {
> + struct list_headlist;
> + struct sockaddr_in  ip;
> + };
> + struct sin_list *sin_iter;
> + struct sin_list *sin_temp;
>  
> + LIST_HEAD(sin_list);
>   if (ndev->reg_state >= NETREG_UNREGISTERING)
>   return;
>  
> - in_dev = in_dev_get(ndev);
> - if (!in_dev)
> + rcu_read_lock();
> + in_dev = __in_dev_get_rcu(ndev);
> + if (!in_dev) {
> + rcu_read_unlock();
>   return;
> + }
>  
>   for_ifa(in_dev) {
> - struct sockaddr_in ip;
> + struct sin_list *entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
>  
> - ip.sin_family = AF_INET;
> - ip.sin_addr.s_addr = ifa->ifa_address;
> - update_gid_ip(GID_ADD, ib_dev, port, ndev,
> -   (struct sockaddr *));
> + if (!entry) {
> + pr_warn("roce_gid_mgmt: couldn't allocate entry for 
> IPv4 update\n");
> + continue;
> + }
> + entry->ip.sin_family = AF_INET;
> + entry->ip.sin_addr.s_addr = ifa->ifa_address;
> + list_add_tail(>list, _list);
>   }
>   endfor_ifa(in_dev);
> + rcu_read_unlock();
>  
> - in_dev_put(in_dev);
> + list_for_each_entry_safe(sin_iter, sin_temp, _list, list) {
> + update_gid_ip(GID_ADD, ib_dev, port, ndev,
> +   (struct sockaddr *)_iter->ip);
> + list_del(_iter->list);
> + kfree(sin_iter);
> + }
>  }
>  
>  static void enum_netdev_ipv6_ips(struct ib_device *ib_dev,
> 


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Sagi Grimberg

On 10/20/2015 5:08 PM, Or Gerlitz wrote:

On Tue, Oct 20, 2015 at 4:08 PM, Sagi Grimberg  wrote:

I think this is very useful to have and I'd love having it in 4.4,
does anyone have any other comments on this patch?



Were we ever presented with performance gains achieved using the patch?



Can you explain what you mean by performance gains? My understanding is
that this patch is not performance critical. It just replaces each and
every ULP device attributes caching.


oops, sorry, I was referring to the patch that deals with
re-structuring of struct ib_wc, so...

(1) did we even got performance gains achieved using **that** patch?


I don't know if we'd see noticeable performance gains from **that**
patch either.. The benefit is mostly stack relief as usually ULPs keep
the work request structure on the stack.



(2) re this one, as I wrote in the past, I am in favor of simple
caching of struct
ib_device_attr on struct ib_device (best with pointer) and not adding
333 fields
to struct ib_device, I don't see the benefit from this patch.



Christoph commented on that:

"I'm strongly against this.  As the reviews show the move is highly
confusing.  The attributes don't change and there is no need to 'cache'
or 'query' them.  Just merge them into the device, follow years of
experience with how every other Linux subsystem does it and be done
with it."

If the attributes are automatically a part of the ib_device then it
does seem a bit redundant keeping the structure around...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.4's rdma plate

2015-10-20 Thread Jason Gunthorpe
On Wed, Oct 21, 2015 at 12:00:54AM +0300, Or Gerlitz wrote:

> As I wrote below, we should encourage people to take part in upstream
> development, and making such comment only after the patch sits on the
> list whole two months (V1 was very minor change, V2 was re-spin) isn't
> going in the encouragement direction.

I would say there are at least 6 people who could meaningfully put a
reviewed-by tag on that UAPI patch, why do you think none of them have
done so in two months? What is needed to get more engagement in
upstream?

> I will try to get Reviewed-by: from Haggai or Matan tomorrow, can you
> also take a look today, so we don't miss 4.4 just for that?

Unlikely, I am very busy with SC|15.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Sagi Grimberg



I think this is very useful to have and I'd love having it in 4.4,
does anyone have any other comments on this patch?


Were we ever presented with performance gains achieved using the patch?


Hi Or,

Can you explain what you mean by performance gains? My understanding is
that this patch is not performance critical. It just replaces each and
every ULP device attributes caching.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: merge struct ib_device_attr into struct ib_device V2

2015-10-20 Thread Christoph Hellwig
On Tue, Oct 20, 2015 at 04:08:54PM +0300, Sagi Grimberg wrote:
> Can you explain what you mean by performance gains? My understanding is
> that this patch is not performance critical. It just replaces each and
> every ULP device attributes caching.

Exactly.  The only 'performance' we care about is that of ULD writers :)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html