Re: RDMAVT pull request

2015-11-09 Thread Dennis Dalessandro

On Mon, Nov 09, 2015 at 01:32:03PM +, Kamal Heib wrote:

  Hi Dennis,
  I forked from your rdmavt project at Github and created pull request [1].
  Could you please check my pull request for the following patches:
  Author: Kamal Heib <[1]kam...@mellanox.com>
  Date:   Sun Nov 8 23:11:09 2015 +0200
  IB/rdmavt: Add address handle to rdmavt

  Create datastructure for address handle and implement
  create/destroy/modify/query of address handle for rdmavt.

  Signed-off-by: Kamal Heib <[2]kam...@mellanox.com>
  Author: Kamal Heib <[3]kam...@mellanox.com>
  Date:   Sun Nov 8 21:36:28 2015 +0200
  IB/rdmavt: Move common LID defines to rdmavt

  Move {QIB, HFI1}_PERMISSIVE_LID to RVT_PERMISSIVE_LID
  Move {QIB, HFI1}_MULTICAST_LID_BASE to RVT_MULTICAST_LID_BASE

  Signed-off-by: Kamal Heib <[4]kam...@mellanox.com>
  [1] - [5]https://github.com/ddalessa/kernel/pull/1


Hi Kamal,

Thanks for the patches! I will take a look and run some quick sanity tests.

-Denny
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH libibcm] cmpost.c: Handle ibv_get_device_list returning no IB devices in init()

2015-11-09 Thread Hal Rosenstock
On 11/5/2015 7:46 PM, Hefty, Sean wrote:
> Merged - thanks.
> 
> This is the first patch against the libibcm in over 4 years.  Is there a 
> reason why this is being used instead of the librdmacm?  I ask because I 
> assumed that the libibcm was basically deprecated.  The last release was over 
> 6 years ago.

I started trying to use this test program to try to reproduce a CM bug.
More to follow after some additional verification.

-- Hal

> - Sean
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Eli Cohen
On Sun, Nov 08, 2015 at 05:04:55PM +0200, Matan Barak wrote:
> > +static int verify_command_mask(struct ib_device *ib_dev, __u32 command)
> > +{
> > +   u64 mask;
> > +
> > +   if (command <= IB_USER_VERBS_CMD_OPEN_QP)
> 
> I think using IB_USER_VERBS_CMD_THRESHOLD is clearer, but I don't
> think we need two uverbs_cmd_mask vendor variables.
> IMHO, a vendor shouldn't care if it's an extended/legacy uverb
> command. Maybe we should replace uverbs_cmd_mask with a bitmap.
> 
> > +   mask = ib_dev->uverbs_cmd_mask;
> > +   else
> > +   mask = ib_dev->uverbs_ex_cmd_mask;
> > +
> > +   if (mask & ((u64)1 << command))
> > +   return 0;
> > +
> > +   return -1;
> > +}
> > +

But IB_USER_VERBS_CMD_OPEN_QP is the last legacy verbs and I prefer a
more restrictive approach to avoid potentail issues in the future.

> >  static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
> >  size_t count, loff_t *pos)
> >  {
> > @@ -704,6 +719,10 @@ static ssize_t ib_uverbs_write(struct file *filp, 
> > const char __user *buf,
> > }
> >
> > command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
> > +   if (verify_command_mask(ib_dev, command)) {
> > +   ret = -EINVAL;
> 
> Why did you replace ENOSYS with EINVAL?
> 

Like Haggai mentioned in the other response, checkpatch issues error
on this claiming that ENOSYS is reserved to unavailable system calls.
Is it applicable only for new implementations I am not sure. I don't
have clear preference for either ENOSYS or EINAVL.
> > +   goto out;
> > +   }
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Matan Barak
On Mon, Nov 9, 2015 at 5:55 PM, Eli Cohen  wrote:
> On Sun, Nov 08, 2015 at 05:04:55PM +0200, Matan Barak wrote:
>> > +static int verify_command_mask(struct ib_device *ib_dev, __u32 command)
>> > +{
>> > +   u64 mask;
>> > +
>> > +   if (command <= IB_USER_VERBS_CMD_OPEN_QP)
>>
>> I think using IB_USER_VERBS_CMD_THRESHOLD is clearer, but I don't
>> think we need two uverbs_cmd_mask vendor variables.
>> IMHO, a vendor shouldn't care if it's an extended/legacy uverb
>> command. Maybe we should replace uverbs_cmd_mask with a bitmap.
>>
>> > +   mask = ib_dev->uverbs_cmd_mask;
>> > +   else
>> > +   mask = ib_dev->uverbs_ex_cmd_mask;
>> > +
>> > +   if (mask & ((u64)1 << command))
>> > +   return 0;
>> > +
>> > +   return -1;
>> > +}
>> > +
>
> But IB_USER_VERBS_CMD_OPEN_QP is the last legacy verbs and I prefer a
> more restrictive approach to avoid potentail issues in the future.
>

So maybe it's worth adding a IB_USER_VERBS_CMD_LAST_LEGACY_VERB.
It looks a bit weird why you explicitly check for IB_USER_VERBS_CMD_OPEN_QP.

>> >  static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
>> >  size_t count, loff_t *pos)
>> >  {
>> > @@ -704,6 +719,10 @@ static ssize_t ib_uverbs_write(struct file *filp, 
>> > const char __user *buf,
>> > }
>> >
>> > command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
>> > +   if (verify_command_mask(ib_dev, command)) {
>> > +   ret = -EINVAL;
>>
>> Why did you replace ENOSYS with EINVAL?
>>
>
> Like Haggai mentioned in the other response, checkpatch issues error
> on this claiming that ENOSYS is reserved to unavailable system calls.
> Is it applicable only for new implementations I am not sure. I don't
> have clear preference for either ENOSYS or EINAVL.

I think it could break old applications:
err = extended_verb(the_first_extension_we_added);
if (err == ENOSYS)
  err = legacy_verb();
if (err)
return err;

Such applications used the first extension (that was added during the
addition of the extended verb) and when they realized it's not
supported, they dropped to the legacy verb. This change can now cause
the return of -EINVAL an early termination with an error.

>> > +   goto out;
>> > +   }
>> >
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 4/4] IB/mlx5: Mmap the HCA's core clock register to user-space

2015-11-09 Thread Eli Cohen
On Mon, Nov 09, 2015 at 06:30:57PM +0200, Matan Barak wrote:
> In order to read the HCA's current cycles register, we need
> to map it to user-space. Add support to map this register
> via mmap command.
> 
> Signed-off-by: Matan Barak 

Acked-by: Eli Cohen 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 1/4] IB/mlx5: Add create_cq extended command

2015-11-09 Thread Eli Cohen
On Mon, Nov 09, 2015 at 06:30:54PM +0200, Matan Barak wrote:
> In order to create a CQ that supports timestamp, mlx5 needs to
> support the extended create CQ command with the timestamp flag.
> 
> Signed-off-by: Matan Barak 

Acked-by: Eli Cohen 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 1/4] IB/mlx5: Add create_cq extended command

2015-11-09 Thread Eli Cohen
On Mon, Nov 09, 2015 at 06:30:54PM +0200, Matan Barak wrote:
> In order to create a CQ that supports timestamp, mlx5 needs to
> support the extended create CQ command with the timestamp flag.
> 
i> Signed-off-by: Matan Barak 

Acked-by: Eli Cohen 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Eli Cohen
On Mon, Nov 09, 2015 at 06:11:49PM +0200, Matan Barak wrote:
> >
> > Like Haggai mentioned in the other response, checkpatch issues error
> > on this claiming that ENOSYS is reserved to unavailable system calls.
> > Is it applicable only for new implementations I am not sure. I don't
> > have clear preference for either ENOSYS or EINAVL.
> 
> I think it could break old applications:
> err = extended_verb(the_first_extension_we_added);
> if (err == ENOSYS)
>   err = legacy_verb();
> if (err)
> return err;

Can you send a pointer to the code where this could happen?

> 
> Such applications used the first extension (that was added during the
> addition of the extended verb) and when they realized it's not
> supported, they dropped to the legacy verb. This change can now cause
> the return of -EINVAL an early termination with an error.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 0/4] User-space time-stamping support for mlx5_ib

2015-11-09 Thread Matan Barak
Hi Eli,

This patch-set adds user-space support for time-stamping in mlx5_ib.
It implements the necessary API:
(a) ib_create_cq_ex - Add support for CQ creation flags
(b) ib_query_device - return timestamp_mask and hca_core_clock.

We also add support for mmaping the HCA's free running clock.
In order to do so, we use the response of the vendor's extended
part in ib_query_device. This allows us to pass the page offset
of the free running clock register to the user-space driver.
In order to implement it in a future extensible manner, we  use the
same mechanism of verbs extensions to the mlx5 vendor part as well.

Regards,
Matan



Matan Barak (4):
  IB/mlx5: Add create_cq extended command
  IB/core: Add ib_is_udata_cleared
  IB/mlx5: Add support querying timestamp related fields in query_device
  IB/mlx5: Mmap the HCA's core clock register to user-space

 drivers/infiniband/hw/mlx5/cq.c  |  7 
 drivers/infiniband/hw/mlx5/main.c| 69 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 22 +++-
 include/linux/mlx5/device.h  | 10 +++---
 include/linux/mlx5/mlx5_ifc.h|  9 +++--
 include/rdma/ib_verbs.h  | 20 +++
 6 files changed, 125 insertions(+), 12 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 3/4] IB/mlx5: Add support querying timestamp related fields in query_device

2015-11-09 Thread Matan Barak
Add support for querying hca_core_lock, timestmap_mask and
hca_core_clock_offset in query device verb. This is necessary
in order to support completion timestamp and querying the
HCA's core clock.

Signed-off-by: Matan Barak 
---
 drivers/infiniband/hw/mlx5/main.c| 42 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h | 18 
 include/linux/mlx5/device.h  | 10 -
 include/linux/mlx5/mlx5_ifc.h|  9 +---
 4 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 05f00da..b8be3ad 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -213,10 +213,33 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
int max_rq_sg;
int max_sq_sg;
u64 min_page_size = 1ull << MLX5_CAP_GEN(mdev, log_pg_sz);
+   struct mlx5_uverbs_ex_query_device cmd = {};
+   struct mlx5_uverbs_ex_query_device_resp resp = {};
 
-   if (uhw->inlen || uhw->outlen)
-   return -EINVAL;
+   if (uhw->inlen) {
+   if (uhw->inlen < offsetof(struct mlx5_uverbs_ex_query_device,
+ comp_mask) +
+sizeof(cmd.comp_mask))
+   return -EINVAL;
+
+   err = ib_copy_from_udata(, uhw, min(sizeof(cmd),
+   uhw->inlen));
+   if (err)
+   return err;
+
+   if (cmd.comp_mask)
+   return -EINVAL;
+
+   if (cmd.reserved)
+   return -EINVAL;
 
+   if (!ib_is_udata_cleared(uhw, '\0', sizeof(cmd),
+sizeof(cmd) - uhw->inlen))
+   return -EINVAL;
+   }
+
+   resp.response_length = offsetof(typeof(resp), response_length) +
+   sizeof(resp.response_length);
memset(props, 0, sizeof(*props));
err = mlx5_query_system_image_guid(ibdev,
   >sys_image_guid);
@@ -293,6 +316,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
   props->max_mcast_grp;
props->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */
+   props->hca_core_clock = MLX5_CAP_GEN(mdev, device_frequency_khz);
+   props->timestamp_mask = 0xFFULL;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
if (MLX5_CAP_GEN(mdev, pg))
@@ -300,6 +325,19 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
props->odp_caps = dev->odp_caps;
 #endif
 
+   if (field_avail(typeof(resp), hca_core_clock_offset, uhw->outlen)) {
+   resp.response_length += sizeof(resp.hca_core_clock_offset);
+   resp.comp_mask |= QUERY_DEVICE_RESP_MASK_TIMESTAMP;
+   resp.hca_core_clock_offset =
+   offsetof(struct mlx5_init_seg, internal_timer_h) % 
PAGE_SIZE;
+   }
+
+   if (uhw->outlen) {
+   err = ib_copy_to_udata(uhw, , resp.response_length);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 22123b7..c33cf8f 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -55,6 +55,9 @@ pr_err("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, 
__func__,\
 pr_warn("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,\
__LINE__, current->pid, ##arg)
 
+#define field_avail(type, fld, sz) (offsetof(type, fld) +  \
+   sizeof(((type *)0)->fld) <= (sz))
+
 enum {
MLX5_IB_MMAP_CMD_SHIFT  = 8,
MLX5_IB_MMAP_CMD_MASK   = 0xff,
@@ -441,6 +444,21 @@ struct mlx5_ib_dev {
 #endif
 };
 
+struct mlx5_uverbs_ex_query_device {
+   __u32 comp_mask;
+   __u32 reserved;
+};
+
+enum query_device_resp_mask {
+   QUERY_DEVICE_RESP_MASK_TIMESTAMP = 1UL << 0,
+};
+
+struct mlx5_uverbs_ex_query_device_resp {
+   __u32 comp_mask;
+   __u32 response_length;
+   __u64 hca_core_clock_offset;
+};
+
 static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
 {
return container_of(mcq, struct mlx5_ib_cq, mcq);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 250b1ff..b7d7471 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -441,12 +441,12 @@ struct mlx5_init_seg {
__be32  cmd_dbell;
__be32  rsvd1[121];
struct health_bufferhealth;
-   __be32  rsvd2[884];
+   __be32  rsvd2[880];
+   __be32  internal_timer_h;
+  

[PATCH for-next 1/4] IB/mlx5: Add create_cq extended command

2015-11-09 Thread Matan Barak
In order to create a CQ that supports timestamp, mlx5 needs to
support the extended create CQ command with the timestamp flag.

Signed-off-by: Matan Barak 
---
 drivers/infiniband/hw/mlx5/cq.c   | 7 +++
 drivers/infiniband/hw/mlx5/main.c | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 2d0dbbf..674f857 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -743,6 +743,10 @@ static void destroy_cq_kernel(struct mlx5_ib_dev *dev, 
struct mlx5_ib_cq *cq)
mlx5_db_free(dev->mdev, >db);
 }
 
+enum {
+   CQ_CREATE_FLAGS_SUPPORTED = IB_CQ_FLAGS_TIMESTAMP_COMPLETION
+};
+
 struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
const struct ib_cq_init_attr *attr,
struct ib_ucontext *context,
@@ -766,6 +770,9 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
if (entries < 0)
return ERR_PTR(-EINVAL);
 
+   if (attr->flags & ~CQ_CREATE_FLAGS_SUPPORTED)
+   return ERR_PTR(-EINVAL);
+
entries = roundup_pow_of_two(entries + 1);
if (entries > (1 << MLX5_CAP_GEN(dev->mdev, log_max_cq_sz)))
return ERR_PTR(-EINVAL);
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index f1ccd40..05f00da 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1385,7 +1385,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
(1ull << IB_USER_VERBS_CMD_CREATE_XSRQ) |
(1ull << IB_USER_VERBS_CMD_OPEN_QP);
dev->ib_dev.uverbs_ex_cmd_mask =
-   (1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE);
+   (1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE) |
+   (1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ);
 
dev->ib_dev.query_device= mlx5_ib_query_device;
dev->ib_dev.query_port  = mlx5_ib_query_port;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 4/4] IB/mlx5: Mmap the HCA's core clock register to user-space

2015-11-09 Thread Matan Barak
In order to read the HCA's current cycles register, we need
to map it to user-space. Add support to map this register
via mmap command.

Signed-off-by: Matan Barak 
---
 drivers/infiniband/hw/mlx5/main.c| 24 
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  4 +++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index b8be3ad..23cedb4 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -825,6 +825,30 @@ static int mlx5_ib_mmap(struct ib_ucontext *ibcontext, 
struct vm_area_struct *vm
case MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES:
return -ENOSYS;
 
+   case MLX5_IB_MMAP_CORE_CLOCK:
+   {
+   phys_addr_t pfn;
+
+   if (vma->vm_end - vma->vm_start != PAGE_SIZE)
+   return -EINVAL;
+
+   if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+   return -EINVAL;
+
+   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   pfn = (dev->mdev->iseg_base +
+  offsetof(struct mlx5_init_seg, internal_timer_h)) >>
+   PAGE_SHIFT;
+   if (io_remap_pfn_range(vma, vma->vm_start, pfn,
+  PAGE_SIZE, vma->vm_page_prot))
+   return -EAGAIN;
+
+   mlx5_ib_dbg(dev, "mapped internal timer at 0x%lx, PA 0x%llx\n",
+   vma->vm_start,
+   (unsigned long long)pfn << PAGE_SHIFT);
+   break;
+   }
+
default:
return -EINVAL;
}
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index c33cf8f..87cd616 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -65,7 +65,9 @@ enum {
 
 enum mlx5_ib_mmap_cmd {
MLX5_IB_MMAP_REGULAR_PAGE   = 0,
-   MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES   = 1, /* always last */
+   MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES   = 1,
+   /* 5 is chosen in order to be compatible with old versions of libmlx5 */
+   MLX5_IB_MMAP_CORE_CLOCK = 5,
 };
 
 enum {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-next 2/4] IB/core: Add ib_is_udata_cleared

2015-11-09 Thread Matan Barak
Extending core and vendor verb commands require us to check that the
unknown part of the user's given command is all zeros.
Adding ib_is_udata_cleared in order to do so.

Signed-off-by: Matan Barak 
---
 include/rdma/ib_verbs.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e4cc389..43f3cf2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1855,6 +1855,26 @@ static inline int ib_copy_to_udata(struct ib_udata 
*udata, void *src, size_t len
return copy_to_user(udata->outbuf, src, len) ? -EFAULT : 0;
 }
 
+static inline bool ib_is_udata_cleared(struct ib_udata *udata,
+  char cleared_char,
+  size_t offset,
+  size_t len)
+{
+   short i;
+
+   for (i = 0; i < len; i++) {
+   char c;
+
+   if (copy_from_user(, udata->inbuf + offset + i, sizeof(c)))
+   return false;
+
+   if (c != cleared_char)
+   return false;
+   }
+
+   return true;
+}
+
 /**
  * ib_modify_qp_is_ok - Check that the supplied attribute mask
  * contains all required attributes and no attributes not allowed for
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Matan Barak
On Mon, Nov 9, 2015 at 6:25 PM, Eli Cohen  wrote:
> On Mon, Nov 09, 2015 at 06:11:49PM +0200, Matan Barak wrote:
>> >
>> > Like Haggai mentioned in the other response, checkpatch issues error
>> > on this claiming that ENOSYS is reserved to unavailable system calls.
>> > Is it applicable only for new implementations I am not sure. I don't
>> > have clear preference for either ENOSYS or EINAVL.
>>
>> I think it could break old applications:
>> err = extended_verb(the_first_extension_we_added);
>> if (err == ENOSYS)
>>   err = legacy_verb();
>> if (err)
>> return err;
>
> Can you send a pointer to the code where this could happen?
>

This is a hypothetical application code that could break.

>>
>> Such applications used the first extension (that was added during the
>> addition of the extended verb) and when they realized it's not
>> supported, they dropped to the legacy verb. This change can now cause
>> the return of -EINVAL an early termination with an error.
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 3/4] IB/mlx5: Add support querying timestamp related fields in query_device

2015-11-09 Thread Leon Romanovsky
On Mon, Nov 09, 2015 at 06:30:56PM +0200, Matan Barak wrote:
> +
> + if (uhw->outlen) {
> + err = ib_copy_to_udata(uhw, , resp.response_length);
> + if (err)
> + return err;
> + }
> +
>   return 0;
What do you think about to rewrite this part of code to be something
like that?
+   int ret = 0;
.
+   if (uhw->outlen)
+   ret = ib_copy_to_udata(uhw, , resp.response_length);
+   return ret;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 2/4] IB/core: Add ib_is_udata_cleared

2015-11-09 Thread Leon Romanovsky
On Mon, Nov 09, 2015 at 06:30:55PM +0200, Matan Barak wrote:
>  
> +static inline bool ib_is_udata_cleared(struct ib_udata *udata,
> +char cleared_char,
> +size_t offset,
> +size_t len)
> +{
> + short i;
> +
> + for (i = 0; i < len; i++) {
You are comparing "len" which is declared as size_t which is "unsigned" int and 
"i" which is "short".
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 01:24:31AM +0200, Eli Cohen wrote:
> > Also, when the driver tests the ex flags for support it should be
> > returning EOPNOTSUPP or such not EINVAL.. Return codes for the ex
> > stuff could stand a good sanity audit.
> > 
> 
> #define EOPNOTSUPP  95  /* Operation not supported on
> transport endpoint */
> 
> This does not seem like an ideal choise either. I think ENOSYS in this
> case is a better choise.

Call it ENOTSUP then:

   ENOTSUP Operation not supported (POSIX.1)

Same value on Linux.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/3] staging/rdma/hfi1: Method to toggle "fast ECN" detection

2015-11-09 Thread ira . weiny
From: Vennila Megavannan 

Add a module paramter to toggle prescan/Fast ECN Detection and remove the
Kconfig option which used to control this.

Reviewed-by: Arthur Kepner
Reviewed-by: Mike Marciniszyn
Signed-off-by: Vennila Megavannan
Signed-off-by: Ira Weiny 

---
Changes from V1:
Redo commit message as well as Kconfig help to make it clear what the
compile and module options do.

Changes from V2:
Remove Kconfig option completely

 drivers/staging/rdma/hfi1/Kconfig  | 10 --
 drivers/staging/rdma/hfi1/driver.c | 24 ++--
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/Kconfig 
b/drivers/staging/rdma/hfi1/Kconfig
index fd25078ee923..0ad92fdb3fe6 100644
--- a/drivers/staging/rdma/hfi1/Kconfig
+++ b/drivers/staging/rdma/hfi1/Kconfig
@@ -25,13 +25,3 @@ config SDMA_VERBOSITY
---help---
This is a configuration flag to enable verbose
SDMA debug
-config PRESCAN_RXQ
-   bool "Enable prescanning of the RX queue for ECNs"
-   depends on INFINIBAND_HFI1
-   default n
-   ---help---
-   This option toggles the prescanning of the receive queue for
-   Explicit Congestion Notifications. If an ECN is detected, it
-   is processed as quickly as possible, the ECN is toggled off.
-   After the prescanning step, the receive queue is processed as
-   usual.
diff --git a/drivers/staging/rdma/hfi1/driver.c 
b/drivers/staging/rdma/hfi1/driver.c
index 9a4ec09af020..145ac3061f5d 100644
--- a/drivers/staging/rdma/hfi1/driver.c
+++ b/drivers/staging/rdma/hfi1/driver.c
@@ -83,6 +83,12 @@ unsigned int hfi1_cu = 1;
 module_param_named(cu, hfi1_cu, uint, S_IRUGO);
 MODULE_PARM_DESC(cu, "Credit return units");
 
+static unsigned int prescan_rx_queue;
+module_param_named(prescan_rxq, prescan_rx_queue, uint,
+  S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(prescan_rxq,
+"Used to toggle rx prescan. Set to 1 to enable prescan");
+
 unsigned long hfi1_cap_mask = HFI1_CAP_MASK_DEFAULT;
 static int hfi1_caps_set(const char *, const struct kernel_param *);
 static int hfi1_caps_get(char *, const struct kernel_param *);
@@ -434,11 +440,6 @@ static inline void init_packet(struct hfi1_ctxtdata *rcd,
packet->rcv_flags = 0;
 }
 
-#ifndef CONFIG_PRESCAN_RXQ
-static void prescan_rxq(struct hfi1_packet *packet) {}
-#else /* !CONFIG_PRESCAN_RXQ */
-static int prescan_receive_queue;
-
 static void process_ecn(struct hfi1_qp *qp, struct hfi1_ib_header *hdr,
struct hfi1_other_headers *ohdr,
u64 rhf, u32 bth1, struct ib_grh *grh)
@@ -541,15 +542,19 @@ static inline void update_ps_mdata(struct ps_mdata *mdata)
  * containing Excplicit Congestion Notifications (FECNs, or BECNs).
  * When an ECN is found, process the Congestion Notification, and toggle
  * it off.
+ * This is declared as a macro to allow quick checking of the module param and
+ * avoid the overhead of a function call if not enabled.
  */
-static void prescan_rxq(struct hfi1_packet *packet)
+#define prescan_rxq(packet) \
+   do { \
+   if (prescan_rx_queue) \
+   __prescan_rxq(packet); \
+   } while (0)
+static void __prescan_rxq(struct hfi1_packet *packet)
 {
struct hfi1_ctxtdata *rcd = packet->rcd;
struct ps_mdata mdata;
 
-   if (!prescan_receive_queue)
-   return;
-
init_ps_mdata(, packet);
 
while (1) {
@@ -609,7 +614,6 @@ next:
update_ps_mdata();
}
 }
-#endif /* CONFIG_PRESCAN_RXQ */
 
 static inline int process_rcv_packet(struct hfi1_packet *packet, int thread)
 {
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 1/4] IB/mlx5: Add create_cq extended command

2015-11-09 Thread Jason Gunthorpe
On Mon, Nov 09, 2015 at 06:30:54PM +0200, Matan Barak wrote:
> @@ -1385,7 +1385,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
>   (1ull << IB_USER_VERBS_CMD_CREATE_XSRQ) |
>   (1ull << IB_USER_VERBS_CMD_OPEN_QP);
>   dev->ib_dev.uverbs_ex_cmd_mask =
> - (1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE);
> + (1ull << IB_USER_VERBS_EX_CMD_QUERY_DEVICE) |
> + (1ull << IB_USER_VERBS_EX_CMD_CREATE_CQ);

Eli posted a series that gets rid of this stuff, can you please
coordinate?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Eli Cohen
On Mon, Nov 09, 2015 at 04:15:42PM -0700, Jason Gunthorpe wrote:
> 
> Since the change is to make the kernel do the above fall back
> internally, this specific example doesn't make alot of sense to worry
> about. Ie the extended verb won't fail anymore, and if it does the
> legacy one won't work anyhow.
> 

Makes sense.

> But if there is something out there that does care about ENOSYS we
> should try to keep it, but don't convert ENOSYS to EINVAL.
> 
> Also, when the driver tests the ex flags for support it should be
> returning EOPNOTSUPP or such not EINVAL.. Return codes for the ex
> stuff could stand a good sanity audit.
> 

#define EOPNOTSUPP  95  /* Operation not supported on
transport endpoint */

This does not seem like an ideal choise either. I think ENOSYS in this
case is a better choise.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] staging/rdma/hfi1: set Gen3 half-swing for integrated devices

2015-11-09 Thread ira . weiny
From: Dean Luick 

Correctly set half-swing for integrated devices.  A0 needs all fields set for
CcePcieCtrl.  B0 and later only need a few fields set.

Reviewed-by: Stuart Summers 
Signed-off-by: Dean Luick 
Signed-off-by: Ira Weiny 

---
Changes from V1:
Add comments concerning the very long names.

 drivers/staging/rdma/hfi1/chip_registers.h | 16 +++
 drivers/staging/rdma/hfi1/pcie.c   | 77 --
 2 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip_registers.h 
b/drivers/staging/rdma/hfi1/chip_registers.h
index bf45de29d8bd..e4c1953203dd 100644
--- a/drivers/staging/rdma/hfi1/chip_registers.h
+++ b/drivers/staging/rdma/hfi1/chip_registers.h
@@ -51,6 +51,11 @@
  *
  */
 
+/*
+ * NOTE: The names in this file are very long but they are meant to track the
+ * hardware specification names.
+ */
+
 #define CORE   0x
 #define CCE(CORE + 0x)
 #define ASIC   (CORE + 0x0040)
@@ -549,6 +554,17 @@
 #define CCE_MSIX_TABLE_UPPER (CCE + 0x0018)
 #define CCE_MSIX_TABLE_UPPER_RESETCSR 0x0001ull
 #define CCE_MSIX_VEC_CLR_WITHOUT_INT (CCE + 0x00110400)
+#define CCE_PCIE_CTRL (CCE + 0x00C0)
+#define CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_MASK 0x3ull
+#define CCE_PCIE_CTRL_PCIE_LANE_BUNDLE_SHIFT 0
+#define CCE_PCIE_CTRL_PCIE_LANE_DELAY_MASK 0xFull
+#define CCE_PCIE_CTRL_PCIE_LANE_DELAY_SHIFT 2
+#define CCE_PCIE_CTRL_XMT_MARGIN_OVERWRITE_ENABLE_SHIFT 8
+#define CCE_PCIE_CTRL_XMT_MARGIN_SHIFT 9
+#define CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_MASK 0x1ull
+#define CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT 12
+#define CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_MASK 0x7ull
+#define CCE_PCIE_CTRL_XMT_MARGIN_GEN1_GEN2_SHIFT 13
 #define CCE_REVISION (CCE + 0x)
 #define CCE_REVISION2 (CCE + 0x0008)
 #define CCE_REVISION2_HFI_ID_MASK 0x1ull
diff --git a/drivers/staging/rdma/hfi1/pcie.c b/drivers/staging/rdma/hfi1/pcie.c
index a956044459a2..fe17332e9239 100644
--- a/drivers/staging/rdma/hfi1/pcie.c
+++ b/drivers/staging/rdma/hfi1/pcie.c
@@ -865,6 +865,78 @@ static void arm_gasket_logic(struct hfi1_devdata *dd)
 }
 
 /*
+ * CcePcieCtrl long name helper
+ * Due to checkpatch checks we use this macro to shorten field macros names
+ */
+#define PC(field) (CCE_PCIE_CTRL_##field)
+
+ /*
+  * Write xmt_margin for full-swing (WFR-B) or half-swing (WFR-C).
+  */
+static void write_xmt_margin(struct hfi1_devdata *dd, const char *fname)
+{
+   u64 pcie_ctrl;
+   u64 xmt_margin;
+   u64 xmt_margin_oe;
+   u64 lane_delay;
+   u64 lane_bundle;
+
+   pcie_ctrl = read_csr(dd, CCE_PCIE_CTRL);
+
+   /*
+* For Discrete, use full-swing.
+*  - PCIe TX defaults to full-swing.
+*Leave this register as default.
+* For Integrated, use half-swing
+*  - Copy xmt_margin and xmt_margin_oe
+*from Gen1/Gen2 to Gen3.
+*/
+   if (dd->pcidev->device == PCI_DEVICE_ID_INTEL1) { /* integrated */
+   /* extract initial fields */
+   xmt_margin = (pcie_ctrl >> PC(XMT_MARGIN_GEN1_GEN2_SHIFT))
+   & PC(XMT_MARGIN_GEN1_GEN2_MASK);
+   xmt_margin_oe =
+   (pcie_ctrl
+ >> PC(XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT))
+   & PC(XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_MASK);
+   lane_delay = (pcie_ctrl >> PC(PCIE_LANE_DELAY_SHIFT))
+   & PC(PCIE_LANE_DELAY_MASK);
+   lane_bundle = (pcie_ctrl >> PC(PCIE_LANE_BUNDLE_SHIFT))
+   & PC(PCIE_LANE_BUNDLE_MASK);
+
+   /*
+* For A0, EFUSE values are not set.  Override with the
+* correct values.
+*/
+   if (is_a0(dd)) {
+   /*
+* xmt_margin and OverwiteEnabel should be the
+* same for Gen1/Gen2 and Gen3
+*/
+   xmt_margin= 0x5;
+   xmt_margin_oe = 0x1;
+   lane_delay= 0xF; /* Delay 240ns. */
+   lane_bundle   = 0x0; /* Set to 1 lane. */
+   }
+
+   /* overwrite existing values */
+   pcie_ctrl = (xmt_margin << PC(XMT_MARGIN_GEN1_GEN2_SHIFT))
+   | (xmt_margin_oe <<
+   PC(XMT_MARGIN_GEN1_GEN2_OVERWRITE_ENABLE_SHIFT))
+   | (xmt_margin << PC(XMT_MARGIN_SHIFT))
+   | (xmt_margin_oe <<
+   PC(XMT_MARGIN_OVERWRITE_ENABLE_SHIFT))
+   | (lane_delay << PC(PCIE_LANE_DELAY_SHIFT))
+   

Re: [PATCH RESEND 11/11] staging/rdma/hfi1: Workaround to prevent corruption during packet delivery

2015-11-09 Thread Jubin John
On Fri, Nov 06, 2015 at 07:25:31PM -0800, Greg KH wrote:
> On Fri, Nov 06, 2015 at 08:07:06PM -0500, Jubin John wrote:
> > --- a/drivers/staging/rdma/hfi1/hfi.h
> > +++ b/drivers/staging/rdma/hfi1/hfi.h
> > @@ -1084,6 +1084,10 @@ struct hfi1_devdata {
> > /* Save the enabled LCB error bits */
> > u64 lcb_err_en;
> > u8 dc_shutdown;
> > +
> > +   /* receive context tail dummy address */
> > +   volatile __le64 *rcvhdrtail_dummy_kvaddr;
> 
> There's no way that volatile here is actually correct, please fix.
> 
You're right. Will fix in v2.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 6/9] staging/rdma/hfi1: Implement Expected Receive TID caching

2015-11-09 Thread ira.weiny
On Fri, Nov 06, 2015 at 05:03:28PM -0800, Greg KH wrote:
> On Fri, Oct 30, 2015 at 06:58:45PM -0400, ira.we...@intel.com wrote:
> > From: Mitko Haralanov 
> > 
> > Expected receives work by user-space libraries (PSM) calling into the driver
> > with information about the user's receive buffer and have the driver DMA-map
> > that buffer and program the HFI to receive data directly into it.
> > 
> > This is an expensive operation as it requires the driver to pin the pages 
> > which
> > the user's buffer maps to, DMA-map them, and then program the HFI.
> > 
> > When the receive is complete, user-space libraries have to call into the 
> > driver
> > again so the buffer is removed from the HFI, un-mapped, and the pages 
> > unpinned.
> > 
> > All of these operations are expensive, considering that a lot of 
> > applications
> > (especially micro-benchmarks) use the same buffer over and over.
> > 
> > In order to get better performance for user-space applications, it is highly
> > beneficial that they don't continuously call into the driver to register and
> > unregister the same buffer. Rather, they can register the buffer and cache 
> > it
> > for future work. The buffer can be unregistered when it is freed by the 
> > user.
> > 
> > This change implements such buffer caching by making use of the kernel's MMU
> > notifier API. User-space libraries call into the driver only when the need 
> > to
> > register a new buffer.
> > 
> > Once a buffer is registered, it stays programmed into the HFI until the 
> > kernel
> > notifies the driver that the buffer has been freed by the user. At that 
> > time,
> > the user-space library is notified and it can do the necessary work to 
> > remove
> > the buffer from its cache.
> > 
> > Buffers which have been invalidated by the kernel are not automatically 
> > removed
> > from the HFI and do not have their pages unpinned. Buffers are only 
> > completely
> > removed when the user-space libraries call into the driver to free them.  
> > This
> > is done to ensure that any ongoing transfers into that buffer are complete.
> > This is important when a buffer is not completely freed but rather it is
> > shrunk. The user-space library could still have uncompleted transfers into 
> > the
> > remaining buffer.
> > 
> > With this feature, it is important that systems are setup with reasonable
> > limits for the amount of lockable memory.  Keeping the limit at "unlimited" 
> > (as
> > we've done up to this point), may result in jobs being killed by the 
> > kernel's
> > OOM due to them taking up excessive amounts of memory.
> > 
> > Reviewed-by: Arthur Kepner 
> > Reviewed-by: Dennis Dalessandro 
> > Signed-off-by: Mitko Haralanov 
> > Signed-off-by: Ira Weiny 
> > 
> > ---
> > Changes from V3:
> > Reworked based on the removal of the file pointer macros
> > Split out some prep patches and code clean up
> > 
> > Changes from V2:
> > Fix random Kconfig 0-day build error
> > Fix leak of random memory to user space caught by Dan Carpenter
> > Separate out pointer bug fix into a previous patch
> > Change error checks in case statement per Dan's comments
> > 
> >  drivers/staging/rdma/hfi1/file_ops.c | 469 ++---
> >  drivers/staging/rdma/hfi1/hfi.h  |  43 +-
> >  drivers/staging/rdma/hfi1/init.c |   5 +-
> >  drivers/staging/rdma/hfi1/trace.h| 132 +++--
> >  drivers/staging/rdma/hfi1/user_exp_rcv.c | 874 
> > ++-
> >  drivers/staging/rdma/hfi1/user_pages.c   | 110 +---
> >  drivers/staging/rdma/hfi1/user_sdma.c|  13 +
> >  include/uapi/rdma/hfi/hfi1_user.h|  14 +-
> >  8 files changed, 1069 insertions(+), 591 deletions(-)
> 
> This is still a really big patch, any chance you can break it up into
> smaller, reviewable parts?  I see you add different operations, perhaps
> break it up into one patch per logical thing?
> 

I understand, however, as you have seen with my attempt to break this up there
are issues if we do so.

I need to clean up the previous patch which was an attempt to split this one
up.  But at this point I would really like to preserve the functionality we
have here.  Breaking this up beyond this point is going to be difficult to do
and really will not allow for bisecting the code across this feature being
in vs out.

Thanks,
Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RESEND PATCH v4 2/3] staging/rdma/hfi1: Use parallel workqueue for SDMA engines

2015-11-09 Thread ira . weiny
From: Mike Marciniszyn 

The workqueue is currently single threaded per port which for a small number of
SDMA engines is ok.

For hfi1, the there are up to 16 SDMA engines that can be fed descriptors in
parallel.

Use alloc_workqueue with a workqueue limit of the number of sdma engines and
with WQ_CPU_INTENSIVE and WQ_HIGHPRI specified.

Then change send to use the new scheduler which no longer needs to get the
s_lock

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Mike Marciniszyn 
Signed-off-by: Ira Weiny 

---
Changes from v3:
Split patch out code refactoring and separate functionality

This patch was previously sent as #8/9 but 5/9 failed.

I'm resending 7,8, and 9 as they did not explicitly depend on 5 and 6.

 drivers/staging/rdma/hfi1/chip.c   |  1 +
 drivers/staging/rdma/hfi1/init.c   | 13 ++---
 drivers/staging/rdma/hfi1/iowait.h |  6 --
 drivers/staging/rdma/hfi1/qp.h | 35 +++
 drivers/staging/rdma/hfi1/sdma.c   |  8 +---
 drivers/staging/rdma/hfi1/sdma.h   |  8 +---
 drivers/staging/rdma/hfi1/verbs.c  | 20 
 drivers/staging/rdma/hfi1/verbs.h  |  1 -
 8 files changed, 60 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index e48981994b10..b7c7d1cac4e1 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -9018,6 +9018,7 @@ static int request_msix_irqs(struct hfi1_devdata *dd)
if (handler == sdma_interrupt) {
dd_dev_info(dd, "sdma engine %d cpu %d\n",
sde->this_idx, sdma_cpu);
+   sde->cpu = sdma_cpu;
cpumask_set_cpu(sdma_cpu, dd->msix_entries[i].mask);
sdma_cpu = cpumask_next(sdma_cpu, def);
if (sdma_cpu >= nr_cpu_ids)
diff --git a/drivers/staging/rdma/hfi1/init.c b/drivers/staging/rdma/hfi1/init.c
index 47a1202fcbdf..69dfa3a65fc6 100644
--- a/drivers/staging/rdma/hfi1/init.c
+++ b/drivers/staging/rdma/hfi1/init.c
@@ -601,20 +601,19 @@ static int create_workqueues(struct hfi1_devdata *dd)
for (pidx = 0; pidx < dd->num_pports; ++pidx) {
ppd = dd->pport + pidx;
if (!ppd->hfi1_wq) {
-   char wq_name[8]; /* 3 + 2 + 1 + 1 + 1 */
-
-   snprintf(wq_name, sizeof(wq_name), "hfi%d_%d",
-dd->unit, pidx);
ppd->hfi1_wq =
-   create_singlethread_workqueue(wq_name);
+   alloc_workqueue(
+   "hfi%d_%d",
+   WQ_SYSFS | WQ_HIGHPRI | WQ_CPU_INTENSIVE,
+   dd->num_sdma,
+   dd->unit, pidx);
if (!ppd->hfi1_wq)
goto wq_error;
}
}
return 0;
 wq_error:
-   pr_err("create_singlethread_workqueue failed for port %d\n",
-   pidx + 1);
+   pr_err("alloc_workqueue failed for port %d\n", pidx + 1);
for (pidx = 0; pidx < dd->num_pports; ++pidx) {
ppd = dd->pport + pidx;
if (ppd->hfi1_wq) {
diff --git a/drivers/staging/rdma/hfi1/iowait.h 
b/drivers/staging/rdma/hfi1/iowait.h
index fa361b405851..e8ba5606d08d 100644
--- a/drivers/staging/rdma/hfi1/iowait.h
+++ b/drivers/staging/rdma/hfi1/iowait.h
@@ -150,12 +150,14 @@ static inline void iowait_init(
  * iowait_schedule() - initialize wait structure
  * @wait: wait struct to schedule
  * @wq: workqueue for schedule
+ * @cpu: cpu
  */
 static inline void iowait_schedule(
struct iowait *wait,
-   struct workqueue_struct *wq)
+   struct workqueue_struct *wq,
+   int cpu)
 {
-   queue_work(wq, >iowork);
+   queue_work_on(cpu, wq, >iowork);
 }
 
 /**
diff --git a/drivers/staging/rdma/hfi1/qp.h b/drivers/staging/rdma/hfi1/qp.h
index bacfa9c5e8a8..e49cfa6e59e0 100644
--- a/drivers/staging/rdma/hfi1/qp.h
+++ b/drivers/staging/rdma/hfi1/qp.h
@@ -247,6 +247,41 @@ void qp_iter_print(struct seq_file *s, struct qp_iter 
*iter);
  */
 void qp_comm_est(struct hfi1_qp *qp);
 
+/**
+ * _hfi1_schedule_send - schedule progress
+ * @qp: the QP
+ *
+ * This schedules qp progress w/o regard to the s_flags.
+ *
+ * It is only used in the post send, which doesn't hold
+ * the s_lock.
+ */
+static inline void _hfi1_schedule_send(struct hfi1_qp *qp)
+{
+   struct hfi1_ibport *ibp =
+   to_iport(qp->ibqp.device, qp->port_num);
+   struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+   struct hfi1_devdata *dd = dd_from_ibdev(qp->ibqp.device);
+
+   iowait_schedule(>s_iowait, ppd->hfi1_wq,
+   qp->s_sde ?
+   

[RESEND PATCH v4 3/3] staging/rdma/hfi1: pre-compute sc and sde for RC/UC QPs

2015-11-09 Thread ira . weiny
From: Mike Marciniszyn 

Now that we have a multi-threaded work queue we precomputed and store the SC
and SDE on RC and UC QPs for faster access.

Reviewed-by: Dennis Dalessandro 
Signed-off-by: Mike Marciniszyn 
Signed-off-by: Ira Weiny 
---
This patch was previously sent as #9/9 but 5/9 failed.

I'm resending 7,8, and 9 as they did not explicitly depend on 5 and 6.

 drivers/staging/rdma/hfi1/qp.c| 27 +--
 drivers/staging/rdma/hfi1/qp.h|  1 -
 drivers/staging/rdma/hfi1/ruc.c   | 10 ++
 drivers/staging/rdma/hfi1/ud.c|  1 +
 drivers/staging/rdma/hfi1/verbs.c | 14 +++---
 drivers/staging/rdma/hfi1/verbs.h |  3 ++-
 6 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/qp.c b/drivers/staging/rdma/hfi1/qp.c
index 6e6c9a8f6b6f..dc8d833de3bd 100644
--- a/drivers/staging/rdma/hfi1/qp.c
+++ b/drivers/staging/rdma/hfi1/qp.c
@@ -617,7 +617,7 @@ int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr 
*attr,
int mig = 0;
int ret;
u32 pmtu = 0; /* for gcc warning only */
-   struct hfi1_devdata *dd;
+   struct hfi1_devdata *dd = dd_from_dev(dev);
 
spin_lock_irq(>r_lock);
spin_lock(>s_lock);
@@ -631,23 +631,35 @@ int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr 
*attr,
goto inval;
 
if (attr_mask & IB_QP_AV) {
+   u8 sc;
+
if (attr->ah_attr.dlid >= HFI1_MULTICAST_LID_BASE)
goto inval;
if (hfi1_check_ah(qp->ibqp.device, >ah_attr))
goto inval;
+   sc = ah_to_sc(ibqp->device, >ah_attr);
+   if (!qp_to_sdma_engine(qp, sc) &&
+   dd->flags & HFI1_HAS_SEND_DMA)
+   goto inval;
}
 
if (attr_mask & IB_QP_ALT_PATH) {
+   u8 sc;
+
if (attr->alt_ah_attr.dlid >= HFI1_MULTICAST_LID_BASE)
goto inval;
if (hfi1_check_ah(qp->ibqp.device, >alt_ah_attr))
goto inval;
-   if (attr->alt_pkey_index >= hfi1_get_npkeys(dd_from_dev(dev)))
+   if (attr->alt_pkey_index >= hfi1_get_npkeys(dd))
+   goto inval;
+   sc = ah_to_sc(ibqp->device, >alt_ah_attr);
+   if (!qp_to_sdma_engine(qp, sc) &&
+   dd->flags & HFI1_HAS_SEND_DMA)
goto inval;
}
 
if (attr_mask & IB_QP_PKEY_INDEX)
-   if (attr->pkey_index >= hfi1_get_npkeys(dd_from_dev(dev)))
+   if (attr->pkey_index >= hfi1_get_npkeys(dd))
goto inval;
 
if (attr_mask & IB_QP_MIN_RNR_TIMER)
@@ -792,6 +804,8 @@ int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr 
*attr,
qp->remote_ah_attr = attr->ah_attr;
qp->s_srate = attr->ah_attr.static_rate;
qp->srate_mbps = ib_rate_to_mbps(qp->s_srate);
+   qp->s_sc = ah_to_sc(ibqp->device, >remote_ah_attr);
+   qp->s_sde = qp_to_sdma_engine(qp, qp->s_sc);
}
 
if (attr_mask & IB_QP_ALT_PATH) {
@@ -806,6 +820,8 @@ int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr 
*attr,
qp->port_num = qp->alt_ah_attr.port_num;
qp->s_pkey_index = qp->s_alt_pkey_index;
qp->s_flags |= HFI1_S_AHG_CLEAR;
+   qp->s_sc = ah_to_sc(ibqp->device, >remote_ah_attr);
+   qp->s_sde = qp_to_sdma_engine(qp, qp->s_sc);
}
}
 
@@ -1528,9 +1544,6 @@ struct sdma_engine *qp_to_sdma_engine(struct hfi1_qp *qp, 
u8 sc5)
if (!(dd->flags & HFI1_HAS_SEND_DMA))
return NULL;
switch (qp->ibqp.qp_type) {
-   case IB_QPT_UC:
-   case IB_QPT_RC:
-   break;
case IB_QPT_SMI:
return NULL;
default:
@@ -1699,6 +1712,8 @@ void hfi1_migrate_qp(struct hfi1_qp *qp)
qp->port_num = qp->alt_ah_attr.port_num;
qp->s_pkey_index = qp->s_alt_pkey_index;
qp->s_flags |= HFI1_S_AHG_CLEAR;
+   qp->s_sc = ah_to_sc(qp->ibqp.device, >remote_ah_attr);
+   qp->s_sde = qp_to_sdma_engine(qp, qp->s_sc);
 
ev.device = qp->ibqp.device;
ev.element.qp = >ibqp;
diff --git a/drivers/staging/rdma/hfi1/qp.h b/drivers/staging/rdma/hfi1/qp.h
index e49cfa6e59e0..94fd748a00a6 100644
--- a/drivers/staging/rdma/hfi1/qp.h
+++ b/drivers/staging/rdma/hfi1/qp.h
@@ -128,7 +128,6 @@ static inline void clear_ahg(struct hfi1_qp *qp)
if (qp->s_sde && qp->s_ahgidx >= 0)
sdma_ahg_free(qp->s_sde, qp->s_ahgidx);
qp->s_ahgidx = -1;
-   qp->s_sde = NULL;
 }
 
 /**
diff --git a/drivers/staging/rdma/hfi1/ruc.c b/drivers/staging/rdma/hfi1/ruc.c
index 

[RESEND PATCH v4 1/3] staging/rdma/hfi1: move hfi1_migrate_qp

2015-11-09 Thread ira . weiny
From: Mike Marciniszyn 

Move hfi1_migrate_qp from ruc.c to qp.[hc] in prep for modifying the QP
workqueue.

Signed-off-by: Mike Marciniszyn 
Signed-off-by: Ira Weiny 
---
This patch was previously sent as #7/9 but 5/9 failed.

I'm resending 7,8, and 9 as they did not explicitly depend on 5 and 6.

 drivers/staging/rdma/hfi1/qp.c| 20 
 drivers/staging/rdma/hfi1/qp.h|  2 ++
 drivers/staging/rdma/hfi1/ruc.c   | 20 
 drivers/staging/rdma/hfi1/verbs.h |  2 --
 4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/qp.c b/drivers/staging/rdma/hfi1/qp.c
index df1fa56eaf85..6e6c9a8f6b6f 100644
--- a/drivers/staging/rdma/hfi1/qp.c
+++ b/drivers/staging/rdma/hfi1/qp.c
@@ -1685,3 +1685,23 @@ void qp_comm_est(struct hfi1_qp *qp)
qp->ibqp.event_handler(, qp->ibqp.qp_context);
}
 }
+
+/*
+ * Switch to alternate path.
+ * The QP s_lock should be held and interrupts disabled.
+ */
+void hfi1_migrate_qp(struct hfi1_qp *qp)
+{
+   struct ib_event ev;
+
+   qp->s_mig_state = IB_MIG_MIGRATED;
+   qp->remote_ah_attr = qp->alt_ah_attr;
+   qp->port_num = qp->alt_ah_attr.port_num;
+   qp->s_pkey_index = qp->s_alt_pkey_index;
+   qp->s_flags |= HFI1_S_AHG_CLEAR;
+
+   ev.device = qp->ibqp.device;
+   ev.element.qp = >ibqp;
+   ev.event = IB_EVENT_PATH_MIG;
+   qp->ibqp.event_handler(, qp->ibqp.qp_context);
+}
diff --git a/drivers/staging/rdma/hfi1/qp.h b/drivers/staging/rdma/hfi1/qp.h
index b9c1575990aa..bacfa9c5e8a8 100644
--- a/drivers/staging/rdma/hfi1/qp.h
+++ b/drivers/staging/rdma/hfi1/qp.h
@@ -247,4 +247,6 @@ void qp_iter_print(struct seq_file *s, struct qp_iter 
*iter);
  */
 void qp_comm_est(struct hfi1_qp *qp);
 
+void hfi1_migrate_qp(struct hfi1_qp *qp);
+
 #endif /* _QP_H */
diff --git a/drivers/staging/rdma/hfi1/ruc.c b/drivers/staging/rdma/hfi1/ruc.c
index 8614b070545c..947ea27332ff 100644
--- a/drivers/staging/rdma/hfi1/ruc.c
+++ b/drivers/staging/rdma/hfi1/ruc.c
@@ -241,26 +241,6 @@ bail:
return ret;
 }
 
-/*
- * Switch to alternate path.
- * The QP s_lock should be held and interrupts disabled.
- */
-void hfi1_migrate_qp(struct hfi1_qp *qp)
-{
-   struct ib_event ev;
-
-   qp->s_mig_state = IB_MIG_MIGRATED;
-   qp->remote_ah_attr = qp->alt_ah_attr;
-   qp->port_num = qp->alt_ah_attr.port_num;
-   qp->s_pkey_index = qp->s_alt_pkey_index;
-   qp->s_flags |= HFI1_S_AHG_CLEAR;
-
-   ev.device = qp->ibqp.device;
-   ev.element.qp = >ibqp;
-   ev.event = IB_EVENT_PATH_MIG;
-   qp->ibqp.event_handler(, qp->ibqp.qp_context);
-}
-
 static __be64 get_sguid(struct hfi1_ibport *ibp, unsigned index)
 {
if (!index) {
diff --git a/drivers/staging/rdma/hfi1/verbs.h 
b/drivers/staging/rdma/hfi1/verbs.h
index e4a8a0d4ccf8..9555aab51530 100644
--- a/drivers/staging/rdma/hfi1/verbs.h
+++ b/drivers/staging/rdma/hfi1/verbs.h
@@ -1071,8 +1071,6 @@ int hfi1_mmap(struct ib_ucontext *context, struct 
vm_area_struct *vma);
 
 int hfi1_get_rwqe(struct hfi1_qp *qp, int wr_id_only);
 
-void hfi1_migrate_qp(struct hfi1_qp *qp);
-
 int hfi1_ruc_check_hdr(struct hfi1_ibport *ibp, struct hfi1_ib_header *hdr,
   int has_grh, struct hfi1_qp *qp, u32 bth0);
 
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 10/11] staging/rdma/hfi1: Eliminate WARN_ON when VL is invalid

2015-11-09 Thread Jubin John
> > -   if (WARN_ON(vl > 8))
> > -   return NULL;
> > +   /* NOTE This should only happen if SC->VL changed after the initial
> > +*  checks on the QP/AH
> > +*  Default will return engine 0 below
> > +*/
> > +   if (unlikely(vl >= num_vls)) {
> 
> Can you prove that unlikely() makes a measured difference here?  If not,
> please remove it.  If you can, please provide the proof in the changelog
> when you resend it.

Unfortunately, I can't prove this so will remove in v2.

Thanks,
Jubin John
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/9] staging/rdma/hfi1: Add function stubs for TID caching

2015-11-09 Thread ira.weiny
On Fri, Nov 06, 2015 at 05:02:35PM -0800, Greg KH wrote:
> On Fri, Oct 30, 2015 at 06:58:44PM -0400, ira.we...@intel.com wrote:
> > From: Mitko Haralanov 
> > 
> > Add mmu notify helper functions and TID caching function stubs in 
> > preparation
> > for the TID caching implementation.
> > 
> > TID caching makes use of the MMU notifier to allow the driver to respond to 
> > the
> > user freeing memory which is allocated to the HFI.
> > 
> > This patch implements the basic MMU notifier functions to insert, find and
> > remove buffer pages from memory based on the mmu_notifier being invoked.
> > 
> > In addition it places stubs in place for the main entry points by follow on
> > code.
> > 
> > Follow up patches will complete the implementation of the interaction with 
> > user
> > space and makes use of these functions.
> 
> This patch adds warnings to the build, which isn't ok, so I've stopped
> here in this series and just applied the first 4.  Please fix up and
> resend the remaining ones.

Ok, patch 7-9 of this series do not depend on this patch nor number 6.  I will
resend those 3 while I figure out what to do about these 2:

staging/rdma/hfi1: Add function stubs for TID caching
staging/rdma/hfi1: Implement Expected Receive TID caching

Frankly this was an attempt to reduce the size of "Implement Expected Receive
TID caching".  I obviously did something wrong.

I really don't know that I can split these up any more without causing issues
with bisecting the code.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] staging/rdma/hfi1: Workaround to prevent corruption during packet delivery

2015-11-09 Thread Jubin John
From: Mark F. Brown 

Disabling one receive context when RX_DMA is receiving a packet can cause
incorrect packet delivery for a subsequent packet on another receive
context.

This is resolved by doing the following:
1. Programming dummy tail address for every receive context
   before enabling it
2. While deallocating receive context resetting tail address
   to dummy address
3. Leaving the dummy address in when disabling tail update
4. When disabling receive context leaving tail update enabled

Reviewed-by: Dennis Dalessandro 
Reviewed-by: Mike Marciniszyn 
Reviewed-by: Mitko Haralanov 
Signed-off-by: Mark F. Brown 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/chip.c |   18 +++---
 drivers/staging/rdma/hfi1/hfi.h  |4 
 drivers/staging/rdma/hfi1/init.c |   28 
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index e489819..3c3ba52 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -7753,6 +7753,17 @@ void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int 
op, int ctxt)
}
if (op & HFI1_RCVCTRL_CTXT_DIS) {
write_csr(dd, RCV_VL15, 0);
+   /*
+* When receive context is being disabled turn on tail
+* update with a dummy tail address and then disable
+* receive context.
+*/
+   if (dd->rcvhdrtail_dummy_physaddr) {
+   write_kctxt_csr(dd, ctxt, RCV_HDR_TAIL_ADDR,
+   dd->rcvhdrtail_dummy_physaddr);
+   rcvctrl |= RCV_CTXT_CTRL_TAIL_UPD_SMASK;
+   }
+
rcvctrl &= ~RCV_CTXT_CTRL_ENABLE_SMASK;
}
if (op & HFI1_RCVCTRL_INTRAVAIL_ENB)
@@ -7822,10 +7833,11 @@ void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int 
op, int ctxt)
if (op & (HFI1_RCVCTRL_TAILUPD_DIS | HFI1_RCVCTRL_CTXT_DIS))
/*
 * If the context has been disabled and the Tail Update has
-* been cleared, clear the RCV_HDR_TAIL_ADDR CSR so
-* it doesn't contain an address that is invalid.
+* been cleared, set the RCV_HDR_TAIL_ADDR CSR to dummy address
+* so it doesn't contain an address that is invalid.
 */
-   write_kctxt_csr(dd, ctxt, RCV_HDR_TAIL_ADDR, 0);
+   write_kctxt_csr(dd, ctxt, RCV_HDR_TAIL_ADDR,
+   dd->rcvhdrtail_dummy_physaddr);
 }
 
 u32 hfi1_read_cntrs(struct hfi1_devdata *dd, loff_t pos, char **namep,
diff --git a/drivers/staging/rdma/hfi1/hfi.h b/drivers/staging/rdma/hfi1/hfi.h
index 190f7a2..10b227c 100644
--- a/drivers/staging/rdma/hfi1/hfi.h
+++ b/drivers/staging/rdma/hfi1/hfi.h
@@ -1084,6 +1084,10 @@ struct hfi1_devdata {
/* Save the enabled LCB error bits */
u64 lcb_err_en;
u8 dc_shutdown;
+
+   /* receive context tail dummy address */
+   __le64 *rcvhdrtail_dummy_kvaddr;
+   dma_addr_t rcvhdrtail_dummy_physaddr;
 };
 
 /* 8051 firmware version helper */
diff --git a/drivers/staging/rdma/hfi1/init.c b/drivers/staging/rdma/hfi1/init.c
index 8666f3a..3879d80 100644
--- a/drivers/staging/rdma/hfi1/init.c
+++ b/drivers/staging/rdma/hfi1/init.c
@@ -691,6 +691,18 @@ int hfi1_init(struct hfi1_devdata *dd, int reinit)
if (ret)
goto done;
 
+   /* allocate dummy tail memory for all receive contexts */
+   dd->rcvhdrtail_dummy_kvaddr = dma_zalloc_coherent(
+   >pcidev->dev, sizeof(u64),
+   >rcvhdrtail_dummy_physaddr,
+   GFP_KERNEL);
+
+   if (!dd->rcvhdrtail_dummy_kvaddr) {
+   dd_dev_err(dd, "cannot allocate dummy tail memory\n");
+   ret = -ENOMEM;
+   goto done;
+   }
+
/* dd->rcd can be NULL if early initialization failed */
for (i = 0; dd->rcd && i < dd->first_user_ctxt; ++i) {
/*
@@ -1266,6 +1278,14 @@ static void cleanup_device_data(struct hfi1_devdata *dd)
tmp = dd->rcd;
dd->rcd = NULL;
spin_unlock_irqrestore(>uctxt_lock, flags);
+
+   if (dd->rcvhdrtail_dummy_kvaddr) {
+   dma_free_coherent(>pcidev->dev, sizeof(u64),
+ (void *)dd->rcvhdrtail_dummy_kvaddr,
+ dd->rcvhdrtail_dummy_physaddr);
+ dd->rcvhdrtail_dummy_kvaddr = NULL;
+   }
+
for (ctxt = 0; tmp && ctxt < dd->num_rcv_contexts; ctxt++) {
struct hfi1_ctxtdata *rcd = tmp[ctxt];
 
@@ -1521,6 +1541,14 @@ int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct 
hfi1_ctxtdata *rcd)
reg = (dd->rcvhdrsize 

[PATCH v2 0/2] staging/rdma/hfi1: Miscellaneous Fixes

2015-11-09 Thread Jubin John
Updated these 2 patches based on feedback received.

Changes in v2:
- Removed volatile for receive context tail dummy address
- Removed unlikely() in sdma_select_engine_vl()

Ira Weiny (1):
  staging/rdma/hfi1: Eliminate WARN_ON when VL is invalid

Mark F. Brown (1):
  staging/rdma/hfi1: Workaround to prevent corruption during packet
delivery

 drivers/staging/rdma/hfi1/chip.c |   18 +++---
 drivers/staging/rdma/hfi1/hfi.h  |4 
 drivers/staging/rdma/hfi1/init.c |   28 
 drivers/staging/rdma/hfi1/sdma.c |   12 ++--
 4 files changed, 57 insertions(+), 5 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] staging/rdma/hfi1: Eliminate WARN_ON when VL is invalid

2015-11-09 Thread Jubin John
From: Ira Weiny 

sdma_select_engine_vl only needs to protect itself from an invalid VL.
Something higher up the stack should be warning the user when they try
to use an SL which maps to an invalid VL.

Reviewed-by: Dean Luick 
Reviewed-by: Mike Marciniszyn 
Reviewed-by: Kaike Wan 
Signed-off-by: Ira Weiny 
Signed-off-by: Jubin John 
---
 drivers/staging/rdma/hfi1/sdma.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/sdma.c b/drivers/staging/rdma/hfi1/sdma.c
index 2a1da21..d7982a7 100644
--- a/drivers/staging/rdma/hfi1/sdma.c
+++ b/drivers/staging/rdma/hfi1/sdma.c
@@ -777,8 +777,14 @@ struct sdma_engine *sdma_select_engine_vl(
struct sdma_map_elem *e;
struct sdma_engine *rval;
 
-   if (WARN_ON(vl > 8))
-   return NULL;
+   /* NOTE This should only happen if SC->VL changed after the initial
+*  checks on the QP/AH
+*  Default will return engine 0 below
+*/
+   if (vl >= num_vls) {
+   rval = NULL;
+   goto done;
+   }
 
rcu_read_lock();
m = rcu_dereference(dd->sdma_map);
@@ -790,6 +796,8 @@ struct sdma_engine *sdma_select_engine_vl(
rval = e->sde[selector & e->mask];
rcu_read_unlock();
 
+done:
+   rval =  !rval ? >per_sdma[0] : rval;
trace_hfi1_sdma_engine_select(dd, selector, vl, rval->this_idx);
return rval;
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 5/9] staging/rdma/hfi1: Add function stubs for TID caching

2015-11-09 Thread Greg KH
On Mon, Nov 09, 2015 at 07:11:03PM -0500, ira.weiny wrote:
> On Fri, Nov 06, 2015 at 05:02:35PM -0800, Greg KH wrote:
> > On Fri, Oct 30, 2015 at 06:58:44PM -0400, ira.we...@intel.com wrote:
> > > From: Mitko Haralanov 
> > > 
> > > Add mmu notify helper functions and TID caching function stubs in 
> > > preparation
> > > for the TID caching implementation.
> > > 
> > > TID caching makes use of the MMU notifier to allow the driver to respond 
> > > to the
> > > user freeing memory which is allocated to the HFI.
> > > 
> > > This patch implements the basic MMU notifier functions to insert, find and
> > > remove buffer pages from memory based on the mmu_notifier being invoked.
> > > 
> > > In addition it places stubs in place for the main entry points by follow 
> > > on
> > > code.
> > > 
> > > Follow up patches will complete the implementation of the interaction 
> > > with user
> > > space and makes use of these functions.
> > 
> > This patch adds warnings to the build, which isn't ok, so I've stopped
> > here in this series and just applied the first 4.  Please fix up and
> > resend the remaining ones.
> 
> Ok, patch 7-9 of this series do not depend on this patch nor number 6.  I will
> resend those 3 while I figure out what to do about these 2:
> 
> staging/rdma/hfi1: Add function stubs for TID caching
> staging/rdma/hfi1: Implement Expected Receive TID caching
> 
> Frankly this was an attempt to reduce the size of "Implement Expected Receive
> TID caching".  I obviously did something wrong.
> 
> I really don't know that I can split these up any more without causing issues
> with bisecting the code.

I strongly doubt that you created this new feature all "at once", it
took a set of steps to get to your final destination.  So show that
work, like your math professor told you...

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Jason Gunthorpe
On Mon, Nov 09, 2015 at 08:48:36AM +0200, Haggai Eran wrote:
> On 08/11/2015 17:04, Matan Barak wrote:
> >> @@ -704,6 +719,10 @@ static ssize_t ib_uverbs_write(struct file *filp, 
> >> const char __user *buf,
> >> > }
> >> >
> >> > command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
> >> > +   if (verify_command_mask(ib_dev, command)) {
> >> > +   ret = -EINVAL;
> > Why did you replace ENOSYS with EINVAL?
> 
> I think ENOSYS is meant only to represent a system call number that
> isn't supported. There was a change in checkpatch that now warns about
> it [1]. I'm not sure the intention was to fix existing uses though.

Within verbs we should have two kinds of return - not supported
request, and supported request with invalid parameters.

Maybe use EOPNOTSUPP ?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Eli Cohen
On Mon, Nov 09, 2015 at 03:35:31PM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 09, 2015 at 08:48:36AM +0200, Haggai Eran wrote:
> > On 08/11/2015 17:04, Matan Barak wrote:
> > >> @@ -704,6 +719,10 @@ static ssize_t ib_uverbs_write(struct file *filp, 
> > >> const char __user *buf,
> > >> > }
> > >> >
> > >> > command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
> > >> > +   if (verify_command_mask(ib_dev, command)) {
> > >> > +   ret = -EINVAL;
> > > Why did you replace ENOSYS with EINVAL?
> > 
> > I think ENOSYS is meant only to represent a system call number that
> > isn't supported. There was a change in checkpatch that now warns about
> > it [1]. I'm not sure the intention was to fix existing uses though.
> 
> Within verbs we should have two kinds of return - not supported
> request, and supported request with invalid parameters.
> 
> Maybe use EOPNOTSUPP ?
> 

What about Matan's concern regarding legacy code? Maybe we should
leave ENOSYS as that's how it is all over the IB stack code.

quote:
I think it could break old applications:
err = extended_verb(the_first_extension_we_added);
if (err == ENOSYS)
  err = legacy_verb();
if (err)
return err;

Such applications used the first extension (that was added during the
addition of the extended verb) and when they realized it's not
supported, they dropped to the legacy verb. This change can now cause
the return of -EINVAL an early termination with an error.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ib-next 2/3] IB/core: IB/core: Allow legacy verbs through extended interfaces

2015-11-09 Thread Jason Gunthorpe
On Tue, Nov 10, 2015 at 01:05:31AM +0200, Eli Cohen wrote:
> On Mon, Nov 09, 2015 at 03:35:31PM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 09, 2015 at 08:48:36AM +0200, Haggai Eran wrote:
> > > On 08/11/2015 17:04, Matan Barak wrote:
> > > >> @@ -704,6 +719,10 @@ static ssize_t ib_uverbs_write(struct file *filp, 
> > > >> const char __user *buf,
> > > >> > }
> > > >> >
> > > >> > command = hdr.command & IB_USER_VERBS_CMD_COMMAND_MASK;
> > > >> > +   if (verify_command_mask(ib_dev, command)) {
> > > >> > +   ret = -EINVAL;
> > > > Why did you replace ENOSYS with EINVAL?
> > > 
> > > I think ENOSYS is meant only to represent a system call number that
> > > isn't supported. There was a change in checkpatch that now warns about
> > > it [1]. I'm not sure the intention was to fix existing uses though.
> > 
> > Within verbs we should have two kinds of return - not supported
> > request, and supported request with invalid parameters.
> > 
> > Maybe use EOPNOTSUPP ?
> > 
> 
> What about Matan's concern regarding legacy code? Maybe we should
> leave ENOSYS as that's how it is all over the IB stack code.
> 
> quote:
> I think it could break old applications:
> err = extended_verb(the_first_extension_we_added);
> if (err == ENOSYS)
>   err = legacy_verb();
> if (err)
> return err;
> 
> Such applications used the first extension (that was added during the
> addition of the extended verb) and when they realized it's not
> supported, they dropped to the legacy verb. This change can now cause
> the return of -EINVAL an early termination with an error.

Since the change is to make the kernel do the above fall back
internally, this specific example doesn't make alot of sense to worry
about. Ie the extended verb won't fail anymore, and if it does the
legacy one won't work anyhow.

But if there is something out there that does care about ENOSYS we
should try to keep it, but don't convert ENOSYS to EINVAL.

Also, when the driver tests the ex flags for support it should be
returning EOPNOTSUPP or such not EINVAL.. Return codes for the ex
stuff could stand a good sanity audit.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html