Re: [PATCH 13/15] IB/srpt: Detect session shutdown reliably

2016-01-06 Thread Christoph Hellwig
On Wed, Jan 06, 2016 at 03:46:34PM +0100, Bart Van Assche wrote:
> I will make the patch description more detailed. Sorry if some of this code 
> is hard to follow but that's because of the high level of concurrency in 
> the SRP target driver. Some time ago I documented how session management in 
> the SCST ib_srpt driver works. This driver follows the same model. These 
> notes can be found here: 
> http://sourceforge.net/p/scst/svn/HEAD/tree/trunk/srpt/session-management.txt.

It might be useful to eventually add a version of that to the Linux
kernel tree as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/15] IB/srpt: Introduce target_reverse_dma_direction()

2016-01-05 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 05/15] IB/srpt: Use scsilun_to_int()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:22:46PM +0100, Bart Van Assche wrote:
> Just like other target drivers, use scsilun_to_int() to unpack SCSI
> LUN numbers. This patch only changes the behavior of ib_srpt for LUN
> numbers >= 16384.
> 
> Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
> Cc: Christoph Hellwig <h...@lst.de>

Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 06/15] IB/srpt: Simplify srpt_handle_tsk_mgmt()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:23:14PM +0100, Bart Van Assche wrote:
> Let the target core check task existence instead of the SRP target
> driver.

Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 01/15] IB/srpt: Add parentheses around sizeof argument

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:20:50PM +0100, Bart Van Assche wrote:
> Although sizeof is an operator and hence in many cases parentheses can
> be left out, the recommended kernel coding style is to surround the
> sizeof argument with parentheses. This patch does not change any
> functionality. This patch has been generated by running the following
> shell command:

I don't really care about this formatting, but the patch looks fine:

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 03/15] IB/srpt: Inline srpt_get_ch_state()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:21:53PM +0100, Bart Van Assche wrote:
> The callers of srpt_get_ch_state() can access ch->state safely without
> using locking. Hence inline this function.

Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 11/15] IB/srpt: Fix how aborted commands are processed

2016-01-05 Thread Christoph Hellwig
>   pr_debug("Aborting cmd with state %d and tag %lld\n", state,
>ioctx->cmd.tag);
>  
> @@ -1299,14 +1291,16 @@ static int srpt_abort_cmd(struct srpt_send_ioctx 
> *ioctx)
>   case SRPT_STATE_NEW:
>   case SRPT_STATE_DATA_IN:
>   case SRPT_STATE_MGMT:
> + case SRPT_STATE_DONE:
>   /*
>* Do nothing - defer abort processing until
>* srpt_queue_response() is invoked.
>*/
> - WARN_ON(!(ioctx->cmd.transport_state & CMD_T_ABORTED));

Seems like this depends on your target core changes?  Maybe it would be better
to respin the series to go just on top of Doug's RDMA tree, as I think
we're more likely to get this series merged for 4.5 than the target core
changes..

Otherwise these changes look fine to me.


Re: [PATCH 09/15] IB/srpt: Fix srpt_close_session()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:24:49PM +0100, Bart Van Assche wrote:
> Avoid that srpt_close_session() waits if it doesn't have to wait.
> Additionally, increase the time during which srpt_close_session()
> waits until closing a session has finished. This makes it easier
> to detect session shutdown bugs.

Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 08/15] IB/srpt: Simplify srpt_shutdown_session()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:24:15PM +0100, Bart Van Assche wrote:
> The target core guarantees that shutdown_session() is only invoked
> once per session. This means that the ib_srpt target driver doesn't
> have to track whether or not shutdown_session() has been called.
> Additionally, ensure that target_sess_cmd_list_set_waiting() is
> called before target_wait_for_sess_cmds() by moving it into
> srpt_release_channel_work().

Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>

Note that most drivers now always return either 0 or 1 from
shutdown_session, so it might be worth investigating whether we can get
rid of this method in the future.

Minor nitpick below:

>  static int srpt_shutdown_session(struct se_session *se_sess)
>  {
>   return true;
>  }


Given that the function returns int, this really should be 1 instead of
true, but it's not really worth respinning the patch just for this.


Re: [PATCH 10/15] IB/srpt: Fix srpt_handle_cmd() error paths

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:25:13PM +0100, Bart Van Assche wrote:
> The target core function that should be called if target_submit_cmd()
> fails is target_put_sess_cmd(). Additionally, change the return type
> of srpt_handle_cmd() from int into void.

I actually ran into this bug a long time ago with a modified srpt driver
and forgot to send a similar fix..

Looks good:

Reviewed-by: Christoph Hellwig <h...@lst.de>

Minor nitpick below:

> + send_ioctx->state = SRPT_STATE_DONE;
> + target_put_sess_cmd(cmd);
> + return;
>  }

No need for that return statement.


Re: [PATCH 07/15] IB/srpt: Simplify channel state management

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:23:45PM +0100, Bart Van Assche wrote:
> The only allowed channel state changes are those that change
> the channel state into a state with a higher numerical value.
> This allows to merge the functions srpt_set_ch_state() and
> srpt_test_and_set_ch_state() into a single function.

It would be great to have a short comment in srpt_set_ch_state explaining
why only changing to a numerically greater state is fine.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <h...@lst.de>
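
For illustration, the kind of comment asked for above could sit on a setter
like the one below. This is a minimal userspace sketch, not the ib_srpt code:
the state names, struct channel and set_ch_state() are hypothetical, and the
locking a real driver would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical channel states; the numeric order is the lifecycle order. */
enum ch_state {
	CH_CONNECTING = 0,
	CH_LIVE = 1,
	CH_DISCONNECTING = 2,
	CH_DISCONNECTED = 3,
};

struct channel {
	enum ch_state state;
};

/*
 * Move ch->state to new_state only if that is a step forward.
 *
 * A channel never returns to an earlier state, so a request for a state
 * that is not numerically greater is either a duplicate or stale and can
 * be ignored. The return value tells the caller whether it was the one
 * that performed the transition.
 *
 * Not thread-safe as written; a real driver would serialize the compare
 * and the store, for example with a spinlock.
 */
static bool set_ch_state(struct channel *ch, enum ch_state new_state)
{
	assert(ch != NULL);
	if (new_state <= ch->state)
		return false;
	ch->state = new_state;
	return true;
}
```

With this shape a separate "test and set" variant is not needed: callers
that care whether they won the transition check the return value, and the
others ignore it.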


Re: [PATCH 12/15] IB/srpt: Eliminate srpt_find_channel()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:26:22PM +0100, Bart Van Assche wrote:
> In the CM REQ message handler, store the channel pointer in
> cm_id->context such that the function srpt_find_channel() is no
> longer needed. Additionally, make the CM event messages more
> informative.

Looks fine,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 10/11] IB: only keep a single key in struct ib_mr

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 10:46:36AM -0700, Jason Gunthorpe wrote:
> > ULPs are *already* using the same registrations for both local and
> > remote access.
> 
> Where? Out of tree?

I haven't found anything in-tree for sure.


[PATCH 2/2 v2] IB/mad: use CQ abstraction

2016-01-05 Thread Christoph Hellwig
Remove the local workqueue to process mad completions and use the CQ API
instead.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Hal Rosenstock <h...@mellanox.com>
Reviewed-by: Ira Weiny <ira.we...@intel.com>
---
 drivers/infiniband/core/mad.c  | 162 +
 drivers/infiniband/core/mad_priv.h |   2 +-
 2 files changed, 59 insertions(+), 105 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index cbe232a..9fa5bf3 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -61,18 +61,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in 
number of work requests
 module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
 MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work 
requests");
 
-/*
- * Define a limit on the number of completions which will be processed by the
- * worker thread in a single work item.  This ensures that other work items
- * (potentially from other users) are processed fairly.
- *
- * The number of completions was derived from the default queue sizes above.
- * We use a value which is double the larger of the 2 queues (receive @ 512)
- * but keep it fixed such that an increase in that value does not introduce
- * unfairness.
- */
-#define MAD_COMPLETION_PROC_LIMIT 1024
-
 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;
 
@@ -96,6 +84,9 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req 
*mad_reg_req,
  u8 mgmt_class);
 static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
   struct ib_mad_agent_private *agent_priv);
+static bool ib_mad_send_error(struct ib_mad_port_private *port_priv,
+ struct ib_wc *wc);
+static void ib_mad_send_done(struct ib_cq *cq, struct ib_wc *wc);
 
 /*
  * Returns a ib_mad_port_private structure or NULL for a device/port
@@ -701,12 +692,11 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
 }
 
-static void build_smp_wc(struct ib_qp *qp,
-u64 wr_id, u16 slid, u16 pkey_index, u8 port_num,
-struct ib_wc *wc)
+static void build_smp_wc(struct ib_qp *qp, struct ib_cqe *cqe, u16 slid,
+   u16 pkey_index, u8 port_num, struct ib_wc *wc)
 {
memset(wc, 0, sizeof *wc);
-   wc->wr_id = wr_id;
+   wc->wr_cqe = cqe;
wc->status = IB_WC_SUCCESS;
wc->opcode = IB_WC_RECV;
wc->pkey_index = pkey_index;
@@ -844,7 +834,7 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
}
 
build_smp_wc(mad_agent_priv->agent.qp,
-send_wr->wr.wr_id, drslid,
+send_wr->wr.wr_cqe, drslid,
 send_wr->pkey_index,
 send_wr->port_num, &mad_wc);
 
@@ -1051,7 +1041,9 @@ struct ib_mad_send_buf * ib_create_send_mad(struct 
ib_mad_agent *mad_agent,
 
mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
 
-   mad_send_wr->send_wr.wr.wr_id = (unsigned long) mad_send_wr;
+   mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
+
+   mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
mad_send_wr->send_wr.wr.num_sge = 2;
mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
@@ -1163,8 +1155,9 @@ int ib_send_mad(struct ib_mad_send_wr_private 
*mad_send_wr)
 
/* Set WR ID to find mad_send_wr upon completion */
qp_info = mad_send_wr->mad_agent_priv->qp_info;
-   mad_send_wr->send_wr.wr.wr_id = (unsigned long)&mad_send_wr->mad_list;
mad_send_wr->mad_list.mad_queue = &qp_info->send_queue;
+   mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
+   mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
 
mad_agent = mad_send_wr->send_buf.mad_agent;
sge = mad_send_wr->sg_list;
@@ -2185,13 +2178,14 @@ handle_smi(struct ib_mad_port_private *port_priv,
return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response);
 }
 
-static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
-struct ib_wc *wc)
+static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 {
+   struct ib_mad_port_private *port_priv = cq->cq_context;
+   struct ib_mad_list_head *mad_list =
+   container_of(wc->wr_cqe, struct ib_mad_list_head, cqe);
struct ib_mad_qp_info *qp_info;
struct ib_mad_private_header *mad_priv_hdr;
struct ib_mad_private *recv, *response = NULL;
-   struct ib_mad_list_head *mad_list;
struct ib_mad_agent_private *mad_agent;
int port_num;
int

Re: [PATCH 2/2] IB/mad: use CQ abstraction

2016-01-05 Thread Christoph Hellwig
On Mon, Jan 04, 2016 at 07:04:03PM -0500, ira.weiny wrote:
> Sorry I did not catch this before but rather than void * wouldn't it be better
> to use struct ib_cqe?

Sure, I'll fix it up.
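
To illustrate why a typed pointer is preferable to void * here, the following
is a userspace sketch of the completion pattern the patch moves to: a small
completion entry is embedded in the request, and the done callback recovers
the request with a container_of-style macro. All demo_* names and
DEMO_CONTAINER_OF are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for a completion queue entry. */
struct demo_cqe {
	void (*done)(struct demo_cqe *cqe);
};

/* A request that embeds its completion entry. */
struct demo_request {
	int tag;
	int completed;
	struct demo_cqe cqe;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define DEMO_CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void demo_request_done(struct demo_cqe *cqe)
{
	struct demo_request *req =
		DEMO_CONTAINER_OF(cqe, struct demo_request, cqe);

	req->completed = 1;
}

static void demo_request_init(struct demo_request *req, int tag)
{
	req->tag = tag;
	req->completed = 0;
	req->cqe.done = demo_request_done;
}

/*
 * Taking struct demo_cqe * instead of void * lets the compiler reject
 * callers that pass an unrelated pointer.
 */
static void demo_complete(struct demo_cqe *cqe)
{
	assert(cqe != NULL && cqe->done != NULL);
	cqe->done(cqe);
}
```

A caller would initialize a request with demo_request_init() and later pass
&req.cqe to demo_complete(); no integer cast of a pointer is involved at any
point.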


Re: [PATCH 14/15] IB/srpt: Fix srpt_write_pending()

2016-01-05 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 15/15] IB/srpt: Fix a rare crash in srpt_close_session()

2016-01-05 Thread Christoph Hellwig
>   srpt_disconnect_ch(ch);
>  
> + kref_put(&ch->kref, srpt_free_ch);

At some point it might be a good idea to have a srpt_put_ch helper to wrap
this pattern.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <h...@lst.de>
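
A helper of the kind suggested above might look like the following userspace
sketch. It only models the idea: struct demo_ref, demo_ref_put() and
demo_put_ch() are hypothetical, and the counter is a plain int rather than
the atomic reference count a kernel driver would use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, non-atomic reference counter. */
struct demo_ref {
	int count;
};

struct demo_ch {
	struct demo_ref ref;
	int freed;	/* set by the release function so the effect is visible */
};

/* Release function: called once the last reference has been dropped. */
static void demo_free_ch(struct demo_ref *ref)
{
	struct demo_ch *ch =
		(struct demo_ch *)((char *)ref - offsetof(struct demo_ch, ref));

	ch->freed = 1;	/* a real release function would free the channel */
}

/* Drop one reference and call release() when the count reaches zero. */
static void demo_ref_put(struct demo_ref *ref,
			 void (*release)(struct demo_ref *ref))
{
	assert(ref->count > 0);
	if (--ref->count == 0)
		release(ref);
}

/* The wrapper: callers no longer repeat the release function. */
static void demo_put_ch(struct demo_ch *ch)
{
	demo_ref_put(&ch->ref, demo_free_ch);
}
```

The benefit is small but real: every call site drops a reference the same
way, and the release function is named in exactly one place.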


Re: [PATCH 13/15] IB/srpt: Detect session shutdown reliably

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:26:49PM +0100, Bart Van Assche wrote:
> The Last WQE Reached event is only generated after one or more work
> requests have been queued on the QP associated with a session. Since
> session shutdown can start before any work requests have been queued,
> use a zero-length RDMA write to wait until a QP has been drained.

We actually ran into the same issue with an SRPT-derived work-in-progress
driver recently.

> @@ -2314,14 +2346,13 @@ static void srpt_cm_timewait_exit(struct srpt_rdma_ch 
> *ch)
>  {
>   pr_info("Received CM TimeWait exit for ch %s-%d.\n", ch->sess_name,
>   ch->qp->qp_num);
> + srpt_close_ch(ch);
>  }
>  
>  static void srpt_cm_rep_error(struct srpt_rdma_ch *ch)
>  {
>   pr_info("Received CM REP error for ch %s-%d.\n", ch->sess_name,
>   ch->qp->qp_num);
>  }
>  
>  /**
> @@ -2329,33 +2360,7 @@ static void srpt_cm_rep_error(struct srpt_rdma_ch *ch)
>   */
>  static void srpt_cm_dreq_recv(struct srpt_rdma_ch *ch)
>  {
> + srpt_disconnect_ch(ch);
>  }
>  
>  /**
> @@ -2364,7 +2369,7 @@ static void srpt_cm_dreq_recv(struct srpt_rdma_ch *ch)
>  static void srpt_cm_drep_recv(struct srpt_rdma_ch *ch)
>  {
>   pr_info("Received InfiniBand DREP message for cm_id %p.\n", ch->cm_id);
> +     srpt_close_ch(ch);
>  }


Is there any good reason to keep these one-liner helpers around?

Otherwise looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>


[PATCH 2/2] IB/mad: use CQ abstraction

2016-01-04 Thread Christoph Hellwig
Remove the local workqueue to process mad completions and use the CQ API
instead.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/infiniband/core/mad.c  | 159 +
 drivers/infiniband/core/mad_priv.h |   2 +-
 2 files changed, 58 insertions(+), 103 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index cbe232a..286d1a9 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -61,18 +61,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in 
number of work requests
 module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
 MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work 
requests");
 
-/*
- * Define a limit on the number of completions which will be processed by the
- * worker thread in a single work item.  This ensures that other work items
- * (potentially from other users) are processed fairly.
- *
- * The number of completions was derived from the default queue sizes above.
- * We use a value which is double the larger of the 2 queues (receive @ 512)
- * but keep it fixed such that an increase in that value does not introduce
- * unfairness.
- */
-#define MAD_COMPLETION_PROC_LIMIT 1024
-
 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;
 
@@ -96,6 +84,9 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req 
*mad_reg_req,
  u8 mgmt_class);
 static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
   struct ib_mad_agent_private *agent_priv);
+static bool ib_mad_send_error(struct ib_mad_port_private *port_priv,
+ struct ib_wc *wc);
+static void ib_mad_send_done(struct ib_cq *cq, struct ib_wc *wc);
 
 /*
  * Returns a ib_mad_port_private structure or NULL for a device/port
@@ -702,11 +693,11 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
 }
 
 static void build_smp_wc(struct ib_qp *qp,
-u64 wr_id, u16 slid, u16 pkey_index, u8 port_num,
+void *wr_cqe, u16 slid, u16 pkey_index, u8 port_num,
 struct ib_wc *wc)
 {
memset(wc, 0, sizeof *wc);
-   wc->wr_id = wr_id;
+   wc->wr_cqe = wr_cqe;
wc->status = IB_WC_SUCCESS;
wc->opcode = IB_WC_RECV;
wc->pkey_index = pkey_index;
@@ -844,7 +835,7 @@ static int handle_outgoing_dr_smp(struct 
ib_mad_agent_private *mad_agent_priv,
}
 
build_smp_wc(mad_agent_priv->agent.qp,
-send_wr->wr.wr_id, drslid,
+send_wr->wr.wr_cqe, drslid,
 send_wr->pkey_index,
 send_wr->port_num, &mad_wc);
 
@@ -1051,7 +1042,9 @@ struct ib_mad_send_buf * ib_create_send_mad(struct 
ib_mad_agent *mad_agent,
 
mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
 
-   mad_send_wr->send_wr.wr.wr_id = (unsigned long) mad_send_wr;
+   mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
+
+   mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
mad_send_wr->send_wr.wr.num_sge = 2;
mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
@@ -1163,8 +1156,9 @@ int ib_send_mad(struct ib_mad_send_wr_private 
*mad_send_wr)
 
/* Set WR ID to find mad_send_wr upon completion */
qp_info = mad_send_wr->mad_agent_priv->qp_info;
-   mad_send_wr->send_wr.wr.wr_id = (unsigned long)&mad_send_wr->mad_list;
mad_send_wr->mad_list.mad_queue = &qp_info->send_queue;
+   mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
+   mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
 
mad_agent = mad_send_wr->send_buf.mad_agent;
sge = mad_send_wr->sg_list;
@@ -2185,13 +2179,14 @@ handle_smi(struct ib_mad_port_private *port_priv,
return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response);
 }
 
-static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
-struct ib_wc *wc)
+static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 {
+   struct ib_mad_port_private *port_priv = cq->cq_context;
+   struct ib_mad_list_head *mad_list =
+   container_of(wc->wr_cqe, struct ib_mad_list_head, cqe);
struct ib_mad_qp_info *qp_info;
struct ib_mad_private_header *mad_priv_hdr;
struct ib_mad_private *recv, *response = NULL;
-   struct ib_mad_list_head *mad_list;
struct ib_mad_agent_private *mad_agent;
int port_num;
int ret = IB_MAD_RESULT_SUCCESS;
@@ -2199,7 +2194,17 @@ static void ib_mad_recv_done_handler(struct 
ib_mad_port_private *port_priv,
u16 resp_mad_pkey_index = 0;
bool opa;
 
-   mad_list = (struct ib_mad_list_head 

[PATCH 1/2] IB/mad: pass ib_mad_send_buf explicitly to the recv_handler

2016-01-04 Thread Christoph Hellwig
Stop abusing wr_id and just pass the parameter explicitly.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/infiniband/core/cm.c  |  1 +
 drivers/infiniband/core/mad.c | 18 ++
 drivers/infiniband/core/sa_query.c|  7 ---
 drivers/infiniband/core/user_mad.c|  1 +
 drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
 include/rdma/ib_mad.h |  2 ++
 6 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e3a95d1..ad3726d 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
ib_event_type event)
 EXPORT_SYMBOL(ib_cm_notify);
 
 static void cm_recv_handler(struct ib_mad_agent *mad_agent,
+   struct ib_mad_send_buf *send_buf,
struct ib_mad_recv_wc *mad_recv_wc)
 {
struct cm_port *port = mad_agent->context;
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index d4d2a61..cbe232a 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
 
atomic_inc(&mad_snoop_priv->refcount);
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
-   mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent,
+   mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, NULL,
   mad_recv_wc);
deref_snoop_agent(mad_snoop_priv);
spin_lock_irqsave(&qp_info->snoop_lock, flags);
@@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct 
ib_mad_agent_private *mad_agent_priv,
/* user rmpp is in effect
 * and this is an active RMPP MAD
 */
-   mad_recv_wc->wc->wr_id = 0;
-   mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
-  mad_recv_wc);
+   mad_agent_priv->agent.recv_handler(
+   &mad_agent_priv->agent, NULL,
+   mad_recv_wc);
atomic_dec(&mad_agent_priv->refcount);
} else {
/* not user rmpp, revert to normal behavior and
@@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct 
ib_mad_agent_private *mad_agent_priv,
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 
/* Defined behavior is to complete response before request */
-   mad_recv_wc->wc->wr_id = (unsigned long) &mad_send_wr->send_buf;
-   mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
-  mad_recv_wc);
+   mad_agent_priv->agent.recv_handler(
+   &mad_agent_priv->agent,
+   &mad_send_wr->send_buf,
+   mad_recv_wc);
atomic_dec(&mad_agent_priv->refcount);
 
mad_send_wc.status = IB_WC_SUCCESS;
@@ -2021,7 +2022,7 @@ static void ib_mad_complete_recv(struct 
ib_mad_agent_private *mad_agent_priv,
ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc);
}
} else {
-   mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
+   mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, NULL,
   mad_recv_wc);
deref_mad_agent(mad_agent_priv);
}
@@ -2762,6 +2763,7 @@ static void local_completions(struct work_struct *work)
   IB_MAD_SNOOP_RECVS);
recv_mad_agent->agent.recv_handler(
&recv_mad_agent->agent,
+   &local->mad_send_wr->send_buf,
&local->mad_priv->header.recv_wc);
spin_lock_irqsave(&recv_mad_agent->lock, flags);
atomic_dec(&recv_mad_agent->refcount);
diff --git a/drivers/infiniband/core/sa_query.c 
b/drivers/infiniband/core/sa_query.c
index e364a42..1f91b6e 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -1669,14 +1669,15 @@ static void send_handler(struct ib_mad_agent *agent,
 }
 
 static void recv_handler(struct ib_mad_agent *mad_agent,
+struct ib_mad_send_buf *send_buf,
 

Re: [PATCH 3/3] IB/srpt: Fix a race condition related to SRP login

2016-01-03 Thread Christoph Hellwig
On Thu, Dec 31, 2015 at 09:57:58AM +0100, Bart Van Assche wrote:
> Since patch "IB/srpt: chain RDMA READ/WRITE requests" there are
> two loops that process the command wait list. ch->cmd_wait_list
> is accessed without locking which means that all code that
> accesses this list must be serialized. Since processing of the
> RTU event happens from another context than IB WC processing,
> remove the wait list processing code from the RTU handler.

But now the first I/O(s) could be lost if no other I/O comes in,
right?  I suspect that we need to keep this loop to protect against
such corner cases.


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2016-01-02 Thread Christoph Hellwig
On Wed, Dec 30, 2015 at 09:00:07PM -0500, ira.weiny wrote:
> On Wed, Dec 30, 2015 at 03:01:33AM -0800, Christoph Hellwig wrote:
> > Hi Ira,
> > 
> > please take a look at the patches I've attached - they are just WIP
> > that hasn't been tested as I'm on a vacation without access to my
> > IB setup until New Year's Eve.
> 
> I have them on a branch.
> 
> I'll try and do some testing over the weekend.

FYI, you probably need this fix from Bart:

http://marc.info/?l=linux-kernel&m=145138288102008&w=2


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-30 Thread Christoph Hellwig
Hi Ira,

please take a look at the patches I've attached - they are just WIP
that hasn't been tested as I'm on a vacation without access to my
IB setup until New Year's Eve.

Patch 1 is, I think, a genuine fix for a bug caused by the madness (pun
intended) of the wr_id abuses.

Patch 2 passes the mad_send_buf explicitly to the MAD handlers to get rid
of that mess.

Patch 3 is the CQ API conversion which becomes relatively simple once
the prior issues above are sorted out.

From a22609131ca353278015b6b4aec3077db06ad9f5 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <h...@lst.de>
Date: Wed, 30 Dec 2015 11:49:22 +0100
Subject: IB/mad: pass send buf in wr_id in local_completions

The sa_query recv_handler expects the send_buf in wr_id, and all other recv
handlers ignore wr_id.  It seems this is what we should pass, please confirm.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/infiniband/core/mad.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index d4d2a61..e0859e5 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2734,7 +2734,7 @@ static void local_completions(struct work_struct *work)
 * before request
 */
build_smp_wc(recv_mad_agent->agent.qp,
-(unsigned long) local->mad_send_wr,
+(unsigned long) &local->mad_send_wr->send_buf,
 be16_to_cpu(IB_LID_PERMISSIVE),
 local->mad_send_wr->send_wr.pkey_index,
 recv_mad_agent->agent.port_num, &wc);
-- 
1.9.1

From c6101bfa6543d0b35c2ca2fa0add09341f456e88 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <h...@lst.de>
Date: Wed, 30 Dec 2015 11:54:02 +0100
Subject: IB/mad: pass ib_mad_send_buf explicitly to the recv_handler

Stop abusing wr_id and just pass the parameter explicitly.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/infiniband/core/cm.c  |  1 +
 drivers/infiniband/core/mad.c | 18 ++
 drivers/infiniband/core/sa_query.c|  7 ++-
 drivers/infiniband/core/user_mad.c|  1 +
 drivers/infiniband/ulp/srpt/ib_srpt.c |  1 +
 include/rdma/ib_mad.h |  2 ++
 6 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e3a95d1..ad3726d 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum 
ib_event_type event)
 EXPORT_SYMBOL(ib_cm_notify);
 
 static void cm_recv_handler(struct ib_mad_agent *mad_agent,
+   struct ib_mad_send_buf *send_buf,
struct ib_mad_recv_wc *mad_recv_wc)
 {
struct cm_port *port = mad_agent->context;
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index e0859e5..f15fcd6 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
 
atomic_inc(&mad_snoop_priv->refcount);
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
-   mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent,
+   mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, NULL,
   mad_recv_wc);
deref_snoop_agent(mad_snoop_priv);
spin_lock_irqsave(&qp_info->snoop_lock, flags);
@@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct 
ib_mad_agent_private *mad_agent_priv,
/* user rmpp is in effect
 * and this is an active RMPP MAD
 */
-   mad_recv_wc->wc->wr_id = 0;
-   mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent,
-  mad_recv_wc);
+   mad_agent_priv->agent.recv_handler(
+   &mad_agent_priv->agent, NULL,
+   mad_recv_wc);
atomic_dec(&mad_agent_priv->refcount);
} else {
/* not user rmpp, revert to normal behavior and
@@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct 
ib_mad_agent_private *mad_agent_priv,
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 
/* Defined behavior is to complete response before 
request */
-   mad_recv_wc->wc->wr_id = (unsigned 

Re: [PATCH 08/13] IB/srpt: chain RDMA READ/WRITE requests

2015-12-30 Thread Christoph Hellwig
On Tue, Dec 29, 2015 at 10:58:24AM +0100, Bart Van Assche wrote:
> On 12/07/2015 09:51 PM, Christoph Hellwig wrote:
> > Remove struct rdma_iu and instead allocate the struct ib_rdma_wr array
> > early and fill out directly.  This allows us to chain the WRs, and thus
> > archive both less lock contention on the HCA workqueue as well as much
> > simpler error handling.
> 
> Please consider folding the patch below into this patch.

Looks fine,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 03/13] irq_poll: fold irq_poll_sched_prep into irq_poll_sched

2015-12-30 Thread Christoph Hellwig
On Tue, Dec 29, 2015 at 10:54:18AM +0100, Bart Van Assche wrote:
> After having applied these changes the SRP initiator didn't receive any 
> RDMA completions anymore. I could remedy that by changing 
> "!test_and_set_bit()" into "test_and_set_bit()":

Yes.  I actually had this bug earlier, fixed it and managed to get
it back during a rebase, d'oh.

Reviewed-by: Christoph Hellwig <h...@lst.de>

Can you resend it with a proper signoff?
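
The polarity problem discussed above is easy to reproduce in isolation. The
sketch below models it in userspace with C11 atomics; it is an illustration
under stated assumptions, not the irq_poll code. demo_test_and_set(),
struct demo_poll and DEMO_F_SCHED are hypothetical names, and the helper
only mimics the "return the old bit value" behaviour of a test-and-set
primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_F_SCHED 0x1u	/* hypothetical "already scheduled" flag */

struct demo_poll {
	atomic_uint state;
	unsigned int times_queued;	/* how often the poller was queued */
};

/* Atomically set mask in *state and report whether it was set before. */
static bool demo_test_and_set(atomic_uint *state, unsigned int mask)
{
	return (atomic_fetch_or(state, mask) & mask) != 0;
}

/*
 * Queue the poller unless it is already scheduled.
 *
 * The early return must fire when the flag WAS set. Negating the test
 * would return on the first call, when the flag was clear, so the poller
 * would never be queued and no completions would be processed.
 */
static void demo_poll_sched(struct demo_poll *p)
{
	assert(p != NULL);
	if (demo_test_and_set(&p->state, DEMO_F_SCHED))
		return;
	p->times_queued++;
}
```

Calling demo_poll_sched() twice on a freshly initialized poller queues it
once; the second call sees the flag and returns early.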


Re: [PATCH] IB/core: Remove a set-but-not-used variable from ib_sg_to_pages()

2015-12-30 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH 3/6] IB/uapi: expose uverbs WC opcodes

2015-12-30 Thread Christoph Hellwig
On Tue, Dec 29, 2015 at 01:02:54PM +0200, Sagi Grimberg wrote:
>> As you did it in the first patch, just don't assign after IB_WC_LOCAL_INV.
>> Compiler will handle IB_UVERBS_WC_SEND_END + X calculations by itself.
>
> I disagree, I'd say it's better to keep the code verbosity level here...

I really don't like enum auto assignment for constants that are fixed
as part of an ABI.  There is too much chance of things going wrong.
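
A small sketch of what can go wrong, using made-up enums rather than the real uverbs constants: inserting one entry into an auto-assigned enum silently renumbers everything after it, while explicit values stay put.

```c
#include <assert.h>

/* Hypothetical ABI enum, version 1: values rely on auto assignment. */
enum abi_auto_v1 {
	AUTO_V1_SEND,
	AUTO_V1_WRITE,
	AUTO_V1_READ,		/* implicitly 2 */
};

/* Version 2 adds an entry in the middle: READ silently becomes 3. */
enum abi_auto_v2 {
	AUTO_V2_SEND,
	AUTO_V2_WRITE,
	AUTO_V2_WRITE_IMM,
	AUTO_V2_READ,		/* implicitly 3 now */
};

/* With explicit values the new entry cannot disturb the existing ones. */
enum abi_fixed_v2 {
	FIXED_SEND	= 0,
	FIXED_WRITE	= 1,
	FIXED_READ	= 2,
	FIXED_WRITE_IMM	= 3,
};

static_assert(AUTO_V1_READ == 2, "original value");
static_assert(AUTO_V2_READ == 3, "auto-assigned value shifted");
static_assert(FIXED_READ == 2, "explicit value is stable");
```

A binary built against version 1 would keep sending 2 for READ, which the version 2 auto-assigned enum now interprets as WRITE_IMM.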


Re: [PATCH] IB/mad: Ensure fairness in ib_mad_completion_handler

2015-12-29 Thread Christoph Hellwig
Please just convert the mad handler to the new CQ API in
drivers/infiniband/core/cq.c.  If you have any questions about it I'd be
glad to help you.
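
As a rough illustration of why that helps with fairness: as I understand it, the CQ API dispatches each completion to a per-request "done" callback and processes completions in bounded batches. The userspace model below shows only that shape; every struct and function name in it is invented for the sketch and none of it is the code in cq.c.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; these are not the kernel's definitions. */
struct model_wc;

struct model_cqe {
	void (*done)(struct model_cqe *cqe, struct model_wc *wc);
};

struct model_wc {
	struct model_cqe *cqe;	/* which request this completion belongs to */
	int status;
};

/* A request embeds its completion entry so the handler can recover it. */
struct mad_request {
	int completed;
	struct model_cqe cqe;
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void mad_request_done(struct model_cqe *cqe, struct model_wc *wc)
{
	struct mad_request *req =
		model_container_of(cqe, struct mad_request, cqe);

	(void)wc;
	req->completed++;
}

/* Handle at most `budget` completions; return how many were handled. */
static int model_process_cq(struct model_wc *wcs, int nr, int budget)
{
	int done = 0;

	while (done < nr && done < budget) {
		struct model_wc *wc = &wcs[done];

		wc->cqe->done(wc->cqe, wc);
		done++;
	}
	return done;
}
```

The budget is what bounds the time spent on one queue before others get a turn; the callback removes the need for a central switch on the completion type.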


Re: [RFC] Generic InfiniBand transport done in software

2015-12-29 Thread Christoph Hellwig
Hi Moni,

On Sun, Dec 27, 2015 at 07:54:46PM +0200, Moni Shoua wrote:
> But you post *now* a so called generic driver so it must now fit any
> possible driver (including Soft RoCE)

It's never going to fit any possible future driver.  Dennis and folks
have done great work to move code outside the drivers into a shared
library.  So far it's been driven just by the Intel drivers as that's
the only thing they were interested in.

If you are interested in supporting SoftROCE please work with them
by adjusting the code towards your requirements.  In Linux we have
great results with iterative approaches and I'd suggest you try it
as well.

> What kind of a feedback you expect when I don't have an idea about
> your plans for rdmavt
> Interfaces, flows, data structures... all is missing from the
> documentation to rdmavt.

You've got the code, so let's work based on that.


Re: [PATCH V1 0/3] Add cross-channel support

2015-12-24 Thread Christoph Hellwig
On Thu, Dec 24, 2015 at 10:02:29AM +0200, Or Gerlitz wrote:
> We had consensus among the reviewers that the 1st patch ("IB/core: Align
> coding style of ib_device_cap_flags structure") is wrong cleanup which
> basically is (1) unneeded (2) creates more damage (git blame and such,
> non-applicable to uapi, more) than benefit, etc -- finally Leon was
> convinced too [1].

It's not really an issue vs uapi.  Using the weird BIT() macro
would have been, but without it I think this cleanup is ok, even if I
personally wouldn't have done it.  git-blame isn't really a major
issue either, as you can blame past revisions.

> Leon will re-spin V2 in the coming 1-2 hours; could you please pick it instead
> of V1? When people agree on direction X and you are not against it, let's do
> X and not Y.

It would be great if we could stop rebasing what's already in the tree,
for the benefit of everyone building on top of it.  For example, I just
finished rebasing my series to move many constants, including this one,
to the uapi headers, and I'd hate to rebase it once again now that
the dust has settled.


Re: [PATCH v2 00/10] iSER support for remote invalidate

2015-12-24 Thread Christoph Hellwig
> Applied to target-pending/for-next as v4.5-rc1 material, along with
> Reviewed-by tags from HCH.

So this is in both your tree and Doug's now, it seems.  Given the non-trivial
merge with the other RDMA updates I'd suggest to drop it from the
target tree as Doug already sorted out the merge.


Re: [PATCH rdma-next V2 00/32] Soft-RoCE driver

2015-12-24 Thread Christoph Hellwig
On Thu, Dec 24, 2015 at 11:17:46AM +0200, Kamal Heib wrote:
> We've located the driver in the staging subtree. This follows a requirement
> to implement an IB transport library - Soft RoCE is in the same boat like the
> hfi1 driver. We need to define and implement a lib to prevent those code
> duplications.

Given the trainwreck that the staging process is it might seems more
sensible to get it into a stage and then merge it directly.  You'll
probably save yourself a lot of work that way.


Re: [PATCH rdma-next V2 00/32] Soft-RoCE driver

2015-12-24 Thread Christoph Hellwig
On Thu, Dec 24, 2015 at 02:58:10PM +0200, Or Gerlitz wrote:
> On Thu, Dec 24, 2015 at 12:02 PM, Christoph Hellwig <h...@infradead.org> wrote:
> > On Thu, Dec 24, 2015 at 11:17:46AM +0200, Kamal Heib wrote:
> >> We've located the driver in the staging subtree. This follows a requirement
> >> to implement an IB transport library - Soft RoCE is in the same boat like 
> >> the hfi1
> >> driver. We need to define and implement a lib to prevent those code
> >> duplications.
> >
> > Given the trainwreck that the staging process is it might seems more
> > sensible to get it into a stage and then merge it directly.  You'll
> > probably save yourself a lot of work that way.
> 
> I am not sure what you mean by "get it into a stage and then merge it
> directly" -- is that to not go through staging at all?

Sorry, I should not have finished that email in a hurry before leaving
the house.  Let me rephrase:

Given the trainwreck that the staging process is, it might be more
sensible to get it into shape and then merge it directly.  You'll
probably save yourself a lot of work that way.


[PATCH 6/6] IB/uapi: expose device capability flags

2015-12-24 Thread Christoph Hellwig
Expose the device capability flags which can be queried through uverbs in
the uapi headers.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 include/rdma/ib_verbs.h  | 94 +++-
 include/uapi/rdma/ib_verbs.h | 66 +++
 2 files changed, 98 insertions(+), 62 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 48bfcf5..b8d4113 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -151,68 +151,38 @@ enum rdma_link_layer {
 };
 
 enum ib_device_cap_flags {
-   IB_DEVICE_RESIZE_MAX_WR = (1 << 0),
-   IB_DEVICE_BAD_PKEY_CNTR = (1 << 1),
-   IB_DEVICE_BAD_QKEY_CNTR = (1 << 2),
-   IB_DEVICE_RAW_MULTI = (1 << 3),
-   IB_DEVICE_AUTO_PATH_MIG = (1 << 4),
-   IB_DEVICE_CHANGE_PHY_PORT   = (1 << 5),
-   IB_DEVICE_UD_AV_PORT_ENFORCE= (1 << 6),
-   IB_DEVICE_CURR_QP_STATE_MOD = (1 << 7),
-   IB_DEVICE_SHUTDOWN_PORT = (1 << 8),
-   IB_DEVICE_INIT_TYPE = (1 << 9),
-   IB_DEVICE_PORT_ACTIVE_EVENT = (1 << 10),
-   IB_DEVICE_SYS_IMAGE_GUID= (1 << 11),
-   IB_DEVICE_RC_RNR_NAK_GEN= (1 << 12),
-   IB_DEVICE_SRQ_RESIZE= (1 << 13),
-   IB_DEVICE_N_NOTIFY_CQ   = (1 << 14),
-
-   /*
-* This device supports a per-device lkey or stag that can be
-* used without performing a memory registration for the local
-* memory.  Note that ULPs should never check this flag, but
-* instead of use the local_dma_lkey flag in the ib_pd structure,
-* which will always contain a usable lkey.
-*/
-   IB_DEVICE_LOCAL_DMA_LKEY= (1 << 15),
-   IB_DEVICE_RESERVED /* old SEND_W_INV */ = (1 << 16),
-   IB_DEVICE_MEM_WINDOW= (1 << 17),
-   /*
-* Devices should set IB_DEVICE_UD_IP_SUM if they support
-* insertion of UDP and TCP checksum on outgoing UD IPoIB
-* messages and can verify the validity of checksum for
-* incoming messages.  Setting this flag implies that the
-* IPoIB driver may set NETIF_F_IP_CSUM for datagram mode.
-*/
-   IB_DEVICE_UD_IP_CSUM= (1 << 18),
-   IB_DEVICE_UD_TSO= (1 << 19),
-   IB_DEVICE_XRC   = (1 << 20),
-
-   /*
-* This device supports the IB "base memory management extension",
-* which includes support for fast registrations (IB_WR_REG_MR,
-* IB_WR_LOCAL_INV and IB_WR_SEND_WITH_INV verbs).  This flag should
-* also be set by any iWarp device which must support FRs to comply
-* to the iWarp verbs spec.  iWarp devices also support the
-* IB_WR_RDMA_READ_WITH_INV verb for RDMA READs that invalidate the
-* stag.
-*/
-   IB_DEVICE_MEM_MGT_EXTENSIONS= (1 << 21),
-   IB_DEVICE_BLOCK_MULTICAST_LOOPBACK  = (1 << 22),
-   IB_DEVICE_MEM_WINDOW_TYPE_2A= (1 << 23),
-   IB_DEVICE_MEM_WINDOW_TYPE_2B= (1 << 24),
-   IB_DEVICE_RC_IP_CSUM= (1 << 25),
-   IB_DEVICE_RAW_IP_CSUM   = (1 << 26),
-   /*
-* Devices should set IB_DEVICE_CROSS_CHANNEL if they
-* support execution of WQEs that involve synchronization
-* of I/O operations with single completion queue managed
-* by hardware.
-*/
-   IB_DEVICE_CROSS_CHANNEL = (1 << 27),
-   IB_DEVICE_MANAGED_FLOW_STEERING = (1 << 29),
-   IB_DEVICE_SIGNATURE_HANDOVER= (1 << 30),
-   IB_DEVICE_ON_DEMAND_PAGING  = (1 << 31),
+   IB_DEVICE_RESIZE_MAX_WR = IB_UVERBS_DEVICE_RESIZE_MAX_WR,
+   IB_DEVICE_BAD_PKEY_CNTR = IB_UVERBS_DEVICE_BAD_PKEY_CNTR,
+   IB_DEVICE_BAD_QKEY_CNTR = IB_UVERBS_DEVICE_BAD_QKEY_CNTR,
+   IB_DEVICE_RAW_MULTI = IB_UVERBS_DEVICE_RAW_MULTI,
+   IB_DEVICE_AUTO_PATH_MIG = IB_UVERBS_DEVICE_AUTO_PATH_MIG,
+   IB_DEVICE_CHANGE_PHY_PORT   = IB_UVERBS_DEVICE_CHANGE_PHY_PORT,
+   IB_DEVICE_UD_AV_PORT_ENFORCE= IB_UVERBS_DEVICE_UD_AV_PORT_ENFORCE,
+   IB_DEVICE_CURR_QP_STATE_MOD = IB_UVERBS_DEVICE_CURR_QP_STATE_MOD,
+   IB_DEVICE_SHUTDOWN_PORT = IB_UVERBS_DEVICE_SHUTDOWN_PORT,
+   IB_DEVICE_INIT_TYPE = IB_UVERBS_DEVICE_INIT_TYPE,
+   IB_DEVICE_PORT_ACTIVE_EVENT = IB_UVERBS_DEVICE_PORT_ACTIVE_EVENT,
+   IB_DEVICE_SYS_IMAGE_GUID= IB_UVE

[PATCH 3/6] IB/uapi: expose uverbs WC opcodes

2015-12-24 Thread Christoph Hellwig
This exposes the WC opcodes supported by uverbs as part of the uapi
headers.  It follows the same scheme as the WR opcodes.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 include/rdma/ib_verbs.h  | 29 +
 include/uapi/rdma/ib_verbs.h | 16 
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5dccc6a..7dce204 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -819,22 +819,19 @@ enum ib_wc_status {
 const char *__attribute_const__ ib_wc_status_msg(enum ib_wc_status status);
 
 enum ib_wc_opcode {
-   IB_WC_SEND,
-   IB_WC_RDMA_WRITE,
-   IB_WC_RDMA_READ,
-   IB_WC_COMP_SWAP,
-   IB_WC_FETCH_ADD,
-   IB_WC_LSO,
-   IB_WC_LOCAL_INV,
-   IB_WC_REG_MR,
-   IB_WC_MASKED_COMP_SWAP,
-   IB_WC_MASKED_FETCH_ADD,
-/*
- * Set value of IB_WC_RECV so consumers can test if a completion is a
- * receive by testing (opcode & IB_WC_RECV).
- */
-   IB_WC_RECV  = 1 << 7,
-   IB_WC_RECV_RDMA_WITH_IMM
+   IB_WC_SEND  = IB_UVERBS_WC_SEND,
+   IB_WC_RDMA_WRITE= IB_UVERBS_WC_RDMA_WRITE,
+   IB_WC_RDMA_READ = IB_UVERBS_WC_RDMA_READ,
+   IB_WC_COMP_SWAP = IB_UVERBS_WC_COMP_SWAP,
+   IB_WC_FETCH_ADD = IB_UVERBS_WC_FETCH_ADD,
+   IB_WC_LSO   = IB_UVERBS_WC_SEND_END,
+   IB_WC_LOCAL_INV = IB_UVERBS_WC_SEND_END + 1,
+   IB_WC_REG_MR= IB_UVERBS_WC_SEND_END + 2,
+   IB_WC_MASKED_COMP_SWAP  = IB_UVERBS_WC_SEND_END + 3,
+   IB_WC_MASKED_FETCH_ADD  = IB_UVERBS_WC_SEND_END + 4,
+
+   IB_WC_RECV  = IB_UVERBS_WC_RECV,
+   IB_WC_RECV_RDMA_WITH_IMM = IB_UVERBS_WC_RECV_END,
 };
 
 enum ib_wc_flags {
diff --git a/include/uapi/rdma/ib_verbs.h b/include/uapi/rdma/ib_verbs.h
index 3be3152..fd7a393 100644
--- a/include/uapi/rdma/ib_verbs.h
+++ b/include/uapi/rdma/ib_verbs.h
@@ -29,4 +29,20 @@ enum ib_uverbs_send_flags {
IB_UVERBS_SEND_END  = (1 << 5),
 };
 
+enum ib_uverbs_wc_opcode {
+   IB_UVERBS_WC_SEND   = 0,
+   IB_UVERBS_WC_RDMA_WRITE = 1,
+   IB_UVERBS_WC_RDMA_READ  = 2,
+   IB_UVERBS_WC_COMP_SWAP  = 3,
+   IB_UVERBS_WC_FETCH_ADD  = 4,
+   IB_UVERBS_WC_SEND_END   = 5,
+
+   /*
+* Set value of IB_WC_RECV so consumers can test if a completion is a
+* receive by testing (opcode & IB_WC_RECV).
+*/
+   IB_UVERBS_WC_RECV   = 1 << 7,
+   IB_UVERBS_WC_RECV_END   = (1 << 7) + 1,
+};
+
 #endif /* _UAPI_RDMA_IB_VERBS_H */
-- 
1.9.1
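
To illustrate the comment about testing (opcode & IB_WC_RECV), here is a minimal standalone sketch. The numeric values are copied from the enum in the patch above; the MODEL_ names and the helper model_wc_is_recv() are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror enum ib_uverbs_wc_opcode from the patch; names are local. */
enum model_wc_opcode {
	MODEL_WC_SEND		= 0,
	MODEL_WC_RDMA_WRITE	= 1,
	MODEL_WC_RDMA_READ	= 2,
	MODEL_WC_COMP_SWAP	= 3,
	MODEL_WC_FETCH_ADD	= 4,
	MODEL_WC_SEND_END	= 5,

	MODEL_WC_RECV		= 1 << 7,
	MODEL_WC_RECV_END	= (1 << 7) + 1,
};

/* Every receive opcode has bit 7 set, so one mask test classifies it. */
static bool model_wc_is_recv(unsigned int opcode)
{
	return (opcode & MODEL_WC_RECV) != 0;
}
```

Kernel-internal send-side opcodes numbered from the SEND_END value upwards stay below bit 7 as long as they do not reach 128, so they are never mistaken for receives.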



[PATCH 4/6] IB/uapi: expose uverbs WC flags

2015-12-24 Thread Christoph Hellwig
This exposes the WC flags supported by uverbs as part of the uapi
headers.  It follows the same scheme as the WR opcodes.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 include/rdma/ib_verbs.h  | 14 +++---
 include/uapi/rdma/ib_verbs.h | 10 ++
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7dce204..337db70 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -835,13 +835,13 @@ enum ib_wc_opcode {
 };
 
 enum ib_wc_flags {
-   IB_WC_GRH   = 1,
-   IB_WC_WITH_IMM  = (1<<1),
-   IB_WC_WITH_INVALIDATE   = (1<<2),
-   IB_WC_IP_CSUM_OK= (1<<3),
-   IB_WC_WITH_SMAC = (1<<4),
-   IB_WC_WITH_VLAN = (1<<5),
-   IB_WC_WITH_NETWORK_HDR_TYPE = (1<<6),
+   IB_WC_GRH   = IB_UVERBS_WC_GRH,
+   IB_WC_WITH_IMM  = IB_UVERBS_WC_WITH_IMM,
+   IB_WC_WITH_INVALIDATE   = IB_UVERBS_WC_WITH_INVALIDATE,
+   IB_WC_IP_CSUM_OK= IB_UVERBS_WC_IP_CSUM_OK,
+   IB_WC_WITH_SMAC = IB_UVERBS_WC_WITH_SMAC,
+   IB_WC_WITH_VLAN = IB_UVERBS_WC_WITH_VLAN,
+   IB_WC_WITH_NETWORK_HDR_TYPE = IB_UVERBS_WC_WITH_NETWORK_HDR_TYPE,
 };
 
 struct ib_wc {
diff --git a/include/uapi/rdma/ib_verbs.h b/include/uapi/rdma/ib_verbs.h
index fd7a393..c40c00b 100644
--- a/include/uapi/rdma/ib_verbs.h
+++ b/include/uapi/rdma/ib_verbs.h
@@ -45,4 +45,14 @@ enum ib_uverbs_wc_opcode {
IB_UVERBS_WC_RECV_END   = (1 << 7) + 1,
 };
 
+enum ib_uverbs_wc_flags {
+   IB_UVERBS_WC_GRH= (1 << 0),
+   IB_UVERBS_WC_WITH_IMM   = (1 << 1),
+   IB_UVERBS_WC_WITH_INVALIDATE= (1 << 2),
+   IB_UVERBS_WC_IP_CSUM_OK = (1 << 3),
+   IB_UVERBS_WC_WITH_SMAC  = (1 << 4),
+   IB_UVERBS_WC_WITH_VLAN  = (1 << 5),
+   IB_UVERBS_WC_WITH_NETWORK_HDR_TYPE  = (1 << 6),
+};
+
 #endif /* _UAPI_RDMA_IB_VERBS_H */
-- 
1.9.1



[PATCH 2/6] IB/uapi: expose uverbs send WR flags

2015-12-24 Thread Christoph Hellwig
This exposes the send WR flags supported by uverbs as part of the uapi
headers.  It follows the same scheme as the WR opcodes.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  6 +++---
 include/rdma/ib_verbs.h  | 14 ++
 include/uapi/rdma/ib_verbs.h |  9 +
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2f82a08..6c264f0 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -118,9 +118,9 @@ struct mlx5_ib_pd {
  * enum ib_send_flags and enum ib_qp_type for low-level driver
  */
 
-#define MLX5_IB_SEND_UMR_UNREG IB_SEND_RESERVED_START
-#define MLX5_IB_SEND_UMR_FAIL_IF_FREE (IB_SEND_RESERVED_START << 1)
-#define MLX5_IB_SEND_UMR_UPDATE_MTT (IB_SEND_RESERVED_START << 2)
+#define MLX5_IB_SEND_UMR_UNREG IB_SEND_END
+#define MLX5_IB_SEND_UMR_FAIL_IF_FREE (IB_SEND_END << 1)
+#define MLX5_IB_SEND_UMR_UPDATE_MTT (IB_SEND_END << 2)
 #define MLX5_IB_QPT_REG_UMR IB_QPT_RESERVED1
 #define MLX5_IB_WR_UMR (IB_WR_END + 0)
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 94509e0..5dccc6a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1134,15 +1134,13 @@ enum ib_wr_opcode {
 };
 
 enum ib_send_flags {
-   IB_SEND_FENCE   = 1,
-   IB_SEND_SIGNALED= (1<<1),
-   IB_SEND_SOLICITED   = (1<<2),
-   IB_SEND_INLINE  = (1<<3),
-   IB_SEND_IP_CSUM = (1<<4),
+   IB_SEND_FENCE   = IB_UVERBS_SEND_FENCE,
+   IB_SEND_SIGNALED= IB_UVERBS_SEND_SIGNALED,
+   IB_SEND_SOLICITED   = IB_UVERBS_SEND_SOLICITED,
+   IB_SEND_INLINE  = IB_UVERBS_SEND_INLINE,
+   IB_SEND_IP_CSUM = IB_UVERBS_SEND_IP_CSUM,
 
-   /* reserve bits 26-31 for low level drivers' internal use */
-   IB_SEND_RESERVED_START  = (1 << 26),
-   IB_SEND_RESERVED_END= (1 << 31),
+   IB_SEND_END = IB_UVERBS_SEND_END,
 };
 
 struct ib_sge {
diff --git a/include/uapi/rdma/ib_verbs.h b/include/uapi/rdma/ib_verbs.h
index 330175e..3be3152 100644
--- a/include/uapi/rdma/ib_verbs.h
+++ b/include/uapi/rdma/ib_verbs.h
@@ -20,4 +20,13 @@ enum ib_uverbs_wr_opcode {
IB_UVERBS_WR_END= 9,
 };
 
+enum ib_uverbs_send_flags {
+   IB_UVERBS_SEND_FENCE= (1 << 0),
+   IB_UVERBS_SEND_SIGNALED = (1 << 1),
+   IB_UVERBS_SEND_SOLICITED= (1 << 2),
+   IB_UVERBS_SEND_INLINE   = (1 << 3),
+   IB_UVERBS_SEND_IP_CSUM  = (1 << 4),
+   IB_UVERBS_SEND_END  = (1 << 5),
+};
+
 #endif /* _UAPI_RDMA_IB_VERBS_H */
-- 
1.9.1
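
The way the mlx5 hunk carves driver-private flags out of the bits starting at the END marker can be checked in isolation. The sketch below copies the bit values from the patch but uses local MODEL_/DRV_ names; model_check_user_send_flags() is a hypothetical helper and does not exist in the tree.

```c
#include <assert.h>

/* Values mirror enum ib_uverbs_send_flags from the patch; names are local. */
enum model_send_flags {
	MODEL_SEND_FENCE	= (1 << 0),
	MODEL_SEND_SIGNALED	= (1 << 1),
	MODEL_SEND_SOLICITED	= (1 << 2),
	MODEL_SEND_INLINE	= (1 << 3),
	MODEL_SEND_IP_CSUM	= (1 << 4),
	MODEL_SEND_END		= (1 << 5),
};

/* All bits below the END marker are part of the user-visible interface. */
#define MODEL_SEND_PUBLIC_MASK	(MODEL_SEND_END - 1)

/* Driver-private flags, built the same way as in the mlx5 hunk. */
#define DRV_SEND_UMR_UNREG		MODEL_SEND_END
#define DRV_SEND_UMR_FAIL_IF_FREE	(MODEL_SEND_END << 1)
#define DRV_SEND_UMR_UPDATE_MTT		(MODEL_SEND_END << 2)
#define DRV_SEND_PRIVATE_MASK \
	(DRV_SEND_UMR_UNREG | DRV_SEND_UMR_FAIL_IF_FREE | DRV_SEND_UMR_UPDATE_MTT)

static_assert((MODEL_SEND_PUBLIC_MASK & DRV_SEND_PRIVATE_MASK) == 0,
	      "private flags must not overlap the public ones");

/* Hypothetical check: refuse user flags outside the public range. */
static int model_check_user_send_flags(unsigned int flags)
{
	return (flags & ~(unsigned int)MODEL_SEND_PUBLIC_MASK) ? -1 : 0;
}
```

The static assertion fails at build time if a new public flag is added without moving the END marker.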



start moving user space visible constants to uapi headers

2015-12-24 Thread Christoph Hellwig
Currently very little of the uverbs user interface is actually exposed in
uapi headers, and it's a constant struggle to figure out what's kernel
internal and what is actually exposed in public.  This series starts
sorting this out by creating the infrastructure for a uapi header shared
between uverbs and the core IB stack, and starts moving all WR and WC
constants as well as the device capability flags there.

A lot more work will have to follow, and I hope others will help out as
well.



[PATCH 10/10] IB: remove the write-only usecnt field from struct ib_mr

2015-12-23 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Bart Van Assche <bvanass...@sandisk.com>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
---
 drivers/infiniband/core/uverbs_cmd.c| 6 --
 drivers/infiniband/core/verbs.c | 8 +---
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 3 ---
 drivers/infiniband/hw/cxgb4/mem.c   | 3 ---
 drivers/staging/rdma/ehca/ehca_mrmw.c   | 1 -
 include/rdma/ib_verbs.h | 1 -
 6 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 5428ebe..0a84182d 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -993,7 +993,6 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
mr->pd  = pd;
mr->uobject = uobj;
atomic_inc(&pd->usecnt);
-   atomic_set(&mr->usecnt, 0);
 
uobj->object = mr;
ret = idr_add_uobj(&ib_uverbs_mr_idr, uobj);
@@ -1091,11 +1090,6 @@ ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
}
}
 
-   if (atomic_read(&mr->usecnt)) {
-   ret = -EBUSY;
-   goto put_uobj_pd;
-   }
-
old_pd = mr->pd;
ret = mr->device->rereg_user_mr(mr, cmd.flags, cmd.start,
cmd.length, cmd.hca_va,
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index c5e0f07..072b94d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1345,7 +1345,6 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
mr->pd  = pd;
mr->uobject = NULL;
atomic_inc(&pd->usecnt);
-   atomic_set(&mr->usecnt, 0);
}
 
return mr;
@@ -1354,13 +1353,9 @@ EXPORT_SYMBOL(ib_get_dma_mr);
 
 int ib_dereg_mr(struct ib_mr *mr)
 {
-   struct ib_pd *pd;
+   struct ib_pd *pd = mr->pd;
int ret;
 
-   if (atomic_read(&mr->usecnt))
-   return -EBUSY;
-
-   pd = mr->pd;
ret = mr->device->dereg_mr(mr);
if (!ret)
atomic_dec(&pd->usecnt);
@@ -1396,7 +1391,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
mr->pd  = pd;
mr->uobject = NULL;
atomic_inc(&pd->usecnt);
-   atomic_set(&mr->usecnt, 0);
}
 
return mr;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 6743e9d..2734820 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -458,9 +458,6 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr)
u32 mmid;
 
PDBG("%s ib_mr %p\n", __func__, ib_mr);
-   /* There can be no memory windows */
-   if (atomic_read(&ib_mr->usecnt))
-   return -EINVAL;
 
mhp = to_iwch_mr(ib_mr);
kfree(mhp->pages);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 1eb833a..7849890 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -704,9 +704,6 @@ int c4iw_dereg_mr(struct ib_mr *ib_mr)
u32 mmid;
 
PDBG("%s ib_mr %p\n", __func__, ib_mr);
-   /* There can be no memory windows */
-   if (atomic_read(&ib_mr->usecnt))
-   return -EINVAL;
 
mhp = to_c4iw_mr(ib_mr);
rhp = mhp->rhp;
diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c b/drivers/staging/rdma/ehca/ehca_mrmw.c
index 1814af7..06b832b 100644
--- a/drivers/staging/rdma/ehca/ehca_mrmw.c
+++ b/drivers/staging/rdma/ehca/ehca_mrmw.c
@@ -1339,7 +1339,6 @@ int ehca_reg_internal_maxmr(
e_mr->ib.ib_mr.pd = &e_pd->ib_pd;
e_mr->ib.ib_mr.uobject = NULL;
atomic_inc(&(e_pd->ib_pd.usecnt));
-   atomic_set(&(e_mr->ib.ib_mr.usecnt), 0);
*e_maxmr = e_mr;
return 0;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 197b620..36acb30 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1414,7 +1414,6 @@ struct ib_mr {
u64iova;
u32length;
unsigned int   page_size;
-   atomic_t   usecnt; /* count number of MWs */
 };
 
 struct ib_mw {
-- 
1.9.1
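
What is left after this patch is a single reference count on the PD. A userspace sketch of that remaining pattern, with invented model_ names and C11 atomics standing in for atomic_t, might look like the following; the driver's dereg result is passed in as a plain integer, and model_pd_busy() is a hypothetical helper, not something in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the PD and MR objects. */
struct model_pd {
	atomic_int usecnt;	/* number of objects still using this PD */
};

struct model_mr {
	struct model_pd *pd;
};

static void model_pd_init(struct model_pd *pd)
{
	atomic_init(&pd->usecnt, 0);
}

/* Creating an MR takes a reference on its PD. */
static void model_reg_mr(struct model_mr *mr, struct model_pd *pd)
{
	mr->pd = pd;
	atomic_fetch_add(&pd->usecnt, 1);
}

/*
 * Deregistering drops the PD reference only if the driver succeeded.
 * driver_ret models the return value of the driver's dereg callback.
 */
static int model_dereg_mr(struct model_mr *mr, int driver_ret)
{
	struct model_pd *pd = mr->pd;

	if (!driver_ret)
		atomic_fetch_sub(&pd->usecnt, 1);
	return driver_ret;
}

/* Hypothetical helper: a PD with users left must not be freed. */
static int model_pd_busy(struct model_pd *pd)
{
	return atomic_load(&pd->usecnt) != 0;
}
```

Note that pd is read from the MR before the driver call, as in the patched ib_dereg_mr(), because the MR may be gone once the driver has released it.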



[PATCH 05/10] cxgb3: simplify iwch_get_dma_wr

2015-12-23 Thread Christoph Hellwig
Fold simplified versions of build_phys_page_list and
iwch_register_phys_mem into iwch_get_dma_wr now that no other callers
are left.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb3/iwch_mem.c  | 71 
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 73 ++---
 drivers/infiniband/hw/cxgb3/iwch_provider.h |  8 
 3 files changed, 26 insertions(+), 126 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 3a5e27d..1d04c87 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -99,74 +99,3 @@ int iwch_write_pbl(struct iwch_mr *mhp, __be64 *pages, int npages, int offset)
return cxio_write_pbl(&mhp->rhp->rdev, pages,
  mhp->attr.pbl_addr + (offset << 3), npages);
 }
-
-int build_phys_page_list(struct ib_phys_buf *buffer_list,
-   int num_phys_buf,
-   u64 *iova_start,
-   u64 *total_size,
-   int *npages,
-   int *shift,
-   __be64 **page_list)
-{
-   u64 mask;
-   int i, j, n;
-
-   mask = 0;
-   *total_size = 0;
-   for (i = 0; i < num_phys_buf; ++i) {
-   if (i != 0 && buffer_list[i].addr & ~PAGE_MASK)
-   return -EINVAL;
-   if (i != 0 && i != num_phys_buf - 1 &&
-   (buffer_list[i].size & ~PAGE_MASK))
-   return -EINVAL;
-   *total_size += buffer_list[i].size;
-   if (i > 0)
-   mask |= buffer_list[i].addr;
-   else
-   mask |= buffer_list[i].addr & PAGE_MASK;
-   if (i != num_phys_buf - 1)
-   mask |= buffer_list[i].addr + buffer_list[i].size;
-   else
-   mask |= (buffer_list[i].addr + buffer_list[i].size +
-   PAGE_SIZE - 1) & PAGE_MASK;
-   }
-
-   if (*total_size > 0xULL)
-   return -ENOMEM;
-
-   /* Find largest page shift we can use to cover buffers */
-   for (*shift = PAGE_SHIFT; *shift < 27; ++(*shift))
-   if ((1ULL << *shift) & mask)
-   break;
-
-   buffer_list[0].size += buffer_list[0].addr & ((1ULL << *shift) - 1);
-   buffer_list[0].addr &= ~0ull << *shift;
-
-   *npages = 0;
-   for (i = 0; i < num_phys_buf; ++i)
-   *npages += (buffer_list[i].size +
-   (1ULL << *shift) - 1) >> *shift;
-
-   if (!*npages)
-   return -EINVAL;
-
-   *page_list = kmalloc(sizeof(u64) * *npages, GFP_KERNEL);
-   if (!*page_list)
-   return -ENOMEM;
-
-   n = 0;
-   for (i = 0; i < num_phys_buf; ++i)
-   for (j = 0;
-j < (buffer_list[i].size + (1ULL << *shift) - 1) >> *shift;
-++j)
-   (*page_list)[n++] = cpu_to_be64(buffer_list[i].addr +
-   ((u64) j << *shift));
-
-   PDBG("%s va 0x%llx mask 0x%llx shift %d len %lld pbl_size %d\n",
-__func__, (unsigned long long) *iova_start,
-(unsigned long long) mask, *shift, (unsigned long long) *total_size,
-*npages);
-
-   return 0;
-
-}
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 384e1d7..6743e9d 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -479,24 +479,25 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr)
return 0;
 }
 
-static struct ib_mr *iwch_register_phys_mem(struct ib_pd *pd,
-   struct ib_phys_buf *buffer_list,
-   int num_phys_buf,
-   int acc,
-   u64 *iova_start)
+static struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, int acc)
 {
-   __be64 *page_list;
-   int shift;
-   u64 total_size;
-   int npages;
-   struct iwch_dev *rhp;
-   struct iwch_pd *php;
+   const u64 total_size = 0x;
+   const u64 mask = (total_size + PAGE_SIZE - 1) & PAGE_MASK;
+   struct iwch_pd *php = to_iwch_pd(pd);
+   struct iwch_dev *rhp = php->rhp;
struct iwch_mr *mhp;
-   int ret;
+   __be64 *page_list;
+  

[PATCH 04/10] IB: remove in-kernel support for memory windows

2015-12-23 Thread Christoph Hellwig
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 Documentation/infiniband/core_locking.txt   |  2 -
 drivers/infiniband/core/uverbs.h|  2 +
 drivers/infiniband/core/uverbs_cmd.c|  4 +-
 drivers/infiniband/core/uverbs_main.c   | 13 -
 drivers/infiniband/core/verbs.c | 36 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c   |  4 --
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h |  3 --
 drivers/infiniband/hw/cxgb3/iwch_qp.c   | 82 
 drivers/infiniband/hw/cxgb4/cq.c|  3 --
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h  |  2 -
 drivers/infiniband/hw/cxgb4/provider.c  |  1 -
 drivers/infiniband/hw/cxgb4/qp.c|  5 --
 drivers/infiniband/hw/mlx4/cq.c |  3 --
 drivers/infiniband/hw/mlx4/main.c   |  1 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h|  2 -
 drivers/infiniband/hw/mlx4/mr.c | 22 
 drivers/infiniband/hw/mlx4/qp.c | 27 --
 drivers/infiniband/hw/mlx5/cq.c |  3 --
 drivers/infiniband/hw/mthca/mthca_cq.c  |  3 --
 drivers/infiniband/hw/nes/nes_verbs.c   | 75 --
 drivers/staging/rdma/amso1100/c2_cq.c   |  3 --
 drivers/staging/rdma/ehca/ehca_iverbs.h |  3 --
 drivers/staging/rdma/ehca/ehca_main.c   |  1 -
 drivers/staging/rdma/ehca/ehca_mrmw.c   | 12 -
 drivers/staging/rdma/ehca/ehca_reqs.c   |  1 -
 include/rdma/ib_verbs.h | 83 -
 27 files changed, 16 insertions(+), 381 deletions(-)

diff --git a/Documentation/infiniband/core_locking.txt b/Documentation/infiniband/core_locking.txt
index e167854..4b1f36b 100644
--- a/Documentation/infiniband/core_locking.txt
+++ b/Documentation/infiniband/core_locking.txt
@@ -15,7 +15,6 @@ Sleeping and interrupt context
 modify_ah
 query_ah
 destroy_ah
-bind_mw
 post_send
 post_recv
 poll_cq
@@ -31,7 +30,6 @@ Sleeping and interrupt context
 ib_modify_ah
 ib_query_ah
 ib_destroy_ah
-ib_bind_mw
 ib_post_send
 ib_post_recv
 ib_req_notify_cq
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 94bbd8c..612ccfd 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -204,6 +204,8 @@ void ib_uverbs_event_handler(struct ib_event_handler *handler,
 struct ib_event *event);
 void ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev, struct ib_xrcd *xrcd);
 
+int uverbs_dealloc_mw(struct ib_mw *mw);
+
 struct ib_uverbs_flow_spec {
union {
union {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 9561056..5428ebe 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1243,7 +1243,7 @@ err_copy:
idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
 
 err_unalloc:
-   ib_dealloc_mw(mw);
+   uverbs_dealloc_mw(mw);
 
 err_put:
put_pd_read(pd);
@@ -1272,7 +1272,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
 
mw = uobj->object;
 
-   ret = ib_dealloc_mw(mw);
+   ret = uverbs_dealloc_mw(mw);
if (!ret)
uobj->live = 0;
 
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index e3ef288..39680ae 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -133,6 +133,17 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 static void ib_uverbs_add_one(struct ib_device *device);
 static void ib_uverbs_remove_one(struct ib_device *device, void *client_data);
 
+int uverbs_dealloc_mw(struct ib_mw *mw)
+{
+   struct ib_pd *pd = mw->pd;
+   int ret;
+
+   ret = mw->device->dealloc_mw(mw);
+   if (!ret)
+   atomic_dec(&pd->usecnt);
+   return ret;
+}
+
 static void ib_uverbs_release_dev(struct kobject *kobj)
 {
struct ib_uverbs_device *dev =
@@ -224,7 +235,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
struct ib_mw *mw = uobj->object;
 
idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
-   ib_dealloc_mw(mw);
+   uverbs_dealloc_mw(mw);
kfree(uobj);
}
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 70b1016..c5e0f07 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1403,42 +1403,6 

MR cleanups / dead code removal V2

2015-12-23 Thread Christoph Hellwig
Hi Doug,

this series (on top of your k.o/for-4.5 branch) has various MR-related
cleanups:  starting to document the device capabilities, removing lots
of dead MR/MW code and removing a useless field in struct ib_mr.  This
should be fairly uncontroversial I hope, so I'd like to get it in before
any real MR changes.

Changes since V1:
 - rebase
 - updated description for the last patch
 - additional ACKs



[PATCH 03/10] IB: remove support for phys MRs

2015-12-23 Thread Christoph Hellwig
We stopped using phys MRs in the kernel a while ago, so let's
remove all the cruft used to implement them.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Devesh Sharma <devesh.sha...@avagotech.com> [ocrdma]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb3/iwch_mem.c   |  31 ---
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  69 --
 drivers/infiniband/hw/cxgb3/iwch_provider.h  |   4 -
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |  11 -
 drivers/infiniband/hw/cxgb4/mem.c| 248 -
 drivers/infiniband/hw/cxgb4/provider.c   |   2 -
 drivers/infiniband/hw/mthca/mthca_provider.c |  84 ---
 drivers/infiniband/hw/nes/nes_cm.c   |   7 +-
 drivers/infiniband/hw/nes/nes_verbs.c|   3 +-
 drivers/infiniband/hw/nes/nes_verbs.h|   5 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |   1 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  | 163 --
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |   3 -
 drivers/infiniband/hw/qib/qib_mr.c   |  51 +
 drivers/infiniband/hw/qib/qib_verbs.c|   1 -
 drivers/infiniband/hw/qib/qib_verbs.h|   4 -
 drivers/staging/rdma/amso1100/c2_provider.c  |   1 -
 drivers/staging/rdma/ehca/ehca_iverbs.h  |  11 -
 drivers/staging/rdma/ehca/ehca_main.c|   2 -
 drivers/staging/rdma/ehca/ehca_mrmw.c| 321 ---
 drivers/staging/rdma/ehca/ehca_mrmw.h|   5 -
 drivers/staging/rdma/hfi1/mr.c   |  51 +
 drivers/staging/rdma/hfi1/verbs.c|   1 -
 drivers/staging/rdma/hfi1/verbs.h|   4 -
 drivers/staging/rdma/ipath/ipath_mr.c|  55 -
 drivers/staging/rdma/ipath/ipath_verbs.c |   1 -
 drivers/staging/rdma/ipath/ipath_verbs.h |   4 -
 include/rdma/ib_verbs.h  |  16 +-
 28 files changed, 15 insertions(+), 1144 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 5c36ee2..3a5e27d 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -75,37 +75,6 @@ int iwch_register_mem(struct iwch_dev *rhp, struct iwch_pd *php,
return ret;
 }
 
-int iwch_reregister_mem(struct iwch_dev *rhp, struct iwch_pd *php,
-   struct iwch_mr *mhp,
-   int shift,
-   int npages)
-{
-   u32 stag;
-   int ret;
-
-   /* We could support this... */
-   if (npages > mhp->attr.pbl_size)
-   return -ENOMEM;
-
-   stag = mhp->attr.stag;
-   if (cxio_reregister_phys_mem(&rhp->rdev,
-  &stag, mhp->attr.pdid,
-  mhp->attr.perms,
-  mhp->attr.zbva,
-  mhp->attr.va_fbo,
-  mhp->attr.len,
-  shift - 12,
-  mhp->attr.pbl_size, mhp->attr.pbl_addr))
-   return -ENOMEM;
-
-   ret = iwch_finish_mem_reg(mhp, stag);
-   if (ret)
-   cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
-  mhp->attr.pbl_addr);
-
-   return ret;
-}
-
 int iwch_alloc_pbl(struct iwch_mr *mhp, int npages)
 {
mhp->attr.pbl_addr = cxio_hal_pblpool_alloc(&mhp->rhp->rdev,
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index c34725c..f7aa019 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -556,73 +556,6 @@ err:
 
 }
 
-static int iwch_reregister_phys_mem(struct ib_mr *mr,
-int mr_rereg_mask,
-struct ib_pd *pd,
-struct ib_phys_buf *buffer_list,
-int num_phys_buf,
-int acc, u64 * iova_start)
-{
-
-   struct iwch_mr mh, *mhp;
-   struct iwch_pd *php;
-   struct iwch_dev *rhp;
-   __be64 *page_list = NULL;
-   int shift = 0;
-   u64 total_size;
-   int npages = 0;
-   int ret;
-
-   PDBG("%s ib_mr %p ib_pd %p\n", __func__, mr, pd);
-
-   /* There can be no memory windows */
-   if (atomic_read(&mr->usecnt))
-   return -EINVAL;
-
-   mhp = to_iwch_mr(mr);
-   rhp = mhp->rhp;
-   php = to_iwch_pd(mr->pd);
-
-   /* make sure we are on the same adapter */
-   if (rhp != php->rhp)
-   return -EINVAL;
-
-   memcpy(&mh, mhp, sizeof *mhp);
-
-   if (mr_rereg_m

[PATCH 02/10] IB: remove ib_query_mr

2015-12-23 Thread Christoph Hellwig
This functionality has no users and was only supported by the staged out
EHCA driver.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/core/verbs.c |  7 -
 drivers/staging/rdma/ehca/ehca_iverbs.h |  2 --
 drivers/staging/rdma/ehca/ehca_main.c   |  1 -
 drivers/staging/rdma/ehca/ehca_mrmw.c   | 49 -
 include/rdma/ib_verbs.h | 18 
 5 files changed, 77 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 063210b..70b1016 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1352,13 +1352,6 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 }
 EXPORT_SYMBOL(ib_get_dma_mr);
 
-int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr)
-{
-   return mr->device->query_mr ?
-   mr->device->query_mr(mr, mr_attr) : -ENOSYS;
-}
-EXPORT_SYMBOL(ib_query_mr);
-
 int ib_dereg_mr(struct ib_mr *mr)
 {
struct ib_pd *pd;
diff --git a/drivers/staging/rdma/ehca/ehca_iverbs.h b/drivers/staging/rdma/ehca/ehca_iverbs.h
index 80e6a3d..30b1316 100644
--- a/drivers/staging/rdma/ehca/ehca_iverbs.h
+++ b/drivers/staging/rdma/ehca/ehca_iverbs.h
@@ -95,8 +95,6 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
   struct ib_phys_buf *phys_buf_array,
   int num_phys_buf, int mr_access_flags, u64 *iova_start);
 
-int ehca_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
-
 int ehca_dereg_mr(struct ib_mr *mr);
 
 struct ib_mw *ehca_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
diff --git a/drivers/staging/rdma/ehca/ehca_main.c b/drivers/staging/rdma/ehca/ehca_main.c
index 8246418..ab0a64a 100644
--- a/drivers/staging/rdma/ehca/ehca_main.c
+++ b/drivers/staging/rdma/ehca/ehca_main.c
@@ -514,7 +514,6 @@ static int ehca_init_device(struct ehca_shca *shca)
shca->ib_device.get_dma_mr  = ehca_get_dma_mr;
shca->ib_device.reg_phys_mr = ehca_reg_phys_mr;
shca->ib_device.reg_user_mr = ehca_reg_user_mr;
-   shca->ib_device.query_mr= ehca_query_mr;
shca->ib_device.dereg_mr= ehca_dereg_mr;
shca->ib_device.rereg_phys_mr   = ehca_rereg_phys_mr;
shca->ib_device.alloc_mw= ehca_alloc_mw;
diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c b/drivers/staging/rdma/ehca/ehca_mrmw.c
index f914b30..eb274c1 100644
--- a/drivers/staging/rdma/ehca/ehca_mrmw.c
+++ b/drivers/staging/rdma/ehca/ehca_mrmw.c
@@ -589,55 +589,6 @@ rereg_phys_mr_exit0:
return ret;
 } /* end ehca_rereg_phys_mr() */
 
-/*--*/
-
-int ehca_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr)
-{
-   int ret = 0;
-   u64 h_ret;
-   struct ehca_shca *shca =
-   container_of(mr->device, struct ehca_shca, ib_device);
-   struct ehca_mr *e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
-   unsigned long sl_flags;
-   struct ehca_mr_hipzout_parms hipzout;
-
-   if ((e_mr->flags & EHCA_MR_FLAG_FMR)) {
-   ehca_err(mr->device, "not supported for FMR, mr=%p e_mr=%p "
-"e_mr->flags=%x", mr, e_mr, e_mr->flags);
-   ret = -EINVAL;
-   goto query_mr_exit0;
-   }
-
-   memset(mr_attr, 0, sizeof(struct ib_mr_attr));
-   spin_lock_irqsave(&e_mr->mrlock, sl_flags);
-
-   h_ret = hipz_h_query_mr(shca->ipz_hca_handle, e_mr, &hipzout);
-   if (h_ret != H_SUCCESS) {
-   ehca_err(mr->device, "hipz_mr_query failed, h_ret=%lli mr=%p "
-"hca_hndl=%llx mr_hndl=%llx lkey=%x",
-h_ret, mr, shca->ipz_hca_handle.handle,
-e_mr->ipz_mr_handle.handle, mr->lkey);
-   ret = ehca2ib_return_code(h_ret);
-   goto query_mr_exit1;
-   }
-   mr_attr->pd = mr->pd;
-   mr_attr->device_virt_addr = hipzout.vaddr;
-   mr_attr->size = hipzout.len;
-   mr_attr->lkey = hipzout.lkey;
-   mr_attr->rkey = hipzout.rkey;
-   ehca_mrmw_reverse_map_acl(&hipzout.acl, &mr_attr->mr_access_flags);
-
-query_mr_exit1:
-   spin_unlock_irqrestore(&e_mr->mrlock, sl_flags);
-query_mr_exit0:
-   if (ret)
-   ehca_err(mr->device, "ret=%i mr=%p mr_attr=%p",
-ret, mr, mr_attr);
-   return ret;
-} /* end ehca_query_mr() */
-
-/*--*/
-
 int ehca_dereg_mr(struct ib_mr *mr)
 {
int ret = 0;
diff --git a/include/rdma/ib_verbs.h 

[PATCH 06/10] nes: simplify nes_reg_phys_mr calling conventions

2015-12-23 Thread Christoph Hellwig
Just pass an address/size pair instead of an ib_phys_buf array.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/nes/nes_cm.c|  10 +--
 drivers/infiniband/hw/nes/nes_verbs.c | 140 --
 drivers/infiniband/hw/nes/nes_verbs.h |   3 +-
 3 files changed, 37 insertions(+), 116 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index 242c87d..bc37adb 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -3232,7 +3232,6 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
int passive_state;
struct nes_ib_device *nesibdev;
struct ib_mr *ibmr = NULL;
-   struct ib_phys_buf ibphysbuf;
struct nes_pd *nespd;
u64 tagged_offset;
u8 mpa_frame_offset = 0;
@@ -3316,12 +3315,11 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
u64temp = (unsigned long)nesqp;
nesibdev = nesvnic->nesibdev;
nespd = nesqp->nespd;
-   ibphysbuf.addr = nesqp->ietf_frame_pbase + mpa_frame_offset;
-   ibphysbuf.size = buff_len;
tagged_offset = (u64)(unsigned long)*start_buff;
-   ibmr = nes_reg_phys_mr(&nespd->ibpd, &ibphysbuf, 1,
-   IB_ACCESS_LOCAL_WRITE,
-   &tagged_offset);
+   ibmr = nes_reg_phys_mr(&nespd->ibpd,
+   nesqp->ietf_frame_pbase + mpa_frame_offset,
+   buff_len, IB_ACCESS_LOCAL_WRITE,
+   &tagged_offset);
if (!ibmr) {
nes_debug(NES_DBG_CM, "Unable to register memory region"
  "for lSMM for cm_node = %p \n",
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index c8c661e..8c4daf7 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -2000,9 +2000,8 @@ static int nes_reg_mr(struct nes_device *nesdev, struct nes_pd *nespd,
 /**
  * nes_reg_phys_mr
  */
-struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd,
-   struct ib_phys_buf *buffer_list, int num_phys_buf, int acc,
-   u64 * iova_start)
+struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd, u64 addr, u64 size,
+   int acc, u64 *iova_start)
 {
u64 region_length;
struct nes_pd *nespd = to_nespd(ib_pd);
@@ -2014,13 +2013,10 @@ struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd,
struct nes_vpbl vpbl;
struct nes_root_vpbl root_vpbl;
u32 stag;
-   u32 i;
unsigned long mask;
u32 stag_index = 0;
u32 next_stag_index = 0;
u32 driver_key = 0;
-   u32 root_pbl_index = 0;
-   u32 cur_pbl_index = 0;
int err = 0;
int ret = 0;
u16 pbl_count = 0;
@@ -2039,11 +2035,8 @@ struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd,
 
next_stag_index >>= 8;
next_stag_index %= nesadapter->max_mr;
-   if (num_phys_buf > (1024*512)) {
-   return ERR_PTR(-E2BIG);
-   }
 
-   if ((buffer_list[0].addr ^ *iova_start) & ~PAGE_MASK)
+   if ((addr ^ *iova_start) & ~PAGE_MASK)
return ERR_PTR(-EINVAL);
 
err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs, nesadapter->max_mr,
@@ -2058,84 +2051,33 @@ struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd,
return ERR_PTR(-ENOMEM);
}
 
-   for (i = 0; i < num_phys_buf; i++) {
+   /* Allocate a 4K buffer for the PBL */
+   vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 4096,
+   &vpbl.pbl_pbase);
+   nes_debug(NES_DBG_MR, "Allocating leaf PBL, va = %p, pa = 0x%016lX\n",
+   vpbl.pbl_vbase, (unsigned long)vpbl.pbl_pbase);
+   if (!vpbl.pbl_vbase) {
+   nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index);
+   ibmr = ERR_PTR(-ENOMEM);
+   kfree(nesmr);
+   goto reg_phys_err;
+   }
 
-   if ((i & 0x01FF) == 0) {
-   if (root_pbl_index == 1) {
-   /* Allocate the root PBL */
-   root_vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 8192,
-   &root_vpbl.pbl_pbase);
-   nes_debug(NES_DBG_MR, "Allocating root PBL, va = %p, pa = 0x%08X\n",
-   root_vpbl.pbl_vbase, (unsigned int)root_vpbl.pbl_pba

[PATCH 09/10] IB: remove the struct ib_phys_buf definition

2015-12-23 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 include/rdma/ib_verbs.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1778442..197b620 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1251,11 +1251,6 @@ enum ib_access_flags {
IB_ACCESS_ON_DEMAND = (1<<6),
 };
 
-struct ib_phys_buf {
-   u64  addr;
-   u64  size;
-};
-
 /*
  * XXX: these are apparently used for ->rereg_user_mr, no idea why they
  * are hidden here instead of a uapi header!
-- 
1.9.1



[PATCH 08/10] ehca: stop using struct ib_phys_buf

2015-12-23 Thread Christoph Hellwig
And simplify the calling convention for full-memory registrations.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/staging/rdma/ehca/ehca_classes.h |  5 +-
 drivers/staging/rdma/ehca/ehca_mrmw.c| 94 +++-
 2 files changed, 46 insertions(+), 53 deletions(-)

diff --git a/drivers/staging/rdma/ehca/ehca_classes.h b/drivers/staging/rdma/ehca/ehca_classes.h
index bd45e0f..e8c3387 100644
--- a/drivers/staging/rdma/ehca/ehca_classes.h
+++ b/drivers/staging/rdma/ehca/ehca_classes.h
@@ -316,9 +316,8 @@ struct ehca_mr_pginfo {
 
union {
struct { /* type EHCA_MR_PGI_PHYS section */
-   int num_phys_buf;
-   struct ib_phys_buf *phys_buf_array;
-   u64 next_buf;
+   u64 addr;
+   u16 size;
} phy;
struct { /* type EHCA_MR_PGI_USER section */
struct ib_umem *region;
diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c b/drivers/staging/rdma/ehca/ehca_mrmw.c
index c6e3245..1814af7 100644
--- a/drivers/staging/rdma/ehca/ehca_mrmw.c
+++ b/drivers/staging/rdma/ehca/ehca_mrmw.c
@@ -1289,7 +1289,6 @@ int ehca_reg_internal_maxmr(
u64 *iova_start;
u64 size_maxmr;
struct ehca_mr_pginfo pginfo;
-   struct ib_phys_buf ib_pbuf;
u32 num_kpages;
u32 num_hwpages;
u64 hw_pgsize;
@@ -1310,8 +1309,6 @@ int ehca_reg_internal_maxmr(
/* register internal max-MR on HCA */
size_maxmr = ehca_mr_len;
iova_start = (u64 *)ehca_map_vaddr((void *)(KERNELBASE + PHYSICAL_START));
-   ib_pbuf.addr = 0;
-   ib_pbuf.size = size_maxmr;
num_kpages = NUM_CHUNKS(((u64)iova_start % PAGE_SIZE) + size_maxmr,
PAGE_SIZE);
hw_pgsize = ehca_get_max_hwpage_size(shca);
@@ -1323,8 +1320,8 @@ int ehca_reg_internal_maxmr(
pginfo.num_kpages = num_kpages;
pginfo.num_hwpages = num_hwpages;
pginfo.hwpage_size = hw_pgsize;
-   pginfo.u.phy.num_phys_buf = 1;
-   pginfo.u.phy.phys_buf_array = &ib_pbuf;
+   pginfo.u.phy.addr = 0;
+   pginfo.u.phy.size = size_maxmr;
 
ret = ehca_reg_mr(shca, e_mr, iova_start, size_maxmr, 0, e_pd,
  &pginfo, &e_mr->ib.ib_mr.lkey,
@@ -1620,57 +1617,54 @@ static int ehca_set_pagebuf_phys(struct ehca_mr_pginfo *pginfo,
 u32 number, u64 *kpage)
 {
int ret = 0;
-   struct ib_phys_buf *pbuf;
+   u64 addr = pginfo->u.phy.addr;
+   u64 size = pginfo->u.phy.size;
u64 num_hw, offs_hw;
u32 i = 0;
 
-   /* loop over desired phys_buf_array entries */
-   while (i < number) {
-   pbuf   = pginfo->u.phy.phys_buf_array + pginfo->u.phy.next_buf;
-   num_hw  = NUM_CHUNKS((pbuf->addr % pginfo->hwpage_size) +
-pbuf->size, pginfo->hwpage_size);
-   offs_hw = (pbuf->addr & ~(pginfo->hwpage_size - 1)) /
-   pginfo->hwpage_size;
-   while (pginfo->next_hwpage < offs_hw + num_hw) {
-   /* sanity check */
-   if ((pginfo->kpage_cnt >= pginfo->num_kpages) ||
-   (pginfo->hwpage_cnt >= pginfo->num_hwpages)) {
-   ehca_gen_err("kpage_cnt >= num_kpages, "
-"kpage_cnt=%llx num_kpages=%llx "
-"hwpage_cnt=%llx "
-"num_hwpages=%llx i=%x",
-pginfo->kpage_cnt,
-pginfo->num_kpages,
-pginfo->hwpage_cnt,
-pginfo->num_hwpages, i);
-   return -EFAULT;
-   }
-   *kpage = (pbuf->addr & ~(pginfo->hwpage_size - 1)) +
-(pginfo->next_hwpage * pginfo->hwpage_size);
-   if ( !(*kpage) && pbuf->addr ) {
-   ehca_gen_err("pbuf->addr=%llx pbuf->size=%llx "
-"next_hwpage=%llx", pbuf->addr,
-pbuf->size, pginfo->next_hwpage);
-   return -EFAULT;
-   }
-   (pginfo->hwpage_cnt)++;
-   (pginfo-&g

[PATCH 07/10] amso1100: fold c2_reg_phys_mr into c2_get_dma_mr

2015-12-23 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/staging/rdma/amso1100/c2_provider.c | 71 ++---
 1 file changed, 14 insertions(+), 57 deletions(-)

diff --git a/drivers/staging/rdma/amso1100/c2_provider.c b/drivers/staging/rdma/amso1100/c2_provider.c
index 15665c0..4e4f554 100644
--- a/drivers/staging/rdma/amso1100/c2_provider.c
+++ b/drivers/staging/rdma/amso1100/c2_provider.c
@@ -337,43 +337,21 @@ static inline u32 c2_convert_access(int acc)
C2_ACF_LOCAL_READ | C2_ACF_WINDOW_BIND;
 }
 
-static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
-   struct ib_phys_buf *buffer_list,
-   int num_phys_buf, int acc, u64 * iova_start)
+static struct ib_mr *c2_get_dma_mr(struct ib_pd *pd, int acc)
 {
struct c2_mr *mr;
u64 *page_list;
-   u32 total_len;
-   int err, i, j, k, page_shift, pbl_depth;
+   const u32 total_len = 0xffffffff;   /* AMSO1100 limit */
+   int err, page_shift, pbl_depth, i;
+   u64 kva = 0;
 
-   pbl_depth = 0;
-   total_len = 0;
+   pr_debug("%s:%u\n", __func__, __LINE__);
 
-   page_shift = PAGE_SHIFT;
/*
-* If there is only 1 buffer we assume this could
-* be a map of all phy mem...use a 32k page_shift.
+* This is a map of all phy mem...use a 32k page_shift.
 */
-   if (num_phys_buf == 1)
-   page_shift += 3;
-
-   for (i = 0; i < num_phys_buf; i++) {
-
-   if (offset_in_page(buffer_list[i].addr)) {
-   pr_debug("Unaligned Memory Buffer: 0x%x\n",
-   (unsigned int) buffer_list[i].addr);
-   return ERR_PTR(-EINVAL);
-   }
-
-   if (!buffer_list[i].size) {
-   pr_debug("Invalid Buffer Size\n");
-   return ERR_PTR(-EINVAL);
-   }
-
-   total_len += buffer_list[i].size;
-   pbl_depth += ALIGN(buffer_list[i].size,
-  BIT(page_shift)) >> page_shift;
-   }
+   page_shift = PAGE_SHIFT + 3;
+   pbl_depth = ALIGN(total_len, BIT(page_shift)) >> page_shift;
 
page_list = vmalloc(sizeof(u64) * pbl_depth);
if (!page_list) {
@@ -382,16 +360,8 @@ static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
return ERR_PTR(-ENOMEM);
}
 
-   for (i = 0, j = 0; i < num_phys_buf; i++) {
-
-   int naddrs;
-
-   naddrs = ALIGN(buffer_list[i].size,
-  BIT(page_shift)) >> page_shift;
-   for (k = 0; k < naddrs; k++)
-   page_list[j++] = (buffer_list[i].addr +
-(k << page_shift));
-   }
+   for (i = 0; i < pbl_depth; i++)
+   page_list[i] = (i << page_shift);
 
mr = kmalloc(sizeof(*mr), GFP_KERNEL);
if (!mr) {
@@ -399,17 +369,17 @@ static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
return ERR_PTR(-ENOMEM);
}
 
-   mr->pd = to_c2pd(ib_pd);
+   mr->pd = to_c2pd(pd);
mr->umem = NULL;
pr_debug("%s - page shift %d, pbl_depth %d, total_len %u, "
"*iova_start %llx, first pa %llx, last pa %llx\n",
__func__, page_shift, pbl_depth, total_len,
-   (unsigned long long) *iova_start,
+   (unsigned long long) kva,
(unsigned long long) page_list[0],
(unsigned long long) page_list[pbl_depth-1]);
-   err = c2_nsmr_register_phys_kern(to_c2dev(ib_pd->device), page_list,
+   err = c2_nsmr_register_phys_kern(to_c2dev(pd->device), page_list,
 BIT(page_shift), pbl_depth,
-total_len, 0, iova_start,
+total_len, 0, &kva,
 c2_convert_access(acc), mr);
vfree(page_list);
if (err) {
@@ -420,19 +390,6 @@ static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
return &mr->ibmr;
 }
 
-static struct ib_mr *c2_get_dma_mr(struct ib_pd *pd, int acc)
-{
-   struct ib_phys_buf bl;
-   u64 kva = 0;
-
-   pr_debug("%s:%u\n", __func__, __LINE__);
-
-   /* AMSO1100 limit */
-   bl.size = 0xffffffff;
-   bl.addr = 0;
-   return c2_reg_phys_mr(pd, &bl, 1, acc, &kva);
-}
-
 static struct ib_mr *c2_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
u64 virt, int acc, struct ib_udata *udata)
 {
-- 
1.9.1


[PATCH] svc_rdma: use local_dma_lkey

2015-12-22 Thread Christoph Hellwig
We now always have a per-PD local_dma_lkey available.  Make use of that
fact in svc_rdma and stop registering our own MR.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.le...@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h|  2 --
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c|  4 ++--
 net/sunrpc/xprtrdma/svc_rdma_sendto.c  |  6 ++---
 net/sunrpc/xprtrdma/svc_rdma_transport.c   | 36 --
 5 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index b13513a..5322fea 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -156,13 +156,11 @@ struct svcxprt_rdma {
struct ib_qp *sc_qp;
struct ib_cq *sc_rq_cq;
struct ib_cq *sc_sq_cq;
-   struct ib_mr *sc_phys_mr;   /* MR for server memory */
int  (*sc_reader)(struct svcxprt_rdma *,
  struct svc_rqst *,
  struct svc_rdma_op_ctxt *,
  int *, u32 *, u32, u32, u64, bool);
u32  sc_dev_caps;   /* distilled device caps */
-   u32  sc_dma_lkey;   /* local dma key */
unsigned int sc_frmr_pg_list_len;
struct list_head sc_frmr_q;
spinlock_t   sc_frmr_q_lock;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 417cec1..c428734 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -128,7 +128,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 
ctxt->wr_op = IB_WR_SEND;
ctxt->direction = DMA_TO_DEVICE;
-   ctxt->sge[0].lkey = rdma->sc_dma_lkey;
+   ctxt->sge[0].lkey = rdma->sc_pd->local_dma_lkey;
ctxt->sge[0].length = sndbuf->len;
ctxt->sge[0].addr =
ib_dma_map_page(rdma->sc_cm_id->device, ctxt->pages[0], 0,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 3dfe464..c8b8a8b 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -144,6 +144,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
 
head->arg.pages[pg_no] = rqstp->rq_arg.pages[pg_no];
head->arg.page_len += len;
+
head->arg.len += len;
if (!pg_off)
head->count++;
@@ -160,8 +161,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
goto err;
atomic_inc(&xprt->sc_dma_used);
 
-   /* The lkey here is either a local dma lkey or a dma_mr lkey */
-   ctxt->sge[pno].lkey = xprt->sc_dma_lkey;
+   ctxt->sge[pno].lkey = xprt->sc_pd->local_dma_lkey;
ctxt->sge[pno].length = len;
ctxt->count++;
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index ced3151..20bd5d4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -265,7 +265,7 @@ static int send_write(struct svcxprt_rdma *xprt, struct svc_rqst *rqstp,
 sge[sge_no].addr))
goto err;
atomic_inc(&xprt->sc_dma_used);
-   sge[sge_no].lkey = xprt->sc_dma_lkey;
+   sge[sge_no].lkey = xprt->sc_pd->local_dma_lkey;
ctxt->count++;
sge_off = 0;
sge_no++;
@@ -487,7 +487,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
ctxt->count = 1;
 
/* Prepare the SGE for the RPCRDMA Header */
-   ctxt->sge[0].lkey = rdma->sc_dma_lkey;
+   ctxt->sge[0].lkey = rdma->sc_pd->local_dma_lkey;
ctxt->sge[0].length = svc_rdma_xdr_get_reply_hdr_len(rdma_resp);
ctxt->sge[0].addr =
ib_dma_map_page(rdma->sc_cm_id->device, page, 0,
@@ -511,7 +511,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
 ctxt->sge[sge_no].addr))
goto err;
atomic_inc(&rdma->sc_dma_used);
-   ctxt->sge[sge_no].lkey = rdma->sc_dma_lkey;
+   ctxt->sge[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
ctxt->sge[sge_no].length = sge_bytes;
}
if (byte_count != 0) {
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index

Re: [PATCH 10/11] IB: only keep a single key in struct ib_mr

2015-12-22 Thread Christoph Hellwig
On Tue, Dec 22, 2015 at 11:17:54AM +0200, Sagi Grimberg wrote:
> What makes me worried here is that the IB/RoCE specification really
> defines different keys for local and remote access. I'm less concerned
> about our consumers but more about our providers. We keep seeing new
> providers come along and its not impossible that a specific HW will
> *rely* on this distinction. In such a case we'd need to revert this
> patch altogether in that very moment.
>
> I think we're better off working on proper abstractions to help ULPs
> get it right (and simple!), without risking future devices support.

With the new API in the next patch ULPs simply can't request an lkey
and an rkey at the same time, so for kernel use it's not a problem at
all.  That leaves my favourite nightmare: uverbs, which of course
allows for everything under the sun, just because we can.  I guess
the right answer to that problem is to first split the data structures
for kernel and user MRs, which we probably should have done much
earlier.  Not just because of this but also because of other issues
like all the fields your FR API changes added to ib_mr that aren't needed
for user MRs, or because the user MR structure should really be merged with
struct ib_umem.

>
> Sagi.
---end quoted text---


Re: [PATCH 10/11] IB: only keep a single key in struct ib_mr

2015-12-22 Thread Christoph Hellwig
On Tue, Dec 22, 2015 at 03:50:12PM +0200, Sagi Grimberg wrote:
> This is why I said that the problem here is not the ULPs. But if a new
> HW comes along with distinction between rkeys and lkeys it will have a
> problem. For example a HW allocates two different keys, rkey and lkey.
> And, it chooses to fail SEND from a rkey, or incoming READ/WRITE to a
> lkey. How can such a device be supported with an API that allows a
> single key per MR?

The ULP decides if this MR is going to be used as a lkey or rkey
by passing IB_REG_LKEY or IB_REG_RKEY.  The HCA driver will then
fill mr->key by the lkey or rkey based on that and everything will
work fine.
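
A minimal user-space sketch of the selection described above, for
illustration only.  IB_REG_LKEY, IB_REG_RKEY and mr->key are the names
used in this mail; every toy_* type and function below is a hypothetical
stand-in and not the kernel's actual definition or the proposed API.

```c
#include <stdint.h>

/* Toy user-space model; none of these types are the kernel's own. */
enum toy_reg_flags {
	TOY_REG_LKEY,	/* models IB_REG_LKEY from the mail above */
	TOY_REG_RKEY,	/* models IB_REG_RKEY from the mail above */
};

/* A hypothetical device that allocates two distinct hardware keys. */
struct toy_hw_keys {
	uint32_t lkey;
	uint32_t rkey;
};

/* The consumer-visible MR carries only a single key. */
struct toy_mr {
	uint32_t key;
};

/* The driver stores whichever key matches the usage the ULP asked for. */
static void toy_fill_mr_key(struct toy_mr *mr, const struct toy_hw_keys *hw,
			    enum toy_reg_flags flags)
{
	mr->key = (flags == TOY_REG_RKEY) ? hw->rkey : hw->lkey;
}
```

In this model a device that strictly separates the two keys still fits a
single-key MR, because the choice is made once at registration time.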


Re: [PATCH V1 1/3] IB/core: Align coding style of ib_device_cap_flags structure

2015-12-21 Thread Christoph Hellwig
On Mon, Dec 21, 2015 at 11:36:03AM -0500, ira.weiny wrote:
> It would be nice if we were not having to do this for staging then.  Also
> perhaps it should be removed from checkpatch --strict?

Don't use checkpatch --strict ever.  It's full of weird items that
definitely don't apply to the majority of the kernel code base.

> Where are the guidelines for when one can ignore checkpatch and when they can
> not?  It would be nice to know when we can "be developers" vs "being robots to
> some tool".

I think checkpatch is generally useful, and for the errors it reports
without --strict I haven't found any false positives.

The warnings are about 90% useful but some of them are just weird.  For
--strict all bets are off.



Re: [PATCH V1 1/3] IB/core: Align coding style of ib_device_cap_flags structure

2015-12-21 Thread Christoph Hellwig
On Mon, Dec 21, 2015 at 08:37:26AM +0200, Leon Romanovsky wrote:
> You are right and it is a preferred way for me too, however the
> downside of such change will be one of two:
> 1. Change this structure only => we will have style mix of BITs and
> shifts in the same file. IMHO it looks awful.
> 2. Change the whole file => the work with "git blame" will be less
> straightforward.

Honestly, the BIT macros are horrible, and anyone who thinks they are useful
really should read a book on computer architecture and one on C.

Also the capabilities are used by userspace, so they will need to move
to a uapi header sooner or later, where this stupid macro isn't even
available.
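
As a rough illustration of the plain-shift style argued for above, here
is a small sketch that needs no helper macro and so would compile
unchanged in a header shared with userspace.  The toy_* flag names and
the helper are hypothetical and are not the real ib_device_cap_flags
values.

```c
/* Hypothetical capability flags written with plain shifts. */
enum toy_device_cap_flags {
	TOY_CAP_A = (1 << 0),
	TOY_CAP_B = (1 << 1),
	TOY_CAP_C = (1 << 2),
};

/* Return non-zero if the given flag is set in the caps bitmask. */
static int toy_has_cap(unsigned int caps, enum toy_device_cap_flags flag)
{
	return (caps & flag) != 0;
}
```

A caller would test a device with something like
toy_has_cap(caps, TOY_CAP_B), with caps built by OR-ing flags together.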


Re: [PATCH 09/10] IB: remove the struct ib_phys_buf definition

2015-12-20 Thread Christoph Hellwig
On Sun, Dec 20, 2015 at 09:37:38AM +0200, Or Gerlitz wrote:
> On 12/18/2015 3:55 PM, Christoph Hellwig wrote:
>
>> Signed-off-by: Christoph Hellwig <h...@lst.de>
>> Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
>> Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
>> Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
>
> Here, too, please avoid empty change logs to IB core patches.

You sound like a bot :)

While I tend to ask people for better descriptions a lot when I don't
understand a patch without them, I really don't see what you might want
here.  In general it really helps to say what you want to see.  I can't
see a useful explanation for removing an unused structure.  But if you
really want one and come up with a coherent sentence or two I can add it.


Re: [PATCH 10/10] IB: remove the unused usecnt field from struct ib_mr

2015-12-20 Thread Christoph Hellwig
On Fri, Dec 18, 2015 at 03:14:08PM +0100, Bart Van Assche wrote:
> On 12/18/2015 02:55 PM, Christoph Hellwig wrote:
>> Signed-off-by: Christoph Hellwig <h...@lst.de>
>
> Shouldn't the description of this patch be changed into something like 
> "Remove the usecnt field from ib_mr since it is always zero" ?

In a software context, unused for me also includes something that's
only written to.  But I can change it to something like:

IB: remove the write only usecnt field from struct ib_mr

to be a little more precise.


[PATCH 03/10] IB: remove support for phys MRs

2015-12-18 Thread Christoph Hellwig
We have stopped using phys MRs in the kernel a while ago, so let's
remove all the cruft used to implement them.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-By: Devesh Sharma<devesh.sha...@avagotech.com> [ocrdma]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb3/iwch_mem.c   |  31 ---
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  69 --
 drivers/infiniband/hw/cxgb3/iwch_provider.h  |   4 -
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |  11 -
 drivers/infiniband/hw/cxgb4/mem.c| 248 -
 drivers/infiniband/hw/cxgb4/provider.c   |   2 -
 drivers/infiniband/hw/mthca/mthca_provider.c |  84 ---
 drivers/infiniband/hw/nes/nes_cm.c   |   7 +-
 drivers/infiniband/hw/nes/nes_verbs.c|   3 +-
 drivers/infiniband/hw/nes/nes_verbs.h|   5 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |   1 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  | 163 --
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h  |   3 -
 drivers/infiniband/hw/qib/qib_mr.c   |  51 +
 drivers/infiniband/hw/qib/qib_verbs.c|   1 -
 drivers/infiniband/hw/qib/qib_verbs.h|   4 -
 drivers/staging/rdma/amso1100/c2_provider.c  |   1 -
 drivers/staging/rdma/ehca/ehca_iverbs.h  |  11 -
 drivers/staging/rdma/ehca/ehca_main.c|   2 -
 drivers/staging/rdma/ehca/ehca_mrmw.c| 321 ---
 drivers/staging/rdma/ehca/ehca_mrmw.h|   5 -
 drivers/staging/rdma/hfi1/mr.c   |  51 +
 drivers/staging/rdma/hfi1/verbs.c|   1 -
 drivers/staging/rdma/hfi1/verbs.h|   4 -
 drivers/staging/rdma/ipath/ipath_mr.c|  55 -
 drivers/staging/rdma/ipath/ipath_verbs.c |   1 -
 drivers/staging/rdma/ipath/ipath_verbs.h |   4 -
 include/rdma/ib_verbs.h  |  16 +-
 28 files changed, 15 insertions(+), 1144 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c 
b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 5c36ee2..3a5e27d 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -75,37 +75,6 @@ int iwch_register_mem(struct iwch_dev *rhp, struct iwch_pd 
*php,
return ret;
 }
 
-int iwch_reregister_mem(struct iwch_dev *rhp, struct iwch_pd *php,
-   struct iwch_mr *mhp,
-   int shift,
-   int npages)
-{
-   u32 stag;
-   int ret;
-
-   /* We could support this... */
-   if (npages > mhp->attr.pbl_size)
-   return -ENOMEM;
-
-   stag = mhp->attr.stag;
-   if (cxio_reregister_phys_mem(&rhp->rdev,
-  &stag, mhp->attr.pdid,
-  mhp->attr.perms,
-  mhp->attr.zbva,
-  mhp->attr.va_fbo,
-  mhp->attr.len,
-  shift - 12,
-  mhp->attr.pbl_size, mhp->attr.pbl_addr))
-   return -ENOMEM;
-
-   ret = iwch_finish_mem_reg(mhp, stag);
-   if (ret)
-   cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
-  mhp->attr.pbl_addr);
-
-   return ret;
-}
-
 int iwch_alloc_pbl(struct iwch_mr *mhp, int npages)
 {
mhp->attr.pbl_addr = cxio_hal_pblpool_alloc(&mhp->rhp->rdev,
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 1567b5b..9576e15 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -556,73 +556,6 @@ err:
 
 }
 
-static int iwch_reregister_phys_mem(struct ib_mr *mr,
-int mr_rereg_mask,
-struct ib_pd *pd,
-struct ib_phys_buf *buffer_list,
-int num_phys_buf,
-int acc, u64 * iova_start)
-{
-
-   struct iwch_mr mh, *mhp;
-   struct iwch_pd *php;
-   struct iwch_dev *rhp;
-   __be64 *page_list = NULL;
-   int shift = 0;
-   u64 total_size;
-   int npages = 0;
-   int ret;
-
-   PDBG("%s ib_mr %p ib_pd %p\n", __func__, mr, pd);
-
-   /* There can be no memory windows */
-   if (atomic_read(&mr->usecnt))
-   return -EINVAL;
-
-   mhp = to_iwch_mr(mr);
-   rhp = mhp->rhp;
-   php = to_iwch_pd(mr->pd);
-
-   /* make sure we are on the same adapter */
-   if (rhp != php->rhp)
-   return -EINVAL;
-
-   memcpy(&mh, mhp, sizeof *mhp);
-
-   if (mr_rereg_m

[PATCH 04/10] IB: remove in-kernel support for memory windows

2015-12-18 Thread Christoph Hellwig
Remove the unused ib_alloc_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 Documentation/infiniband/core_locking.txt   |  2 -
 drivers/infiniband/core/uverbs.h|  2 +
 drivers/infiniband/core/uverbs_cmd.c|  4 +-
 drivers/infiniband/core/uverbs_main.c   | 13 -
 drivers/infiniband/core/verbs.c | 36 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c   |  4 --
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h |  3 --
 drivers/infiniband/hw/cxgb3/iwch_qp.c   | 82 
 drivers/infiniband/hw/cxgb4/cq.c|  3 --
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h  |  2 -
 drivers/infiniband/hw/cxgb4/provider.c  |  1 -
 drivers/infiniband/hw/cxgb4/qp.c|  5 --
 drivers/infiniband/hw/mlx4/cq.c |  3 --
 drivers/infiniband/hw/mlx4/main.c   |  1 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h|  2 -
 drivers/infiniband/hw/mlx4/mr.c | 22 
 drivers/infiniband/hw/mlx4/qp.c | 27 --
 drivers/infiniband/hw/mlx5/cq.c |  3 --
 drivers/infiniband/hw/mthca/mthca_cq.c  |  3 --
 drivers/infiniband/hw/nes/nes_verbs.c   | 75 --
 drivers/staging/rdma/amso1100/c2_cq.c   |  3 --
 drivers/staging/rdma/ehca/ehca_iverbs.h |  3 --
 drivers/staging/rdma/ehca/ehca_main.c   |  1 -
 drivers/staging/rdma/ehca/ehca_mrmw.c   | 12 -
 drivers/staging/rdma/ehca/ehca_reqs.c   |  1 -
 include/rdma/ib_verbs.h | 83 -
 27 files changed, 16 insertions(+), 381 deletions(-)

diff --git a/Documentation/infiniband/core_locking.txt 
b/Documentation/infiniband/core_locking.txt
index e167854..4b1f36b 100644
--- a/Documentation/infiniband/core_locking.txt
+++ b/Documentation/infiniband/core_locking.txt
@@ -15,7 +15,6 @@ Sleeping and interrupt context
 modify_ah
 query_ah
 destroy_ah
-bind_mw
 post_send
 post_recv
 poll_cq
@@ -31,7 +30,6 @@ Sleeping and interrupt context
 ib_modify_ah
 ib_query_ah
 ib_destroy_ah
-ib_bind_mw
 ib_post_send
 ib_post_recv
 ib_req_notify_cq
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 94bbd8c..612ccfd 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -204,6 +204,8 @@ void ib_uverbs_event_handler(struct ib_event_handler 
*handler,
 struct ib_event *event);
 void ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev, struct ib_xrcd 
*xrcd);
 
+int uverbs_dealloc_mw(struct ib_mw *mw);
+
 struct ib_uverbs_flow_spec {
union {
union {
diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 1add536..48776bb 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1242,7 +1242,7 @@ err_copy:
idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
 
 err_unalloc:
-   ib_dealloc_mw(mw);
+   uverbs_dealloc_mw(mw);
 
 err_put:
put_pd_read(pd);
@@ -1271,7 +1271,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
 
mw = uobj->object;
 
-   ret = ib_dealloc_mw(mw);
+   ret = uverbs_dealloc_mw(mw);
if (!ret)
uobj->live = 0;
 
diff --git a/drivers/infiniband/core/uverbs_main.c 
b/drivers/infiniband/core/uverbs_main.c
index e3ef288..39680ae 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -133,6 +133,17 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file 
*file,
 static void ib_uverbs_add_one(struct ib_device *device);
 static void ib_uverbs_remove_one(struct ib_device *device, void *client_data);
 
+int uverbs_dealloc_mw(struct ib_mw *mw)
+{
+   struct ib_pd *pd = mw->pd;
+   int ret;
+
+   ret = mw->device->dealloc_mw(mw);
+   if (!ret)
+   atomic_dec(&pd->usecnt);
+   return ret;
+}
+
 static void ib_uverbs_release_dev(struct kobject *kobj)
 {
struct ib_uverbs_device *dev =
@@ -224,7 +235,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file 
*file,
struct ib_mw *mw = uobj->object;
 
idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
-   ib_dealloc_mw(mw);
+   uverbs_dealloc_mw(mw);
kfree(uobj);
}
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 29a3e53..2858b35 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1267,42 +1267,6 

[PATCH 06/10] nes: simplify nes_reg_phys_mr calling conventions

2015-12-18 Thread Christoph Hellwig
Just pass an address/size pair instead of an ib_phys_buf array.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/nes/nes_cm.c|  10 +--
 drivers/infiniband/hw/nes/nes_verbs.c | 140 --
 drivers/infiniband/hw/nes/nes_verbs.h |   3 +-
 3 files changed, 37 insertions(+), 116 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c 
b/drivers/infiniband/hw/nes/nes_cm.c
index 242c87d..bc37adb 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -3232,7 +3232,6 @@ int nes_accept(struct iw_cm_id *cm_id, struct 
iw_cm_conn_param *conn_param)
int passive_state;
struct nes_ib_device *nesibdev;
struct ib_mr *ibmr = NULL;
-   struct ib_phys_buf ibphysbuf;
struct nes_pd *nespd;
u64 tagged_offset;
u8 mpa_frame_offset = 0;
@@ -3316,12 +3315,11 @@ int nes_accept(struct iw_cm_id *cm_id, struct 
iw_cm_conn_param *conn_param)
u64temp = (unsigned long)nesqp;
nesibdev = nesvnic->nesibdev;
nespd = nesqp->nespd;
-   ibphysbuf.addr = nesqp->ietf_frame_pbase + mpa_frame_offset;
-   ibphysbuf.size = buff_len;
tagged_offset = (u64)(unsigned long)*start_buff;
-   ibmr = nes_reg_phys_mr(&nespd->ibpd, &ibphysbuf, 1,
-   IB_ACCESS_LOCAL_WRITE,
-   &tagged_offset);
+   ibmr = nes_reg_phys_mr(&nespd->ibpd,
+   nesqp->ietf_frame_pbase + mpa_frame_offset,
+   buff_len, IB_ACCESS_LOCAL_WRITE,
+   &tagged_offset);
if (!ibmr) {
nes_debug(NES_DBG_CM, "Unable to register memory region"
  "for lSMM for cm_node = %p \n",
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c 
b/drivers/infiniband/hw/nes/nes_verbs.c
index 640f68f..4e57bf0 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -1945,9 +1945,8 @@ static int nes_reg_mr(struct nes_device *nesdev, struct 
nes_pd *nespd,
 /**
  * nes_reg_phys_mr
  */
-struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd,
-   struct ib_phys_buf *buffer_list, int num_phys_buf, int acc,
-   u64 * iova_start)
+struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd, u64 addr, u64 size,
+   int acc, u64 *iova_start)
 {
u64 region_length;
struct nes_pd *nespd = to_nespd(ib_pd);
@@ -1959,13 +1958,10 @@ struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd,
struct nes_vpbl vpbl;
struct nes_root_vpbl root_vpbl;
u32 stag;
-   u32 i;
unsigned long mask;
u32 stag_index = 0;
u32 next_stag_index = 0;
u32 driver_key = 0;
-   u32 root_pbl_index = 0;
-   u32 cur_pbl_index = 0;
int err = 0;
int ret = 0;
u16 pbl_count = 0;
@@ -1984,11 +1980,8 @@ struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd,
 
next_stag_index >>= 8;
next_stag_index %= nesadapter->max_mr;
-   if (num_phys_buf > (1024*512)) {
-   return ERR_PTR(-E2BIG);
-   }
 
-   if ((buffer_list[0].addr ^ *iova_start) & ~PAGE_MASK)
+   if ((addr ^ *iova_start) & ~PAGE_MASK)
return ERR_PTR(-EINVAL);
 
err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs, 
nesadapter->max_mr,
@@ -2003,84 +1996,33 @@ struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd,
return ERR_PTR(-ENOMEM);
}
 
-   for (i = 0; i < num_phys_buf; i++) {
+   /* Allocate a 4K buffer for the PBL */
+   vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 4096,
+   &vpbl.pbl_pbase);
+   nes_debug(NES_DBG_MR, "Allocating leaf PBL, va = %p, pa = 0x%016lX\n",
+   vpbl.pbl_vbase, (unsigned long)vpbl.pbl_pbase);
+   if (!vpbl.pbl_vbase) {
+   nes_free_resource(nesadapter, nesadapter->allocated_mrs, 
stag_index);
+   ibmr = ERR_PTR(-ENOMEM);
+   kfree(nesmr);
+   goto reg_phys_err;
+   }
 
-   if ((i & 0x01FF) == 0) {
-   if (root_pbl_index == 1) {
-   /* Allocate the root PBL */
-   root_vpbl.pbl_vbase = 
pci_alloc_consistent(nesdev->pcidev, 8192,
-   &root_vpbl.pbl_pbase);
-   nes_debug(NES_DBG_MR, "Allocating root PBL, va 
= %p, pa = 0x%08X\n",
-   root_vpbl.pbl_vbase, (unsigned 
int)root_vpbl.pbl_pba

[PATCH 05/10] cxgb3: simplify iwch_get_dma_mr

2015-12-18 Thread Christoph Hellwig
Fold simplified versions of build_phys_page_list and
iwch_register_phys_mem into iwch_get_dma_mr now that no other callers
are left.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb3/iwch_mem.c  | 71 
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 73 ++---
 drivers/infiniband/hw/cxgb3/iwch_provider.h |  8 
 3 files changed, 26 insertions(+), 126 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c 
b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 3a5e27d..1d04c87 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -99,74 +99,3 @@ int iwch_write_pbl(struct iwch_mr *mhp, __be64 *pages, int 
npages, int offset)
return cxio_write_pbl(&mhp->rhp->rdev, pages,
  mhp->attr.pbl_addr + (offset << 3), npages);
 }
-
-int build_phys_page_list(struct ib_phys_buf *buffer_list,
-   int num_phys_buf,
-   u64 *iova_start,
-   u64 *total_size,
-   int *npages,
-   int *shift,
-   __be64 **page_list)
-{
-   u64 mask;
-   int i, j, n;
-
-   mask = 0;
-   *total_size = 0;
-   for (i = 0; i < num_phys_buf; ++i) {
-   if (i != 0 && buffer_list[i].addr & ~PAGE_MASK)
-   return -EINVAL;
-   if (i != 0 && i != num_phys_buf - 1 &&
-   (buffer_list[i].size & ~PAGE_MASK))
-   return -EINVAL;
-   *total_size += buffer_list[i].size;
-   if (i > 0)
-   mask |= buffer_list[i].addr;
-   else
-   mask |= buffer_list[i].addr & PAGE_MASK;
-   if (i != num_phys_buf - 1)
-   mask |= buffer_list[i].addr + buffer_list[i].size;
-   else
-   mask |= (buffer_list[i].addr + buffer_list[i].size +
-   PAGE_SIZE - 1) & PAGE_MASK;
-   }
-
-   if (*total_size > 0xULL)
-   return -ENOMEM;
-
-   /* Find largest page shift we can use to cover buffers */
-   for (*shift = PAGE_SHIFT; *shift < 27; ++(*shift))
-   if ((1ULL << *shift) & mask)
-   break;
-
-   buffer_list[0].size += buffer_list[0].addr & ((1ULL << *shift) - 1);
-   buffer_list[0].addr &= ~0ull << *shift;
-
-   *npages = 0;
-   for (i = 0; i < num_phys_buf; ++i)
-   *npages += (buffer_list[i].size +
-   (1ULL << *shift) - 1) >> *shift;
-
-   if (!*npages)
-   return -EINVAL;
-
-   *page_list = kmalloc(sizeof(u64) * *npages, GFP_KERNEL);
-   if (!*page_list)
-   return -ENOMEM;
-
-   n = 0;
-   for (i = 0; i < num_phys_buf; ++i)
-   for (j = 0;
-j < (buffer_list[i].size + (1ULL << *shift) - 1) >> *shift;
-++j)
-   (*page_list)[n++] = cpu_to_be64(buffer_list[i].addr +
-   ((u64) j << *shift));
-
-   PDBG("%s va 0x%llx mask 0x%llx shift %d len %lld pbl_size %d\n",
-__func__, (unsigned long long) *iova_start,
-(unsigned long long) mask, *shift, (unsigned long long) 
*total_size,
-*npages);
-
-   return 0;
-
-}
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index b184933..097eb93 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -479,24 +479,25 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr)
return 0;
 }
 
-static struct ib_mr *iwch_register_phys_mem(struct ib_pd *pd,
-   struct ib_phys_buf *buffer_list,
-   int num_phys_buf,
-   int acc,
-   u64 *iova_start)
+static struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, int acc)
 {
-   __be64 *page_list;
-   int shift;
-   u64 total_size;
-   int npages;
-   struct iwch_dev *rhp;
-   struct iwch_pd *php;
+   const u64 total_size = 0x;
+   const u64 mask = (total_size + PAGE_SIZE - 1) & PAGE_MASK;
+   struct iwch_pd *php = to_iwch_pd(pd);
+   struct iwch_dev *rhp = php->rhp;
struct iwch_mr *mhp;
-   int ret;
+   __be64 *page_list;
+  

[PATCH 01/10] IB: start documenting device capabilities

2015-12-18 Thread Christoph Hellwig
Just IB_DEVICE_LOCAL_DMA_LKEY and IB_DEVICE_MEM_MGT_EXTENSIONS for now
as I'm most familiar with those.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-By: Jason Gunthorpe <jguntho...@obsidianresearch.com>
---
 include/rdma/ib_verbs.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a7dbbfc..db4e3fe 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -122,6 +122,14 @@ enum ib_device_cap_flags {
IB_DEVICE_RC_RNR_NAK_GEN= (1<<12),
IB_DEVICE_SRQ_RESIZE= (1<<13),
IB_DEVICE_N_NOTIFY_CQ   = (1<<14),
+
+   /*
+* This device supports a per-device lkey or stag that can be
+* used without performing a memory registration for the local
+* memory.  Note that ULPs should never check this flag, but
+* instead use the local_dma_lkey field in the ib_pd structure,
+* which will always contain a usable lkey.
+*/
IB_DEVICE_LOCAL_DMA_LKEY= (1<<15),
IB_DEVICE_RESERVED  = (1<<16), /* old SEND_W_INV */
IB_DEVICE_MEM_WINDOW= (1<<17),
@@ -135,6 +143,16 @@ enum ib_device_cap_flags {
IB_DEVICE_UD_IP_CSUM= (1<<18),
IB_DEVICE_UD_TSO= (1<<19),
IB_DEVICE_XRC   = (1<<20),
+
+   /*
+* This device supports the IB "base memory management extension",
+* which includes support for fast registrations (IB_WR_REG_MR,
+* IB_WR_LOCAL_INV and IB_WR_SEND_WITH_INV verbs).  This flag should
+* also be set by any iWarp device which must support FRs to comply
+* with the iWarp verbs spec.  iWarp devices also support the
+* IB_WR_RDMA_READ_WITH_INV verb for RDMA READs that invalidate the
+* stag.
+*/
IB_DEVICE_MEM_MGT_EXTENSIONS= (1<<21),
IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
IB_DEVICE_MEM_WINDOW_TYPE_2A= (1<<23),
-- 
1.9.1
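
The lkey guidance in the new IB_DEVICE_LOCAL_DMA_LKEY comment can be
illustrated with a small userspace sketch.  Everything prefixed demo_ or
DEMO_ is hypothetical and only mirrors the shape of the kernel structures;
it is not the kernel API.  The first helper shows the flag-checking pattern
the comment discourages, the second the recommended one, which assumes (as
the comment states) that the PD always carries a usable lkey.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for IB_DEVICE_LOCAL_DMA_LKEY (bit 15 in the patch). */
#define DEMO_DEVICE_LOCAL_DMA_LKEY (1u << 15)

/* Hypothetical, reduced stand-ins for struct ib_mr and struct ib_pd. */
struct demo_mr {
	uint32_t lkey;
};

struct demo_pd {
	uint32_t device_cap_flags;	/* capability bits of the owning device */
	uint32_t local_dma_lkey;	/* assumed to be always usable */
};

/* Discouraged: the ULP inspects the capability flag and keeps its own MR. */
uint32_t demo_lkey_by_flag(const struct demo_pd *pd,
			   const struct demo_mr *fallback_mr)
{
	if (pd->device_cap_flags & DEMO_DEVICE_LOCAL_DMA_LKEY)
		return pd->local_dma_lkey;
	assert(fallback_mr);
	return fallback_mr->lkey;
}

/* Recommended: take the lkey from the PD without looking at the flag. */
uint32_t demo_lkey(const struct demo_pd *pd)
{
	return pd->local_dma_lkey;
}
```

With the second helper a ULP needs neither the capability check nor a
fallback MR of its own.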



[PATCH 07/10] amso1100: fold c2_reg_phys_mr into c2_get_dma_mr

2015-12-18 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/staging/rdma/amso1100/c2_provider.c | 71 ++---
 1 file changed, 14 insertions(+), 57 deletions(-)

diff --git a/drivers/staging/rdma/amso1100/c2_provider.c 
b/drivers/staging/rdma/amso1100/c2_provider.c
index 6c3e9cc..1691fc7 100644
--- a/drivers/staging/rdma/amso1100/c2_provider.c
+++ b/drivers/staging/rdma/amso1100/c2_provider.c
@@ -323,43 +323,21 @@ static inline u32 c2_convert_access(int acc)
C2_ACF_LOCAL_READ | C2_ACF_WINDOW_BIND;
 }
 
-static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
-   struct ib_phys_buf *buffer_list,
-   int num_phys_buf, int acc, u64 * iova_start)
+static struct ib_mr *c2_get_dma_mr(struct ib_pd *pd, int acc)
 {
struct c2_mr *mr;
u64 *page_list;
-   u32 total_len;
-   int err, i, j, k, page_shift, pbl_depth;
+   const u32 total_len = 0x;   /* AMSO1100 limit */
+   int err, page_shift, pbl_depth, i;
+   u64 kva = 0;
 
-   pbl_depth = 0;
-   total_len = 0;
+   pr_debug("%s:%u\n", __func__, __LINE__);
 
-   page_shift = PAGE_SHIFT;
/*
-* If there is only 1 buffer we assume this could
-* be a map of all phy mem...use a 32k page_shift.
+* This is a map of all phy mem...use a 32k page_shift.
 */
-   if (num_phys_buf == 1)
-   page_shift += 3;
-
-   for (i = 0; i < num_phys_buf; i++) {
-
-   if (offset_in_page(buffer_list[i].addr)) {
-   pr_debug("Unaligned Memory Buffer: 0x%x\n",
-   (unsigned int) buffer_list[i].addr);
-   return ERR_PTR(-EINVAL);
-   }
-
-   if (!buffer_list[i].size) {
-   pr_debug("Invalid Buffer Size\n");
-   return ERR_PTR(-EINVAL);
-   }
-
-   total_len += buffer_list[i].size;
-   pbl_depth += ALIGN(buffer_list[i].size,
-  BIT(page_shift)) >> page_shift;
-   }
+   page_shift = PAGE_SHIFT + 3;
+   pbl_depth = ALIGN(total_len, BIT(page_shift)) >> page_shift;
 
page_list = vmalloc(sizeof(u64) * pbl_depth);
if (!page_list) {
@@ -368,16 +346,8 @@ static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
return ERR_PTR(-ENOMEM);
}
 
-   for (i = 0, j = 0; i < num_phys_buf; i++) {
-
-   int naddrs;
-
-   naddrs = ALIGN(buffer_list[i].size,
-  BIT(page_shift)) >> page_shift;
-   for (k = 0; k < naddrs; k++)
-   page_list[j++] = (buffer_list[i].addr +
-(k << page_shift));
-   }
+   for (i = 0; i < pbl_depth; i++)
+   page_list[i] = (i << page_shift);
 
mr = kmalloc(sizeof(*mr), GFP_KERNEL);
if (!mr) {
@@ -385,17 +355,17 @@ static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
return ERR_PTR(-ENOMEM);
}
 
-   mr->pd = to_c2pd(ib_pd);
+   mr->pd = to_c2pd(pd);
mr->umem = NULL;
pr_debug("%s - page shift %d, pbl_depth %d, total_len %u, "
"*iova_start %llx, first pa %llx, last pa %llx\n",
__func__, page_shift, pbl_depth, total_len,
-   (unsigned long long) *iova_start,
+   (unsigned long long) kva,
(unsigned long long) page_list[0],
(unsigned long long) page_list[pbl_depth-1]);
-   err = c2_nsmr_register_phys_kern(to_c2dev(ib_pd->device), page_list,
+   err = c2_nsmr_register_phys_kern(to_c2dev(pd->device), page_list,
 BIT(page_shift), pbl_depth,
-total_len, 0, iova_start,
+total_len, 0, &kva,
 c2_convert_access(acc), mr);
vfree(page_list);
if (err) {
@@ -406,19 +376,6 @@ static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
return &mr->ibmr;
 }
 
-static struct ib_mr *c2_get_dma_mr(struct ib_pd *pd, int acc)
-{
-   struct ib_phys_buf bl;
-   u64 kva = 0;
-
-   pr_debug("%s:%u\n", __func__, __LINE__);
-
-   /* AMSO1100 limit */
-   bl.size = 0x;
-   bl.addr = 0;
-   return c2_reg_phys_mr(pd, &bl, 1, acc, &kva);
-}
-
 static struct ib_mr *c2_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
u64 virt, int acc, struct ib_udata *udata)
 {
-- 
1.9.1
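
The page-list arithmetic in the new c2_get_dma_mr can be tried out in
userspace with the sketch below.  The demo_ names are hypothetical, and the
sketch deliberately uses 64-bit integers throughout, which is a choice made
for the example and not what the patch does (the patch uses a u32 length
and an int loop counter).  The length passed in is up to the caller; no
particular value is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of page-list entries needed to cover len bytes with chunks of
 * 1 << shift bytes.  Equivalent to ALIGN(len, BIT(shift)) >> shift as long
 * as the addition does not overflow.
 */
uint64_t demo_pbl_depth(uint64_t len, unsigned int shift)
{
	uint64_t chunk;

	assert(shift < 63);
	chunk = UINT64_C(1) << shift;
	return (len + chunk - 1) >> shift;
}

/* Identity mapping: entry i holds the physical address i << shift. */
void demo_fill_page_list(uint64_t *list, uint64_t depth, unsigned int shift)
{
	uint64_t i;

	for (i = 0; i < depth; i++)
		list[i] = i << shift;
}
```

For example, with a 32k chunk (shift 15) a 1 MiB region needs 32 entries,
and one extra byte pushes that to 33.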


[PATCH 08/10] ehca: stop using struct ib_phys_buf

2015-12-18 Thread Christoph Hellwig
And simplify the calling convention for full-memory registrations.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/staging/rdma/ehca/ehca_classes.h |  5 +-
 drivers/staging/rdma/ehca/ehca_mrmw.c| 94 +++-
 2 files changed, 46 insertions(+), 53 deletions(-)

diff --git a/drivers/staging/rdma/ehca/ehca_classes.h 
b/drivers/staging/rdma/ehca/ehca_classes.h
index bd45e0f..e8c3387 100644
--- a/drivers/staging/rdma/ehca/ehca_classes.h
+++ b/drivers/staging/rdma/ehca/ehca_classes.h
@@ -316,9 +316,8 @@ struct ehca_mr_pginfo {
 
union {
struct { /* type EHCA_MR_PGI_PHYS section */
-   int num_phys_buf;
-   struct ib_phys_buf *phys_buf_array;
-   u64 next_buf;
+   u64 addr;
+   u16 size;
} phy;
struct { /* type EHCA_MR_PGI_USER section */
struct ib_umem *region;
diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c 
b/drivers/staging/rdma/ehca/ehca_mrmw.c
index c6e3245..1814af7 100644
--- a/drivers/staging/rdma/ehca/ehca_mrmw.c
+++ b/drivers/staging/rdma/ehca/ehca_mrmw.c
@@ -1289,7 +1289,6 @@ int ehca_reg_internal_maxmr(
u64 *iova_start;
u64 size_maxmr;
struct ehca_mr_pginfo pginfo;
-   struct ib_phys_buf ib_pbuf;
u32 num_kpages;
u32 num_hwpages;
u64 hw_pgsize;
@@ -1310,8 +1309,6 @@ int ehca_reg_internal_maxmr(
/* register internal max-MR on HCA */
size_maxmr = ehca_mr_len;
iova_start = (u64 *)ehca_map_vaddr((void *)(KERNELBASE + 
PHYSICAL_START));
-   ib_pbuf.addr = 0;
-   ib_pbuf.size = size_maxmr;
num_kpages = NUM_CHUNKS(((u64)iova_start % PAGE_SIZE) + size_maxmr,
PAGE_SIZE);
hw_pgsize = ehca_get_max_hwpage_size(shca);
@@ -1323,8 +1320,8 @@ int ehca_reg_internal_maxmr(
pginfo.num_kpages = num_kpages;
pginfo.num_hwpages = num_hwpages;
pginfo.hwpage_size = hw_pgsize;
-   pginfo.u.phy.num_phys_buf = 1;
-   pginfo.u.phy.phys_buf_array = &ib_pbuf;
+   pginfo.u.phy.addr = 0;
+   pginfo.u.phy.size = size_maxmr;
 
ret = ehca_reg_mr(shca, e_mr, iova_start, size_maxmr, 0, e_pd,
  &pginfo, &e_mr->ib.ib_mr.lkey,
@@ -1620,57 +1617,54 @@ static int ehca_set_pagebuf_phys(struct ehca_mr_pginfo 
*pginfo,
 u32 number, u64 *kpage)
 {
int ret = 0;
-   struct ib_phys_buf *pbuf;
+   u64 addr = pginfo->u.phy.addr;
+   u64 size = pginfo->u.phy.size;
u64 num_hw, offs_hw;
u32 i = 0;
 
-   /* loop over desired phys_buf_array entries */
-   while (i < number) {
-   pbuf   = pginfo->u.phy.phys_buf_array + pginfo->u.phy.next_buf;
-   num_hw  = NUM_CHUNKS((pbuf->addr % pginfo->hwpage_size) +
-pbuf->size, pginfo->hwpage_size);
-   offs_hw = (pbuf->addr & ~(pginfo->hwpage_size - 1)) /
-   pginfo->hwpage_size;
-   while (pginfo->next_hwpage < offs_hw + num_hw) {
-   /* sanity check */
-   if ((pginfo->kpage_cnt >= pginfo->num_kpages) ||
-   (pginfo->hwpage_cnt >= pginfo->num_hwpages)) {
-   ehca_gen_err("kpage_cnt >= num_kpages, "
-"kpage_cnt=%llx num_kpages=%llx "
-"hwpage_cnt=%llx "
-"num_hwpages=%llx i=%x",
-pginfo->kpage_cnt,
-pginfo->num_kpages,
-pginfo->hwpage_cnt,
-pginfo->num_hwpages, i);
-   return -EFAULT;
-   }
-   *kpage = (pbuf->addr & ~(pginfo->hwpage_size - 1)) +
-(pginfo->next_hwpage * pginfo->hwpage_size);
-   if ( !(*kpage) && pbuf->addr ) {
-   ehca_gen_err("pbuf->addr=%llx pbuf->size=%llx "
-"next_hwpage=%llx", pbuf->addr,
-pbuf->size, pginfo->next_hwpage);
-   return -EFAULT;
-   }
-   (pginfo->hwpage_cnt)++;
-   (pginfo-&g

[PATCH 10/10] IB: remove the unused usecnt field from struct ib_mr

2015-12-18 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/infiniband/core/uverbs_cmd.c| 6 --
 drivers/infiniband/core/verbs.c | 8 +---
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 3 ---
 drivers/infiniband/hw/cxgb4/mem.c   | 3 ---
 drivers/staging/rdma/ehca/ehca_mrmw.c   | 1 -
 include/rdma/ib_verbs.h | 1 -
 6 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index 48776bb..57be54e 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -992,7 +992,6 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
mr->pd  = pd;
mr->uobject = uobj;
atomic_inc(&pd->usecnt);
-   atomic_set(&mr->usecnt, 0);
 
uobj->object = mr;
ret = idr_add_uobj(&ib_uverbs_mr_idr, uobj);
@@ -1090,11 +1089,6 @@ ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
}
}
 
-   if (atomic_read(&mr->usecnt)) {
-   ret = -EBUSY;
-   goto put_uobj_pd;
-   }
-
old_pd = mr->pd;
ret = mr->device->rereg_user_mr(mr, cmd.flags, cmd.start,
cmd.length, cmd.hca_va,
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 2858b35..cce080f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1209,7 +1209,6 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int 
mr_access_flags)
mr->pd  = pd;
mr->uobject = NULL;
atomic_inc(&pd->usecnt);
-   atomic_set(&mr->usecnt, 0);
}
 
return mr;
@@ -1218,13 +1217,9 @@ EXPORT_SYMBOL(ib_get_dma_mr);
 
 int ib_dereg_mr(struct ib_mr *mr)
 {
-   struct ib_pd *pd;
+   struct ib_pd *pd = mr->pd;
int ret;
 
-   if (atomic_read(&mr->usecnt))
-   return -EBUSY;
-
-   pd = mr->pd;
ret = mr->device->dereg_mr(mr);
if (!ret)
atomic_dec(&pd->usecnt);
@@ -1260,7 +1255,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
mr->pd  = pd;
mr->uobject = NULL;
atomic_inc(&pd->usecnt);
-   atomic_set(&mr->usecnt, 0);
}
 
return mr;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c 
b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 097eb93..f806289 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -458,9 +458,6 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr)
u32 mmid;
 
PDBG("%s ib_mr %p\n", __func__, ib_mr);
-   /* There can be no memory windows */
-   if (atomic_read(&ib_mr->usecnt))
-   return -EINVAL;
 
mhp = to_iwch_mr(ib_mr);
kfree(mhp->pages);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c 
b/drivers/infiniband/hw/cxgb4/mem.c
index 1eb833a..7849890 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -704,9 +704,6 @@ int c4iw_dereg_mr(struct ib_mr *ib_mr)
u32 mmid;
 
PDBG("%s ib_mr %p\n", __func__, ib_mr);
-   /* There can be no memory windows */
-   if (atomic_read(&ib_mr->usecnt))
-   return -EINVAL;
 
mhp = to_c4iw_mr(ib_mr);
rhp = mhp->rhp;
diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c 
b/drivers/staging/rdma/ehca/ehca_mrmw.c
index 1814af7..06b832b 100644
--- a/drivers/staging/rdma/ehca/ehca_mrmw.c
+++ b/drivers/staging/rdma/ehca/ehca_mrmw.c
@@ -1339,7 +1339,6 @@ int ehca_reg_internal_maxmr(
e_mr->ib.ib_mr.pd = &e_pd->ib_pd;
e_mr->ib.ib_mr.uobject = NULL;
atomic_inc(&(e_pd->ib_pd.usecnt));
-   atomic_set(&(e_mr->ib.ib_mr.usecnt), 0);
*e_maxmr = e_mr;
return 0;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 284916d..e45776e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1306,7 +1306,6 @@ struct ib_mr {
u64iova;
u32length;
unsigned int   page_size;
-   atomic_t   usecnt; /* count number of MWs */
 };
 
 struct ib_mw {
-- 
1.9.1
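
Why a write-only field can be dropped together with the checks that read
it can be shown with a minimal sketch.  The demo_ names are hypothetical
and only model the situation described in this thread, where the counter
is set to zero at creation and never changed afterwards.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an MR that still carries the counter. */
struct demo_mr {
	int usecnt;
};

/* The only write in this model: the counter is zeroed at creation. */
void demo_mr_init(struct demo_mr *mr)
{
	assert(mr);
	mr->usecnt = 0;
}

/* Old shape: the busy check can never fire because usecnt stays zero. */
int demo_dereg_with_check(const struct demo_mr *mr)
{
	if (mr->usecnt)
		return -EBUSY;
	return 0;
}

/* New shape: same observable result without the field or the check. */
int demo_dereg_without_check(void)
{
	return 0;
}
```

Both deregistration helpers return the same value for every MR this model
can produce, which is the sense in which the field is unused.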



[PATCH 09/10] IB: remove the struct ib_phys_buf definition

2015-12-18 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 include/rdma/ib_verbs.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ea093ee..284916d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1143,11 +1143,6 @@ enum ib_access_flags {
IB_ACCESS_ON_DEMAND = (1<<6),
 };
 
-struct ib_phys_buf {
-   u64  addr;
-   u64  size;
-};
-
 /*
  * XXX: these are apparently used for ->rereg_user_mr, no idea why they
  * are hidden here instead of a uapi header!
-- 
1.9.1
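
With struct ib_phys_buf gone, the driver patches earlier in this series
pass an address/size pair directly.  The sketch below contrasts the two
calling conventions using the page-offset check from the nes patch.  All
demo_ and DEMO_ names are hypothetical, the 4096-byte page size is an
assumption, and the size argument is omitted from the new-style helper
because this particular check does not need it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE UINT64_C(4096)		/* assumed page size */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Hypothetical copy of the removed structure's layout. */
struct demo_phys_buf {
	uint64_t addr;
	uint64_t size;
};

/* Old convention: an array plus a count, even for a single buffer. */
int demo_check_old(const struct demo_phys_buf *buffer_list, int num_phys_buf,
		   uint64_t iova_start)
{
	assert(buffer_list && num_phys_buf >= 1);
	if ((buffer_list[0].addr ^ iova_start) & ~DEMO_PAGE_MASK)
		return -EINVAL;
	return 0;
}

/* New convention: the address is passed directly. */
int demo_check_new(uint64_t addr, uint64_t iova_start)
{
	/* The in-page offsets of addr and iova_start must match. */
	if ((addr ^ iova_start) & ~DEMO_PAGE_MASK)
		return -EINVAL;
	return 0;
}
```

The caller no longer has to build a one-element array on the stack just to
describe a single contiguous region.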



[PATCH] rds: use local_dma_lkey

2015-12-18 Thread Christoph Hellwig
We now always have a per-PD local_dma_lkey available.  Make use of that
fact in rds and stop registering our own MR.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/staging/rdma/ehca/ehca_mrmw.c |  1 -
 net/rds/iw.c  | 19 +--
 net/rds/iw.h  |  9 -
 net/rds/iw_cm.c   |  6 +-
 net/rds/iw_recv.c |  2 +-
 net/rds/iw_send.c |  7 ---
 6 files changed, 7 insertions(+), 37 deletions(-)

diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c 
b/drivers/staging/rdma/ehca/ehca_mrmw.c
index 1814af7..06b832b 100644
--- a/drivers/staging/rdma/ehca/ehca_mrmw.c
+++ b/drivers/staging/rdma/ehca/ehca_mrmw.c
@@ -1339,7 +1339,6 @@ int ehca_reg_internal_maxmr(
e_mr->ib.ib_mr.pd = &e_pd->ib_pd;
e_mr->ib.ib_mr.uobject = NULL;
atomic_inc(&(e_pd->ib_pd.usecnt));
-   atomic_set(&(e_mr->ib.ib_mr.usecnt), 0);
*e_maxmr = e_mr;
return 0;
 
diff --git a/net/rds/iw.c b/net/rds/iw.c
index c74bc8b74..27e9c5e 100644
--- a/net/rds/iw.c
+++ b/net/rds/iw.c
@@ -71,7 +71,6 @@ static void rds_iw_add_one(struct ib_device *device)
 
spin_lock_init(&rds_iwdev->spinlock);
 
-   rds_iwdev->dma_local_lkey = !!(device->device_cap_flags & 
IB_DEVICE_LOCAL_DMA_LKEY);
rds_iwdev->max_wrs = device->max_qp_wr;
rds_iwdev->max_sge = min(device->max_sge, RDS_IW_MAX_SGE);
 
@@ -80,20 +79,10 @@ static void rds_iw_add_one(struct ib_device *device)
if (IS_ERR(rds_iwdev->pd))
goto free_dev;
 
-   if (!rds_iwdev->dma_local_lkey) {
-   rds_iwdev->mr = ib_get_dma_mr(rds_iwdev->pd,
-   IB_ACCESS_REMOTE_READ |
-   IB_ACCESS_REMOTE_WRITE |
-   IB_ACCESS_LOCAL_WRITE);
-   if (IS_ERR(rds_iwdev->mr))
-   goto err_pd;
-   } else
-   rds_iwdev->mr = NULL;
-
rds_iwdev->mr_pool = rds_iw_create_mr_pool(rds_iwdev);
if (IS_ERR(rds_iwdev->mr_pool)) {
rds_iwdev->mr_pool = NULL;
-   goto err_mr;
+   goto err_pd;
}
 
INIT_LIST_HEAD(&rds_iwdev->cm_id_list);
@@ -103,9 +92,6 @@ static void rds_iw_add_one(struct ib_device *device)
ib_set_client_data(device, &rds_iw_client, rds_iwdev);
return;
 
-err_mr:
-   if (rds_iwdev->mr)
-   ib_dereg_mr(rds_iwdev->mr);
 err_pd:
ib_dealloc_pd(rds_iwdev->pd);
 free_dev:
@@ -132,9 +118,6 @@ static void rds_iw_remove_one(struct ib_device *device, 
void *client_data)
if (rds_iwdev->mr_pool)
rds_iw_destroy_mr_pool(rds_iwdev->mr_pool);
 
-   if (rds_iwdev->mr)
-   ib_dereg_mr(rds_iwdev->mr);
-
ib_dealloc_pd(rds_iwdev->pd);
 
list_del(&rds_iwdev->list);
diff --git a/net/rds/iw.h b/net/rds/iw.h
index 5af01d1..b0f6f46 100644
--- a/net/rds/iw.h
+++ b/net/rds/iw.h
@@ -111,7 +111,6 @@ struct rds_iw_connection {
/* alphabet soup, IBTA style */
struct rdma_cm_id   *i_cm_id;
struct ib_pd*i_pd;
-   struct ib_mr*i_mr;
struct ib_cq*i_send_cq;
struct ib_cq*i_recv_cq;
 
@@ -160,7 +159,6 @@ struct rds_iw_connection {
 
/* Protocol version specific information */
unsigned int i_flowctl:1;    /* enable/disable flow ctl */
-   unsigned int i_dma_local_lkey:1;
unsigned int i_fastreg_posted:1; /* fastreg posted on this 
connection */
/* Batched completions */
unsigned int i_unsignaled_wrs;
@@ -184,11 +182,9 @@ struct rds_iw_device {
struct list_head        conn_list;
struct ib_device        *dev;
struct ib_pd            *pd;
-   struct ib_mr            *mr;
struct rds_iw_mr_pool   *mr_pool;
int                     max_sge;
unsigned int            max_wrs;
-   unsigned int            dma_local_lkey:1;
spinlock_t              spinlock;   /* protect the above */
 };
 
@@ -265,11 +261,6 @@ static inline void rds_iw_dma_sync_sg_for_device(struct 
ib_device *dev,
 }
 #define ib_dma_sync_sg_for_device  rds_iw_dma_sync_sg_for_device
 
-static inline u32 rds_iw_local_dma_lkey(struct rds_iw_connection *ic)
-{
-   return ic->i_dma_local_lkey ? ic->i_cm_id->device->local_dma_lkey : 
ic->i_mr->lkey;
-}
-
 /* ib.c */
 extern struct rds_transport rds_iw_transport;
 extern struct ib_client rds_iw_client;
diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
index aea4c91..78d5f52 100644
--- a/net/rds/iw_cm.c
+++ b/net/rds/iw_cm.c
@@ -269,7 +269,6 @@ static int rds_iw_setup_qp(struct rds_connection *conn)
 
/* Protection domain and mem

[PATCH 02/10] IB: remove ib_query_mr

2015-12-18 Thread Christoph Hellwig
This functionality has no users and was only supported by the staged out
EHCA driver.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Reviewed-by: Jason Gunthorpe <jguntho...@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/core/verbs.c |  7 -
 drivers/staging/rdma/ehca/ehca_iverbs.h |  2 --
 drivers/staging/rdma/ehca/ehca_main.c   |  1 -
 drivers/staging/rdma/ehca/ehca_mrmw.c   | 49 -
 include/rdma/ib_verbs.h | 18 
 5 files changed, 77 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 87d3746..29a3e53 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1216,13 +1216,6 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int 
mr_access_flags)
 }
 EXPORT_SYMBOL(ib_get_dma_mr);
 
-int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr)
-{
-   return mr->device->query_mr ?
-   mr->device->query_mr(mr, mr_attr) : -ENOSYS;
-}
-EXPORT_SYMBOL(ib_query_mr);
-
 int ib_dereg_mr(struct ib_mr *mr)
 {
struct ib_pd *pd;
diff --git a/drivers/staging/rdma/ehca/ehca_iverbs.h 
b/drivers/staging/rdma/ehca/ehca_iverbs.h
index 75c9876..4a45ca3 100644
--- a/drivers/staging/rdma/ehca/ehca_iverbs.h
+++ b/drivers/staging/rdma/ehca/ehca_iverbs.h
@@ -94,8 +94,6 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
   struct ib_phys_buf *phys_buf_array,
   int num_phys_buf, int mr_access_flags, u64 *iova_start);
 
-int ehca_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
-
 int ehca_dereg_mr(struct ib_mr *mr);
 
 struct ib_mw *ehca_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
diff --git a/drivers/staging/rdma/ehca/ehca_main.c 
b/drivers/staging/rdma/ehca/ehca_main.c
index 285e560..0be7959 100644
--- a/drivers/staging/rdma/ehca/ehca_main.c
+++ b/drivers/staging/rdma/ehca/ehca_main.c
@@ -513,7 +513,6 @@ static int ehca_init_device(struct ehca_shca *shca)
shca->ib_device.get_dma_mr  = ehca_get_dma_mr;
shca->ib_device.reg_phys_mr = ehca_reg_phys_mr;
shca->ib_device.reg_user_mr = ehca_reg_user_mr;
-   shca->ib_device.query_mr= ehca_query_mr;
shca->ib_device.dereg_mr= ehca_dereg_mr;
shca->ib_device.rereg_phys_mr   = ehca_rereg_phys_mr;
shca->ib_device.alloc_mw= ehca_alloc_mw;
diff --git a/drivers/staging/rdma/ehca/ehca_mrmw.c 
b/drivers/staging/rdma/ehca/ehca_mrmw.c
index f914b30..eb274c1 100644
--- a/drivers/staging/rdma/ehca/ehca_mrmw.c
+++ b/drivers/staging/rdma/ehca/ehca_mrmw.c
@@ -589,55 +589,6 @@ rereg_phys_mr_exit0:
return ret;
 } /* end ehca_rereg_phys_mr() */
 
-/*--*/
-
-int ehca_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr)
-{
-   int ret = 0;
-   u64 h_ret;
-   struct ehca_shca *shca =
-   container_of(mr->device, struct ehca_shca, ib_device);
-   struct ehca_mr *e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
-   unsigned long sl_flags;
-   struct ehca_mr_hipzout_parms hipzout;
-
-   if ((e_mr->flags & EHCA_MR_FLAG_FMR)) {
-   ehca_err(mr->device, "not supported for FMR, mr=%p e_mr=%p "
-"e_mr->flags=%x", mr, e_mr, e_mr->flags);
-   ret = -EINVAL;
-   goto query_mr_exit0;
-   }
-
-   memset(mr_attr, 0, sizeof(struct ib_mr_attr));
-   spin_lock_irqsave(&e_mr->mrlock, sl_flags);
-
-   h_ret = hipz_h_query_mr(shca->ipz_hca_handle, e_mr, &hipzout);
-   if (h_ret != H_SUCCESS) {
-   ehca_err(mr->device, "hipz_mr_query failed, h_ret=%lli mr=%p "
-"hca_hndl=%llx mr_hndl=%llx lkey=%x",
-h_ret, mr, shca->ipz_hca_handle.handle,
-e_mr->ipz_mr_handle.handle, mr->lkey);
-   ret = ehca2ib_return_code(h_ret);
-   goto query_mr_exit1;
-   }
-   mr_attr->pd = mr->pd;
-   mr_attr->device_virt_addr = hipzout.vaddr;
-   mr_attr->size = hipzout.len;
-   mr_attr->lkey = hipzout.lkey;
-   mr_attr->rkey = hipzout.rkey;
-   ehca_mrmw_reverse_map_acl(&hipzout.acl, &mr_attr->mr_access_flags);
-
-query_mr_exit1:
-   spin_unlock_irqrestore(&e_mr->mrlock, sl_flags);
-query_mr_exit0:
-   if (ret)
-   ehca_err(mr->device, "ret=%i mr=%p mr_attr=%p",
-ret, mr, mr_attr);
-   return ret;
-} /* end ehca_query_mr() */
-
-/*--*/
-
 int ehca_dereg_mr(struct ib_mr *mr)
 {
int ret = 0;
diff --git a/include/rdma/ib_verbs.h 

MR cleanups

2015-12-18 Thread Christoph Hellwig
Hi Doug,

this series (on top of your k.o/for-4.5 branch) has various MR-related
cleanups:  starting to document the device capabilities, removing lots
of dead MR/MW code and removing a useless field in struct ib_mr.  This
should be fairly uncontroversial I hope, so I'd like to get it in before
the real MR changes.



Re: [PATCH 37/37] IB/rdmavt: Add support for new memory registration API

2015-12-17 Thread Christoph Hellwig
On Thu, Dec 17, 2015 at 10:52:29AM -0500, Dennis Dalessandro wrote:
> I am not opposed to leaving the code in rdmavt. It gets removed from qib in
> the other patch series I posted. My preference is to leave it in rdmavt
> since it will be needed down the road. However I can go either way here, its
> easy to add back later.

Without setting IB_DEVICE_MEM_MGT_EXTENSIONS and implementing all the
features required for it, it's dead code.  There is no point in keeping
it around.


Re: [PATCH 07/15] i40iw: add hw and utils files

2015-12-17 Thread Christoph Hellwig
> +#ifndef UNREFERENCED_PARAMETER
> +#define UNREFERENCED_PARAMETER(_p)   \
> +{\
> + (_p) = (_p);\
> +}
> +#endif

No need for this, just remove it.

> +#define I40E_MASK(mask, shift) (mask << shift)

Please just opencode the shift, this macro is silly.

> +#define i40iw_flush(a)  readl((a)->hw_addr + I40E_GLGEN_STAT)
> +
> +#define wr32(a, reg, value) writel((value), (a)->hw_addr + (reg))
> +#define rd32(a, reg)readl((a)->hw_addr + (reg))

Please turn these into inlines.
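
For illustration only, a rough userspace sketch of what such inlines might
look like.  The struct demo_hw type and the demo_readl()/demo_writel()
helpers are made up for this example and merely stand in for the driver's
hardware struct and the kernel's readl()/writel(); this is not the i40iw code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's hardware struct. */
struct demo_hw {
	volatile uint8_t *hw_addr;	/* base of the register window */
};

/* Userspace stand-ins for the kernel's writel()/readl(). */
static inline void demo_writel(uint32_t value, volatile void *addr)
{
	*(volatile uint32_t *)addr = value;
}

static inline uint32_t demo_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/*
 * Inline replacements for the wr32()/rd32() macros: the arguments are
 * type-checked and each one is evaluated exactly once.
 */
static inline void wr32(struct demo_hw *hw, uint32_t reg, uint32_t value)
{
	assert(hw && hw->hw_addr);
	demo_writel(value, hw->hw_addr + reg);
}

static inline uint32_t rd32(struct demo_hw *hw, uint32_t reg)
{
	assert(hw && hw->hw_addr);
	return demo_readl(hw->hw_addr + reg);
}
```

Compared to the macro versions, passing the wrong pointer type here gives a
compiler diagnostic instead of silently compiling.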

> +
> +#ifndef readq
> +static inline u64 rd64(u8 * __iomem addr)
> +{
> + return ((u64)readl(addr)) | (((u64)readl(addr + 4UL)) << 32);
> +}
> +#else
> +#define rd64(a)readq((a))
> +#endif

Please use the magic in  instead.

> +
> +#define db_wr32(a, value)   writel((value), (a))

Please remove this pointless wrapper.

> +void SLEEP(u8 ms);

Please give this function a sensible name.



Re: [PATCH 08/15] i40iw: add files for iwarp interface

2015-12-17 Thread Christoph Hellwig
> + i40iw_next_iw_state(iwqp, I40IW_QP_STATE_ERROR, 0, 0, 0);
> +
> + if (!iwqp->user_mode) {
> + if (iwqp->iwscq)
> + i40iw_clean_cqes(iwqp, iwqp->iwscq);
> + if ((iwqp->iwrcq) && (iwqp->iwrcq != iwqp->iwscq))

Please try to do a pass over your code and remove all these pointless
braces.

> +static int i40iw_setup_virt_qp(struct i40iw_device *iwdev,
> +struct i40iw_qp *iwqp,
> +struct i40iw_qp_init_info *init_info)
> +{
> + struct i40iw_pbl *iwpbl = iwqp->iwpbl;
> + struct i40iw_qp_mr *qpmr = &iwpbl->qp_mr;
> + u64 *sq_base;
> +
> + sq_base = kmap(qpmr->sq_page);
> + iwqp->sq_kmapped = 1;


You must never use kmap for any long lived resource.  Just allocate
it out of lowmem so that you don't need the kmap.

> + ukinfo->rq = (u64 *)((u8 *)mem->va + (sqdepth * I40IW_QP_WQE_MIN_SIZE));
> + info->rq_pa = (uintptr_t)((u8 *)mem->pa + (sqdepth * 
> I40IW_QP_WQE_MIN_SIZE));
> +
> + ukinfo->shadow_area = (u64 *)((u8 *)ukinfo->rq +
> +   (rqdepth * I40IW_QP_WQE_MIN_SIZE));
> + info->shadow_area_pa = info->rq_pa + (rqdepth * I40IW_QP_WQE_MIN_SIZE);

Can you please try to get away with fewer casts here?  Note that Linux does
use the GCC extension for void pointer arithmetic.  Even without that you
never need to use casts to or from void pointers.  All this happens in
lots of places in the code, so a little audit would be useful.
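
As a hedged sketch of the cast-free style (the demo_* types, the WQE size
and the field layout are assumptions made up for this example, not the
i40iw definitions), the same offsets can be computed like this when the
GNU C void pointer arithmetic extension is available:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_WQE_MIN_SIZE 32	/* hypothetical WQE size in bytes */

/* Hypothetical stand-in for the driver's DMA memory descriptor. */
struct demo_dma_mem {
	void *va;		/* virtual address of the buffer */
	uint64_t pa;		/* bus address, kept as an integer */
};

struct demo_queues {
	uint64_t *rq;
	uint64_t *shadow_area;
	uint64_t rq_pa;
	uint64_t shadow_area_pa;
};

/*
 * Relies on the GNU C extension that treats sizeof(void) as 1 for
 * pointer arithmetic.  void * converts implicitly to uint64_t *, and
 * the bus addresses are plain integers, so no casts are needed.
 */
static void demo_layout(struct demo_queues *q, const struct demo_dma_mem *mem,
			size_t sqdepth, size_t rqdepth)
{
	size_t sq_bytes = sqdepth * DEMO_WQE_MIN_SIZE;
	size_t rq_bytes = rqdepth * DEMO_WQE_MIN_SIZE;

	assert(q && mem && mem->va);
	q->rq = mem->va + sq_bytes;
	q->rq_pa = mem->pa + sq_bytes;
	q->shadow_area = mem->va + sq_bytes + rq_bytes;
	q->shadow_area_pa = q->rq_pa + rq_bytes;
}
```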

> +/**
> + * i40iw_alloc_mw - Allocate memory window
> + * @ibpd: protection domain
> + * @type: memory window type
> + */
> +static struct ib_mw *i40iw_alloc_mw(struct ib_pd *ibpd,
> + enum ib_mw_type type)
> +{
> + return ERR_PTR(-ENOSYS);
> +}
> +
> +/**
> + * i40iw_dealloc_mw - Free a memory window
> + * @ibmw: memory window to free
> + */
> +static int i40iw_dealloc_mw(struct ib_mw *ibmw)
> +{
> + return -EIO;
> +}
> +
> +/**
> + * i40iw_bind_mw - Bind a memory window to a qp
> + * @ibqp: queue pair
> + * @ibmw: memory window
> + * @ibmw_bind: pointer to bind structure
> + */
> +static int i40iw_bind_mw(struct ib_qp *ibqp,
> +  struct ib_mw *ibmw,
> +  struct ib_mw_bind *ibmw_bind)
> +{
> + return -ENOSYS;
> +}

There shouldn't be any need to stub all these out.

> +/**
> + * i40iw_init_ofa_device - initialization of iwarp device
> + * @iwdev: iwarp device
> + */
> +static struct i40iw_ib_device *i40iw_init_ofa_device(struct i40iw_device 
> *iwdev)

Where is that weird ofa prefix coming from?

> + iwibdev->ibdev.reg_phys_mr = i40iw_reg_phys_mr;

Please don't add phys MR support in new drivers, it's about to
disappear.

> + iwibdev->ibdev.detach_mcast = NULL;
> + iwibdev->ibdev.attach_mcast = NULL;
> + iwibdev->ibdev.get_protocol_stats = i40iw_get_protocol_stats;
> + iwibdev->ibdev.process_mad = NULL;

All the unused fields should already be zeroed.


Re: [PATCH rdma-next 0/6] dev attr cleanup (less is more)

2015-12-17 Thread Christoph Hellwig
On Thu, Dec 17, 2015 at 03:18:54PM +0200, Or Gerlitz wrote:
> I used hunks from Christoph's work and mentioned that in the 
> change-logs. This can turn to be his signature, if he wants to.

I heartily disagree with this approach, and I'd prefer if you don't
blame any of this horrible scheme on me.  Please add my:


Nacked-by: Christoph Hellwig <h...@lst.de>

instead.


Re: [PATCH 15/15] i40iw: changes for build of i40iw module

2015-12-16 Thread Christoph Hellwig
> --- a/include/uapi/rdma/rdma_netlink.h
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -5,6 +5,7 @@
>  
>  enum {
>   RDMA_NL_RDMA_CM = 1,
> + RDMA_NL_I40IW,
>   RDMA_NL_NES,
>   RDMA_NL_C4IW,
>   RDMA_NL_LS, /* RDMA Local Services */

This changes the values for the existing RDMA_NL_NES, RDMA_NL_C4IW and
RDMA_NL_LS symbols.  Please add your new value at the end.  And it
should probably be a separate patch as it's not related to the build
system and referenced by the earlier patches.
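
To make the renumbering concrete, here is a small standalone sketch.  The
enumerator names are shortened copies made up for this example, and only
the entries visible in the quoted hunk are modelled; the real enum may
contain more entries.

```c
#include <assert.h>

/* Values as they were before the patch (from the quoted hunk). */
enum demo_nl_before {
	BEFORE_RDMA_CM = 1,
	BEFORE_NES,
	BEFORE_C4IW,
	BEFORE_LS,
};

/* Inserting in the middle renumbers every entry after the new one. */
enum demo_nl_inserted {
	INS_RDMA_CM = 1,
	INS_I40IW,
	INS_NES,
	INS_C4IW,
	INS_LS,
};

/* Appending at the end keeps all existing values stable. */
enum demo_nl_appended {
	APP_RDMA_CM = 1,
	APP_NES,
	APP_C4IW,
	APP_LS,
	APP_I40IW,
};

/* Returns 1 if the three pre-existing values match the old ones. */
static int demo_same_values(int nes, int c4iw, int ls)
{
	assert(nes > 0 && c4iw > 0 && ls > 0);
	return nes == (int)BEFORE_NES && c4iw == (int)BEFORE_C4IW &&
	       ls == (int)BEFORE_LS;
}
```

Since this is a uapi header, userspace built against the old values would
stop matching the kernel if the middle-insertion variant were merged.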


[PATCH] svc_rdma: use local_dma_lkey

2015-12-16 Thread Christoph Hellwig
We now always have a per-PD local_dma_lkey available.  Make use of that
fact in svc_rdma and stop registering our own MR.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 include/linux/sunrpc/svc_rdma.h|  2 --
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c|  4 ++--
 net/sunrpc/xprtrdma/svc_rdma_sendto.c  |  6 ++---
 net/sunrpc/xprtrdma/svc_rdma_transport.c   | 36 --
 5 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index b13513a..5322fea 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -156,13 +156,11 @@ struct svcxprt_rdma {
struct ib_qp *sc_qp;
struct ib_cq *sc_rq_cq;
struct ib_cq *sc_sq_cq;
-   struct ib_mr *sc_phys_mr;   /* MR for server memory */
int  (*sc_reader)(struct svcxprt_rdma *,
  struct svc_rqst *,
  struct svc_rdma_op_ctxt *,
  int *, u32 *, u32, u32, u64, bool);
u32  sc_dev_caps;   /* distilled device caps */
-   u32  sc_dma_lkey;   /* local dma key */
unsigned int sc_frmr_pg_list_len;
struct list_head sc_frmr_q;
spinlock_t   sc_frmr_q_lock;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c 
b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 417cec1..c428734 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -128,7 +128,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 
ctxt->wr_op = IB_WR_SEND;
ctxt->direction = DMA_TO_DEVICE;
-   ctxt->sge[0].lkey = rdma->sc_dma_lkey;
+   ctxt->sge[0].lkey = rdma->sc_pd->local_dma_lkey;
ctxt->sge[0].length = sndbuf->len;
ctxt->sge[0].addr =
ib_dma_map_page(rdma->sc_cm_id->device, ctxt->pages[0], 0,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 3dfe464..c8b8a8b 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -144,6 +144,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
 
head->arg.pages[pg_no] = rqstp->rq_arg.pages[pg_no];
head->arg.page_len += len;
+
head->arg.len += len;
if (!pg_off)
head->count++;
@@ -160,8 +161,7 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
goto err;
atomic_inc(&xprt->sc_dma_used);
 
-   /* The lkey here is either a local dma lkey or a dma_mr lkey */
-   ctxt->sge[pno].lkey = xprt->sc_dma_lkey;
+   ctxt->sge[pno].lkey = xprt->sc_pd->local_dma_lkey;
ctxt->sge[pno].length = len;
ctxt->count++;
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c 
b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index ced3151..20bd5d4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -265,7 +265,7 @@ static int send_write(struct svcxprt_rdma *xprt, struct 
svc_rqst *rqstp,
 sge[sge_no].addr))
goto err;
atomic_inc(&xprt->sc_dma_used);
-   sge[sge_no].lkey = xprt->sc_dma_lkey;
+   sge[sge_no].lkey = xprt->sc_pd->local_dma_lkey;
ctxt->count++;
sge_off = 0;
sge_no++;
@@ -487,7 +487,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
ctxt->count = 1;
 
/* Prepare the SGE for the RPCRDMA Header */
-   ctxt->sge[0].lkey = rdma->sc_dma_lkey;
+   ctxt->sge[0].lkey = rdma->sc_pd->local_dma_lkey;
ctxt->sge[0].length = svc_rdma_xdr_get_reply_hdr_len(rdma_resp);
ctxt->sge[0].addr =
ib_dma_map_page(rdma->sc_cm_id->device, page, 0,
@@ -511,7 +511,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
 ctxt->sge[sge_no].addr))
goto err;
atomic_inc(&rdma->sc_dma_used);
-   ctxt->sge[sge_no].lkey = rdma->sc_dma_lkey;
+   ctxt->sge[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
ctxt->sge[sge_no].length = sge_bytes;
}
if (byte_count != 0) {
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c 
b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index abfbd02..faf4c49 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -232,11 +232,11 @@ void svc_rdma_unmap_dma(struct svc_r

small svc_rdma cleanup

2015-12-16 Thread Christoph Hellwig
This makes use of the now always available local_dma_lkey, and goes on top
of Chuck's "[PATCH v4 00/11] NFS/RDMA server patches for v4.5" series.



Re: [PATCH v3 04/11] xprtrdma: Move struct ib_send_wr off the stack

2015-12-16 Thread Christoph Hellwig
On Wed, Dec 16, 2015 at 10:06:33AM -0500, Chuck Lever wrote:
> > Would it make sense to unionize these as they are guaranteed not to
> > execute together? Some people don't like this sort of savings.
> 
> I dislike unions because they make the code that uses
> them less readable. I can define macros to help that,
> but sigh! OK.

Shouldn't be an issue with transparent unions these days:

union {
    struct ib_reg_wr    fr_regwr;
    struct ib_send_wr   fr_invwr;
};


Re: [PATCH v3 04/11] xprtrdma: Move struct ib_send_wr off the stack

2015-12-16 Thread Christoph Hellwig
On Wed, Dec 16, 2015 at 10:13:31AM -0500, Chuck Lever wrote:
> > Shouldn't be an issue with transparent unions these days:
> > 
> > union {
> >     struct ib_reg_wr    fr_regwr;
> >     struct ib_send_wr   fr_invwr;
> > };
> 
> Right, but isn't that a gcc-ism that Al hates? If
> everyone is OK with that construction, I will use it.

It started out as a GNUism, but is now supported in C11.  We use it
a lot all over the kernel.
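
A minimal standalone sketch of the construct, which C11 calls an anonymous
union.  The demo_* structs are simplified stand-ins invented for this
example; the real ib_send_wr and ib_reg_wr are considerably larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the verbs work requests. */
struct demo_send_wr {
	int opcode;
	unsigned int send_flags;
};

struct demo_reg_wr {
	struct demo_send_wr wr;
	unsigned int key;
	int access;
};

struct demo_frmr {
	int state;
	/* Anonymous union: members are accessed without a union name. */
	union {
		struct demo_reg_wr  fr_regwr;
		struct demo_send_wr fr_invwr;
	};
};

/* Users write f->fr_invwr directly, no macros or extra path needed. */
static void demo_prepare_inv(struct demo_frmr *f, int opcode)
{
	assert(f != NULL);
	f->fr_invwr.opcode = opcode;
	f->fr_invwr.send_flags = 0;
}
```

Both members share the same storage, so the struct only grows by the size
of the larger of the two work requests.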


Re: [PATCH 10/13] IB/srp: use the new CQ API

2015-12-12 Thread Christoph Hellwig
On Fri, Dec 11, 2015 at 12:59:01PM -0500, Doug Ledford wrote:
> On 12/11/2015 09:22 AM, Christoph Hellwig wrote:
> > Hi Bart,
> > 
> > thanks for all the reviews.  I've updated the git branch with your
> > suggestions and reviewed-by tags.  I'm going to wait a little bit
> > longer for other reviews to come in before reposting the series.
> 
> Indeed, thanks for all the catches Bart.  This patchset, with Bart's
> fixups, looks good to me.

All right.  How do you want to proceed?  The current rdma-cq branch
has all kinds of dependencies, but I've also prepared a new rdma-cq.2
branch that could go straight on top of your current queue:

http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq.2

If you're ready to start the 4.5 tree I can send those out as a patch
series.

> 
> 
> -- 
> Doug Ledford <dledf...@redhat.com>
>   GPG KeyID: 0E572FDD
> 
> 


---end quoted text---


Re: [PATCH 10/13] IB/srp: use the new CQ API

2015-12-11 Thread Christoph Hellwig
Hi Bart,

thanks for all the reviews.  I've updated the git branch with your
suggestions and reviewed-by tags.  I'm going to wait a little bit
longer for other reviews to come in before reposting the series.


Re: [PATCH 07/13] IB: add a proper completion queue abstraction

2015-12-11 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 10:42:22AM -0800, Bart Van Assche wrote:
>> +struct ib_cq *ib_alloc_cq(struct ib_device *dev, void *private,
>> +int nr_cqe, int comp_vector, enum ib_poll_context poll_ctx)
>> +{
> > [ ... ]
>> +cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
>
> Why is the wc array allocated separately instead of being embedded in 
> struct ib_cq ? I think the faster completion queues can be created the 
> better so if it is possible to eliminate the above kmalloc() call I would 
> prefer that.

I originally allocated an embedded array, but Sagi pointed out that
we'd waste memory for CQs not using the new API, so I changed it.
The embedded one would be quite a bit simpler indeed.
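
The trade-off can be sketched in plain userspace C.  The demo_* types, the
batch size of 16 and the uses_new_api flag are assumptions made up for
this example and are not the actual struct ib_cq layout.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_POLL_BATCH 16	/* hypothetical batch size */

struct demo_wc {
	unsigned long long wr_id;
	int status;
};

/* Variant 1: embedded array, simple but every CQ pays for it. */
struct demo_cq_embedded {
	int cqe;
	struct demo_wc wc[DEMO_POLL_BATCH];
};

/* Variant 2: pointer that is only allocated for users of the new API. */
struct demo_cq_lazy {
	int cqe;
	struct demo_wc *wc;
};

static int demo_cq_lazy_init(struct demo_cq_lazy *cq, int uses_new_api)
{
	assert(cq != NULL);
	cq->cqe = 0;
	cq->wc = NULL;
	if (!uses_new_api)
		return 0;	/* old-style users never need the array */
	cq->wc = calloc(DEMO_POLL_BATCH, sizeof(*cq->wc));
	return cq->wc ? 0 : -1;
}

static void demo_cq_lazy_destroy(struct demo_cq_lazy *cq)
{
	assert(cq != NULL);
	free(cq->wc);		/* free(NULL) is a no-op */
	cq->wc = NULL;
}
```

The lazy variant costs one extra allocation and one extra error path per
CQ that does use the new API, which is the simplicity argument made above.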

>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -457,10 +457,11 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct 
>> srp_target_port *target)
>>   static void srp_destroy_qp(struct srp_rdma_ch *ch)
>>   {
>>  static struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
>> -static struct ib_recv_wr wr = { .wr_id = SRP_LAST_WR_ID };
>> +static struct ib_recv_wr wr = { 0 };
>>  struct ib_recv_wr *bad_wr;
>>  int ret;
>
> Is explicit initialization to "{ 0 }" really needed for static structures ?

It shouldn't be needed, but I can't see how it harms either.
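
A small standalone sketch of why it isn't needed: objects with static
storage duration are zero-initialized by the language itself.  The
demo_recv_wr struct is a made-up stand-in, not struct ib_recv_wr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a receive work request. */
struct demo_recv_wr {
	struct demo_recv_wr *next;
	uint64_t wr_id;
	int num_sge;
};

static struct demo_recv_wr implicit_wr;		/* no initializer */
static struct demo_recv_wr explicit_wr = { 0 };	/* explicit zeroing */

/* Returns 1 if every member of the work request is zero/NULL. */
static int demo_wr_is_zero(const struct demo_recv_wr *wr)
{
	assert(wr != NULL);
	return wr->next == NULL && wr->wr_id == 0 && wr->num_sge == 0;
}
```

Both objects start out identical, so the explicit initializer is purely a
matter of style.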


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-11 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 11:02:02PM -0700, Jason Gunthorpe wrote:
> > FYI, I have a patch series in linux-next that switches all remaining
> > architectures to use get_dma_ops, and there are plans to allow generic
> > per-device dma_ops based on that.
> 
> Great, so once that is merged we can drop the ib_* versions of all
> this and just have qib/etc customize get_dma_ops? Other than the
> dma_addr_t size issue that sounds great..

I'm not sure the per-device ops are a done deal, as there has been
vocal opposition to it every time it came up.  But at least we have
the infrastructure for it now.

Other than that I think we're getting ready to actually remove
dma mapping from the ULPs.  Sagi's MR API that takes a scatterlist
is a first step, as it would allow for trivially moving the
dma_map_sg into the core helpers.  For the client side we now just
need to switch FMRs to use that API as well (given that it seems
like we can't get rid of them) and provide an API to "map" a
scatterlist using the DMA MR for those drivers playing fast and
loose.  On the server side I have a first draft of a R/W API that
does RDMA READ/WRITE requests and handles the required registration
and invalidation internally.  It again takes a scatterlist and handles
dma mapping internally.  Now all the dma mapping will be in the core,
which means they are only one step away from the driver.  Now if the
per-device dma_ops don't work out we can simply have a flag in
struct ib_device that it doesn't need dma mapping and can avoid
the indirection through another set of ops at least.


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 12:46:54PM -0500, Doug Ledford wrote:
> Organization.  Let's be fair, the totally flat namespace you are
> preferring is the equivalent of a teenager that is completely incapable
> of picking their dirty laundry up off the floor.  It is sloppy,
> disorganized, often full of old cruft that you don't know if you can get
> rid of or not, often so disorganized you might have three similarly
> named items that you can't figure out which one should be used in which
> circumstances, etc.

The most cruft I've found in major subsystems in years has been
in the RDMA code, so I'm not sure where that argument comes from.
We're pretty good at garbage collecting cruft in Linux, and the
typical counter examples are arbitrarily split structures where it's
easy to hide things.

> >  And looking at the existing members of
> > struct ib_device what determines if it goes straight into the device
> > or the attribute?
> 
> Organization.  What goes where depends on what makes sense according to
> the organization you are doing.

So what makes num_comp_vectors or phys_port_cnt fit into ib_device,
while max_qp or max_cq are in struct ib_device_attr?

I really like clean data structures, but keeping structures that
have 1:1 relationships and sit in the same module separate never
has been a good idea.

> >  There is a reason why we don't do this weird
> > attr split in other Linux subsystems, and making IB follow this pattern
> > makes everyone feel right at home instead of wondering about the
> > weird attribute.
> 
> Being organized is not "weird".  Let's not wax poetic about sloppy,
> disorganized structures.  Let's be honest about what they are so we
> don't feel like we need to take a shower every time we talk about them
> to purge us of the sins of our lies.

I call that utter BS.  Being organized means exactly not having multiple
structures that have the same scope or life time; that is actually what
I call disorganized.  There is a lot to be said for grouping the
fields in the structure, and that's how sensible subsystems handle it:

struct foo_bar {
/* read/write in the hot path, keep together and tightly packed: */
...

/* read-only in the hot path */
...

/* random members: */
...

/* properties here, immutable after setup: */
...
};

but that's completely inverse to what we're having with ib_device
currently.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 11:07:03AM -0700, Jason Gunthorpe wrote:
> The ARM folks do this sort of stuff on a regular basis.. Very early on
> Doug prepares a topic branch with only the big change, NFS folks pull
> it and then pull your work. Then Doug would send the topic branch to
> Linus as soon as the merge window opens, then NFS would send theirs.
> 
> This is alot less work overall than trying to sequence multiple
> patches over multiple releases..

Agreed.  Staging has always been a giant pain and things tend to never
finish moving over that way if they are non-trivial enough.


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 10:44:13AM -0700, Jason Gunthorpe wrote:
> > the signature of the function in struct ib_dma_mapping_ops.
> 
> It is supposed to be a dma_addr_t 'cookie' not a u64.
> 
> A patch to cleanup the core in this area would be appreciated.

I walked through the ib_dma_* mess in detail, and sadly speaking it
has to be a u64.  This is due to the drivers being consolidated into
rdmavt in fact.

Those drivers use the addr field in struct ib_sge to point to a kernel
virtual address, not to a DMA address.  In Linux u64 is the safe
superset for a dma_addr_t and a pointer so we'll need to go with that.
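
As a hedged userspace illustration of that "safe superset" point (struct
demo_sge and its helpers are invented for this example and only loosely
mirror struct ib_sge), a 64-bit addr field can carry a pointer and hand it
back unchanged:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical SGE whose addr field may hold either a DMA address or,
 * for software drivers, a virtual address.
 */
struct demo_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Store a virtual address; going through uintptr_t keeps it portable. */
static inline void demo_sge_set_vaddr(struct demo_sge *sge, void *vaddr)
{
	assert(sge);
	sge->addr = (uint64_t)(uintptr_t)vaddr;
}

/* Recover the virtual address stored by demo_sge_set_vaddr(). */
static inline void *demo_sge_get_vaddr(const struct demo_sge *sge)
{
	assert(sge);
	return (void *)(uintptr_t)sge->addr;
}
```

A field typed as a narrower DMA address type could truncate the pointer on
a platform where pointers are wider, which is what the u64 avoids.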

Now these drivers will end up dma mapping these virtual addresses later,
so we might want to figure out a) why the qib & co drivers even need the
virtual address, and b) see if we maybe should always do the dma_map
in the callers anyway, and just have an additional virtual address field
for those drivers if absolutely needed.


Re: [PATCH 02/37] IB/rdmavt: Consolidate dma ops in rdmavt.

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 05:29:50PM -0700, Jason Gunthorpe wrote:
> Hrm.. sizeof(void *) > sizeof(dma_addr_t) seemed pretty obscure to me,
> here is the original discussion:
> 
> https://lkml.org/lkml/2006/12/13/245
> 
> Sounds like someone was worried about sparc64. I doubt it is an actual
> issue today, but granted the u64 did make some sense.

sparc64 still uses a 32-bit dma_addr_t.

> > Now these drivers will end up dma mapping these virtual addresses later,
> > so we might want to figure out a) why the qib & co drivers even need the
> > virtual address, and b) see if we maybe should always do the dma_map
> > in the callers anyway, and just have an additional virtual address field
> > for those drivers if absolutely needed.
> 
> So, I *believe* the issue is that linux has (had?) no approved way to
> convert from a device specific dma_addr_t to a virtual address.

Linux doesn't have an approved way because it's impossible for the
generic case.  When you have an iommu you have potentially multiple
page tables mapping physical addresses to device virtual addresses,
and there is no easy way to do a reverse mapping.

> It is really too bad we can't just use get_dma_ops to handle this case
> and instead require our own infrastructure.

FYI, I have a patch series in linux-next that switches all remaining
architectures to use get_dma_ops, and there are plans to allow generic
per-device dma_ops based on that.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 02/10] IB/iser: Reuse ib_sg_to_pages

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 10:28:59AM +0200, Sagi Grimberg wrote:
> We can do that. We'd need to add ib_mr a list member just for fmr
> routines.

We'll also need that for a FR pool abstraction that would be very
helpful.


Re: [PATCH 26/37] IB/rdmavt: Move memory registration into rdmavt

2015-12-10 Thread Christoph Hellwig
On Thu, Dec 10, 2015 at 10:14:56AM -0500, Dennis Dalessandro wrote:
> Why? Because, it exists in qib and hfi1.
> 
> However, it seems no one is actually using this in the kernel these days and
> the core support was removed in commit 1241d7bf. Yet the function pointer
> still exists in struct ib_device. Are there plans to remove this?

Yes, I've sent a patch that removes it, and it has only gotten positive
reviews so far.


Re: [PATCH v2 02/10] IB/iser: Reuse ib_sg_to_pages

2015-12-09 Thread Christoph Hellwig
On Wed, Dec 09, 2015 at 02:12:00PM +0200, Sagi Grimberg wrote:
> We have in iser iser_sg_to_page_vec which has exactly
> the same role as ib_sg_to_pages. Customize the page_vec
> to hold a fake MR so we can reuse ib_sg_to_pages.

Looks good.  In the long run we should simply kill struct ib_fmr
and make FMRs operate on struct ib_mr so that they can use
ib_sg_to_pages directly.

Signed-off-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH v2 03/10] IB/iser: Don't register memory for all immediate data writes

2015-12-09 Thread Christoph Hellwig
The iser_reg_rdma_mem calling conventions seem rather confusing,
and this patch doesn't help that.  But I think that's something
to be addressed in bigger cleanup later on.

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH v2 04/10] IB/iser: set intuitive values for mr_valid

2015-12-09 Thread Christoph Hellwig
On Wed, Dec 09, 2015 at 02:12:02PM +0200, Sagi Grimberg wrote:
> From: Jenny Derzhavetz <jen...@mellanox.com>
> 
> This parameter is described as "is mr valid indicator".
> In other words, it indicates whether memory registration
> is valid or not. So intuitive values would be:
> mr_valid=True, when memory registration is valid and
> mr_valid=False otherwise.

Might be worth renaming it to 'bool need_invalidate', but otherwise
looks fine:

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH v2 05/10] iser: Have initiator and target to share protocol structures and definitions

2015-12-09 Thread Christoph Hellwig
> -/* Constant PDU lengths calculations */
> -#define ISER_HEADERS_LEN  (sizeof(struct iser_hdr) + sizeof(struct iscsi_hdr))
> +/*Constant PDU lengths calculations */

Odd whitespace error.

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-09 Thread Christoph Hellwig
On Wed, Dec 09, 2015 at 01:13:29AM +0200, Or Gerlitz wrote:
> 
> Christoph patch is here indeed, it does two things
> 
> 1. remove all the ULP device attr alloc, device query, attr free hassle
> 2. adds tons of new fields to struct ib_device
> 
> I think it just goes too much and needlessly adds tons of these new
> fields directly to struct ib_device where we can have them all well
> scoped into ib_device_attr member or pointer from struct ib_device

What's the benefit of that?  And looking at the existing members of
struct ib_device what determines if it goes straight into the device
or the attribute?  There is a reason why we don't do this weird
attr split in other Linux subsystems, and making IB follow this pattern
makes everyone feel right at home instead of wondering about the
weird attribute.


Re: [RFC contig pages support 1/2] IB: Supports contiguous memory operations

2015-12-09 Thread Christoph Hellwig
On Wed, Dec 09, 2015 at 10:00:02AM +, Shachar Raindel wrote:
> As far as gain is concerned, we are seeing gains in two cases here:
> 1. If the system has lots of non-fragmented, free memory, you can create 
> large contig blocks that are above the CPU huge page size.
> 2. If the system memory is very fragmented, you cannot allocate huge pages. 
> However, an API that allows you to create small (i.e. 64KB, 128KB, etc.) 
> contig blocks reduces the load on the HW page tables and caches.

None of that is a unique requirement for the mlx4 devices.  Again,
please work with the memory management folks to address your
requirements in a generic way!
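
For a sense of scale on the second point quoted above, here is a minimal
arithmetic sketch. It assumes the device needs one translation entry per
contiguous block; the helper name translation_entries() is made up for this
illustration and is not part of any driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of translation entries needed to map
 * region_bytes when each entry covers one contiguous block of
 * block_bytes. A partially used trailing block still needs an entry,
 * so the division rounds up.
 */
static uint64_t translation_entries(uint64_t region_bytes, uint64_t block_bytes)
{
	assert(block_bytes != 0);
	return (region_bytes + block_bytes - 1) / block_bytes;
}
```

Under that assumption a 1 GiB region takes 262144 entries with 4 KiB blocks
and 16384 entries with 64 KiB blocks. The same arithmetic holds for any
device with a page table, which is why the request is not mlx4 specific.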


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-09 Thread Christoph Hellwig
On Tue, Dec 08, 2015 at 07:52:03PM -0500, ira.weiny wrote:
> Searching patchworks...
> 
> I'm a bit worried about the size of the patch and I would like to see it split
> up for review.  But I agree Christoph's method is better long term.

I'd be happy to split it up if I could see a way to split it.  So if
anyone has an idea you're welcome!

> Christoph do you have this on github somewhere?  Perhaps it is split but I'm
> not finding it on patchworks?

No need for github, we have much better (and older) git hosting sites :)

http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr


Re: [PATCH v2 07/10] iser-target: Declare correct flags when accepting a connection

2015-12-09 Thread Christoph Hellwig
On Wed, Dec 09, 2015 at 02:12:05PM +0200, Sagi Grimberg wrote:
> From: Jenny Derzhavetz <jen...@mellanox.com>
> 
> iser target does not support zero based virtual addresses and
> send with invalidate, so it should declare that it doesn't.

Only marginally related, but can someone explain what zero based
virtual addresses means in this context?  Does this mean it uses
the old RFC5046-style header without the read/write_va fields?
Or does it mean those fields exist but must always be zero?
I couldn't really find a good answer in Annex A12.

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH v2 08/10] iser-target: Support the remote invalidation exception

2015-12-09 Thread Christoph Hellwig
> + if (isert_conn->snd_w_inv)
> + isert_info("Using remote invalidation\n");

Isn't this a little bit too chatty?

Otherwise looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>


Re: [PATCH v2 09/10] IB/iser: Increment the rkey when registering and not when invalidating

2015-12-09 Thread Christoph Hellwig
On Wed, Dec 09, 2015 at 02:12:07PM +0200, Sagi Grimberg wrote:
> With remote invalidate we won't local invalidate
> but we still want to increment the rkey.
> 
> Signed-off-by: Sagi Grimberg <sa...@mellanox.com>
> Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com>

Looks fine,

Reviewed-by: Christoph Hellwig <h...@lst.de>
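
For readers following along, a minimal user-space sketch of the idea,
assuming the usual rkey layout where the low 8 bits are a consumer-owned
key and the upper 24 bits are the index. struct fake_mr, inc_rkey() and
register_mr() are hypothetical names for this illustration, not the iser
code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bump only the low 8-bit key part of an rkey; the upper bits are kept. */
static uint32_t inc_rkey(uint32_t rkey)
{
	const uint32_t mask = 0x000000ff;

	return ((rkey + 1) & mask) | (rkey & ~mask);
}

/* Hypothetical stand-in for a memory region descriptor. */
struct fake_mr {
	uint32_t rkey;
	int needs_local_inv;
};

/*
 * Hypothetical registration path: the key is bumped here on every
 * registration, so it changes even when the peer invalidates remotely
 * and no local invalidate is ever posted.
 */
static uint32_t register_mr(struct fake_mr *mr, int remote_inv_expected)
{
	assert(mr != NULL);
	mr->rkey = inc_rkey(mr->rkey);
	mr->needs_local_inv = !remote_inv_expected;
	return mr->rkey;
}
```

If the increment lived in the local invalidate path instead, a remotely
invalidated region would be registered again with the same rkey.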


Re: [RFC contig pages support 1/2] IB: Supports contiguous memory operations

2015-12-08 Thread Christoph Hellwig
There is absolutely nothing IB specific here.  If you want to support
anonymous mmaps to allocate large contiguous pages, work with the MM
folks on providing that in a generic fashion.

[full quote alert for reference:]

On Tue, Dec 08, 2015 at 05:15:06PM +0200, Yishai Hadas wrote:
> New structure 'cmem' represents the contiguous allocated memory.
> It supports:
> Allocate, Free, 'Map to virtual address' operations, etc.
> 
> Signed-off-by: Yishai Hadas 
> ---
>  drivers/infiniband/core/Makefile |   2 +-
>  drivers/infiniband/core/cmem.c   | 245 +++
>  include/rdma/ib_cmem.h   |  41 +++
>  3 files changed, 287 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/infiniband/core/cmem.c
>  create mode 100644 include/rdma/ib_cmem.h
> 
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index d43a899..8549ea4 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -11,7 +11,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
>  ib_core-y := packer.o ud_header.o verbs.o sysfs.o \
>   device.o fmr_pool.o cache.o netlink.o \
>   roce_gid_mgmt.o
> -ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
> +ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o cmem.o
>  ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
>  
>  ib_mad-y :=  mad.o smi.o agent.o mad_rmpp.o
> diff --git a/drivers/infiniband/core/cmem.c b/drivers/infiniband/core/cmem.c
> new file mode 100644
> index 000..21d8573
> --- /dev/null
> +++ b/drivers/infiniband/core/cmem.c
> @@ -0,0 +1,245 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "uverbs.h"
> +
> +static void ib_cmem_release(struct kref *ref)
> +{
> + struct ib_cmem *cmem;
> + struct ib_cmem_block *cmem_block, *tmp;
> + unsigned long ntotal_pages;
> +
> + cmem = container_of(ref, struct ib_cmem, refcount);
> +
> + list_for_each_entry_safe(cmem_block, tmp, &cmem->ib_cmem_block, list) {
> + __free_pages(cmem_block->page, cmem->block_order);
> + list_del(&cmem_block->list);
> + kfree(cmem_block);
> + }
> + /* no locking is needed:
> +   * ib_cmem_release is called from vm_close which is always called
> +   * with mm->mmap_sem held for writing.
> +   * The only exception is when the process shutting down but in that case
> +   * counter not relevant any more.
> +   */
> + if (current->mm) {
> + ntotal_pages = PAGE_ALIGN(cmem->length) >> PAGE_SHIFT;
> + current->mm->pinned_vm -= ntotal_pages;
> + }
> + kfree(cmem);
> +}
> +
> +/**
> + * ib_cmem_release_contiguous_pages - release memory allocated by
> + *  ib_cmem_alloc_contiguous_pages.
> + * @cmem: cmem struct to release
> + */
> +void ib_cmem_release_contiguous_pages(struct ib_cmem *cmem)
> +{
> + kref_put(&cmem->refcount, ib_cmem_release);
> +}
> +EXPORT_SYMBOL(ib_cmem_release_contiguous_pages);
> +
> +static void cmem_vma_open(struct vm_area_struct *area)
> +{
> + struct ib_cmem *ib_cmem;
> +
> + ib_cmem = (struct ib_cmem *)(area->vm_private_data);
> +
> + /* vm_open and vm_close are always called with mm->mmap_sem held for
> +   * writing. The only exception is when the process is shutting down, at
> +   * which point vm_close is called with no locks held, but since it is
> +   * after the VMAs have been detached, it is impossible that vm_open will
> +   * be called. Therefore, there is no need to synchronize the kref_get and
> +   * kref_put calls.
> + */
> + kref_get(&ib_cmem->refcount);
> +}
> +
> +static void cmem_vma_close(struct vm_area_struct *area)
> +{
> + struct ib_cmem *cmem;
> +
> + cmem = (struct ib_cmem *)(area->vm_private_data);
> +
> + ib_cmem_release_contiguous_pages(cmem);
> +}
> +
> +static const struct vm_operations_struct cmem_contig_pages_vm_ops = {
> + .open = cmem_vma_open,
> + .close = cmem_vma_close
> +};
> +
> +/**
> + * ib_cmem_map_contiguous_pages_to_vma - map contiguous pages into VMA
> + * @ib_cmem: cmem structure returned by ib_cmem_alloc_contiguous_pages
> + * @vma: VMA to inject pages into.
> + */
> +int ib_cmem_map_contiguous_pages_to_vma(struct ib_cmem *ib_cmem,
> + struct vm_area_struct *vma)
> +{
> + int ret;
> + unsigned long page_entry;
> + unsigned long ntotal_pages;
> + unsigned long ncontig_pages;
> + unsigned long total_size;
> + struct page *page;
> + unsigned long vma_entry_number = 0;
> + struct ib_cmem_block *ib_cmem_block = NULL;
> +
> + total_size = vma->vm_end - vma->vm_start;
> + if (ib_cmem->length != total_size)
> + return -EINVAL;
> +
> + if 

Re: [PATCH v2 0/2] Handle mlx4 max_sge_rd correctly

2015-12-08 Thread Christoph Hellwig
On Thu, Dec 03, 2015 at 05:07:35PM +0200, Sagi Grimberg wrote:
> 
> >Did we ever make progress on this?
> 
> Just up to Doug to pull it in.

Doug, any chance to get this into the 4.4 queue?  It's annoying
having to work around this driver bug in every new ULP.


Re: device attr cleanup (was: Handle mlx4 max_sge_rd correctly)

2015-12-08 Thread Christoph Hellwig
On Tue, Dec 08, 2015 at 03:59:40PM -0700, Jason Gunthorpe wrote:
> Or, can we please stop this bikeshedding. Christoph's patch is done,
> well tested and does a lot more clean up than this hacky three liner.
> 
> It is a good patch, and although patchworks doesn't have my remarks
> from an earlier revision I still think it should go forward. 

While I'd prefer the version Or points to over not having anything
at all, I'd much prefer sorting it out properly and making the RDMA code
behave like all other Linux subsystems.

Jason, can you give me a formal Acked-by?  Then I'll rebase it on top
of the current 4.4 queue so we can start the 4.5 window with it.


completion queue abstraction V2

2015-12-07 Thread Christoph Hellwig
This series adds a new RDMA core abstraction that insulates the
ULPs from the nitty gritty details of CQ polling.  See the individual
patches for more details.
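
As a rough picture of what such insulation can look like, here is a
user-space model in plain C. It assumes a design where every work request
embeds a small completion entry with a callback and a core loop dispatches
to it with a budget. All names below (struct cqe, struct wc,
process_completions(), struct my_request) are hypothetical and only model
the idea; see the patches for the real interface.

```c
#include <assert.h>
#include <stddef.h>

struct wc;

/* Hypothetical completion entry a ULP embeds in its own request. */
struct cqe {
	void (*done)(struct cqe *cqe, const struct wc *wc);
};

/* Hypothetical work completion as handed back by the polling core. */
struct wc {
	struct cqe *cqe;
	int status;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* ULP side: a request that only knows about its own callback. */
struct my_request {
	int id;
	int completed;
	int status;
	struct cqe cqe;
};

static void my_request_done(struct cqe *cqe, const struct wc *wc)
{
	struct my_request *req = container_of(cqe, struct my_request, cqe);

	req->completed = 1;
	req->status = wc->status;
}

/*
 * Core side: dispatch up to 'budget' of the 'n' polled completions and
 * return how many were handled, so the caller can leave the loop early.
 */
static int process_completions(const struct wc *wcs, int n, int budget)
{
	int handled = 0;

	while (handled < n && handled < budget) {
		const struct wc *wc = &wcs[handled];

		assert(wc->cqe != NULL && wc->cqe->done != NULL);
		wc->cqe->done(wc->cqe, wc);
		handled++;
	}
	return handled;
}
```

In this model the ULP never touches the polling loop. It fills in the done
pointer when it builds a request and gets its own structure back through
container_of() in the callback.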

Note that this series should be applied on top of my
"IB: merge struct ib_device_attr into struct ib_device" patch and the
MR cleanups.

A git tree is also available:

git://git.infradead.org/users/hch/rdma.git rdma-cq

As well as gitweb:
http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq

Changes since V1:
 - rebased
 - some additional irq_work cleanups
 - early exit from the polling loop (Jason)
 - changed ib_process_cq_direct API (Sagi / Jason / Bart)
 - dropped the ib_drain_qp helper for now, to be revisited later
 - cosmetic iser cleanups (Or)
 - cosmetic srp changes (Bart)
 
