[PATCHv3 TRIVIAL] IB/core: Documentation fix to ib_mad_snoop_handler in the MAD header file

2016-01-05 Thread Hal Rosenstock
In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr

Signed-off-by: Hal Rosenstock 
---
Change since v2:
Changed title to use "higher" language

Change since v1:
Fixed typo in patch description

diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index ec9b44d..2b3573d 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
*mad_agent,
 /**
  * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
  * @mad_agent: MAD agent that snooped the MAD.
- * @send_wr: Work request information on the sent MAD.
+ * @send_buf: send MAD data buffer.
  * @mad_send_wc: Work completion information on the sent MAD.  Valid
  *   only for snooping that occurs on a send completion.
  *
- * Clients snooping MADs should not modify data referenced by the @send_wr
+ * Clients snooping MADs should not modify data referenced by the @send_buf
  * or @mad_send_wc.
  */
 typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/11] IB: only keep a single key in struct ib_mr

2016-01-05 Thread Jason Gunthorpe
On Fri, Dec 25, 2015 at 06:46:07PM +0200, Liran Liss wrote:
> > From: Jason Gunthorpe 
> 
> > > >fill mr->key by the lkey or rkey based on that and everything will
> > > >work fine.
> > >
> > > But the ULP *can* register a memory buffer with local and remote
> > > access permissions.
> > Not in the new API.
> >
> > If a ULP ever comes along that does need that then they can start with
> > two MRs and then eventually upgrade the kapi to have some kind of
> > efficient bidir MR mode.
> 
> ULPs are *already* using the same registrations for both local and
> remote access.

Where? Out of tree?

Jason


RE: device attr cleanup

2016-01-05 Thread Steve Wise


> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Doug Ledford
> Sent: Wednesday, December 23, 2015 9:19 PM
> To: J. Bruce Fields; Chuck Lever
> Cc: Christoph Hellwig; Anna Schumaker; Jason Gunthorpe; 
> linux-rdma@vger.kernel.org; ira.weiny; Or Gerlitz; Steve Wise; Or Gerlitz;
Sagi
> Grimberg
> Subject: Re: device attr cleanup
> 
> On 12/23/2015 04:31 PM, J. Bruce Fields wrote:
> > On Thu, Dec 10, 2015 at 07:49:59PM -0500, Chuck Lever wrote:
> >>
> >>> On Dec 10, 2015, at 6:30 PM, Christoph Hellwig  wrote:
> >>>
> >>> On Thu, Dec 10, 2015 at 11:07:03AM -0700, Jason Gunthorpe wrote:
>  The ARM folks do this sort of stuff on a regular basis.. Very early on
>  Doug prepares a topic branch with only the big change, NFS folks pull
>  it and then pull your work. Then Doug would send the topic branch to
>  Linus as soon as the merge window opens, then NFS would send theirs.
> 
>  This is a lot less work overall than trying to sequence multiple
>  patches over multiple releases..
> >>>
> >>> Agreed.  Staging has always been a giant pain and things tend to never
> >>> finish moving over that way if they are non-trivial enough.
> >>
> >> In that case:
> >>
> >> You need to make sure you have all the right Acks. I've added
> >> Anna and Bruce to Ack the NFS-related portions. Santosh should
> >> Ack the RDS part.
> >>
> >> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/ib_device_attr
> >
> > Fine by me.
> >
> >> Given the proximity to the holidays and the next merge window,
> >> Doug will need to get a properly-acked topic branch published
> >> in the next day or two so the rest of us can rebase and start
> >> testing before the relevant parties disappear for the holiday.
> >
> > What branch should I be working from?
> 
> That patch was very intrusive (and I didn't like the change to the
> structure organization).  An alternative patch was proposed and I took
> it instead.  The patch I took is much less intrusive, but you might
> still need to adjust things slightly.  I've pushed my current WIP
> for-next branch to my github repo:
> 
> git://github.com/dledford/linux.git for-next
> 
> This branch might still get rebased before it gets pushed to my
> official repo at k.o, but it is perfectly fine to check that your
> patches will merge with my for-next branch without conflicts.
>

Hey Doug, I don't see this branch.  Which branch has the accepted device attr 
change?

Steve.



Re: device attr cleanup

2016-01-05 Thread Or Gerlitz
On Tue, Jan 5, 2016 at 6:46 PM, Steve Wise  wrote:
> Hey Doug, I don't see this branch.  Which branch has the accepted device attr 
> change?

k.o/for-4.5 on Doug's kernel.org tree


RE: [PATCHv3 TRIVIAL] IB/core: Documentation fix to ib_mad_snoop_handler in the MAD header file

2016-01-05 Thread Hefty, Sean
> In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr

The MAD snooping support should be removed from the MAD stack.  There are no 
in-tree users, and the only attempt at adding one was rejected.


[PATCH 06/15] IB/srpt: Simplify srpt_handle_tsk_mgmt()

2016-01-05 Thread Bart Van Assche
Let the target core check task existence instead of the SRP target
driver.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 54 ++-
 1 file changed, 2 insertions(+), 52 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index fc19203..9cb1a14 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1554,47 +1554,6 @@ send_sense:
return -1;
 }
 
-/**
- * srpt_rx_mgmt_fn_tag() - Process a task management function by tag.
- * @ch: RDMA channel of the task management request.
- * @fn: Task management function to perform.
- * @req_tag: Tag of the SRP task management request.
- * @mgmt_ioctx: I/O context of the task management request.
- *
- * Returns zero if the target core will process the task management
- * request asynchronously.
- *
- * Note: It is assumed that the initiator serializes tag-based task management
- * requests.
- */
-static int srpt_rx_mgmt_fn_tag(struct srpt_send_ioctx *ioctx, u64 tag)
-{
-   struct srpt_device *sdev;
-   struct srpt_rdma_ch *ch;
-   struct srpt_send_ioctx *target;
-   int ret, i;
-
-   ret = -EINVAL;
-   ch = ioctx->ch;
-   BUG_ON(!ch);
-   BUG_ON(!ch->sport);
-   sdev = ch->sport->sdev;
-   BUG_ON(!sdev);
-   spin_lock_irq(&ch->spinlock);
-   for (i = 0; i < ch->rq_size; ++i) {
-   target = ch->ioctx_ring[i];
-   if (target->cmd.se_lun == ioctx->cmd.se_lun &&
-   target->cmd.tag == tag &&
-   srpt_get_cmd_state(target) != SRPT_STATE_DONE) {
-   ret = 0;
-   /* now let the target core abort &target->cmd; */
-   break;
-   }
-   }
-   spin_unlock_irq(&ch->spinlock);
-   return ret;
-}
-
 static int srp_tmr_to_tcm(int fn)
 {
switch (fn) {
@@ -1628,7 +1587,6 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch,
struct srp_tsk_mgmt *srp_tsk;
struct se_cmd *cmd;
struct se_session *sess = ch->sess;
-   uint32_t tag = 0;
int tcm_tmr;
int rc;
 
@@ -1649,18 +1607,10 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch 
*ch,
TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED;
goto fail;
}
-   if (srp_tsk->tsk_mgmt_func == SRP_TSK_ABORT_TASK) {
-   rc = srpt_rx_mgmt_fn_tag(send_ioctx, srp_tsk->task_tag);
-   if (rc < 0) {
-   send_ioctx->cmd.se_tmr_req->response =
-   TMR_TASK_DOES_NOT_EXIST;
-   goto fail;
-   }
-   tag = srp_tsk->task_tag;
-   }
rc = target_submit_tmr(&send_ioctx->cmd, sess, NULL,
   scsilun_to_int(&srp_tsk->lun), srp_tsk, tcm_tmr,
-  GFP_KERNEL, tag, TARGET_SCF_ACK_KREF);
+  GFP_KERNEL, srp_tsk->task_tag,
+  TARGET_SCF_ACK_KREF);
if (rc != 0) {
send_ioctx->cmd.se_tmr_req->response = TMR_FUNCTION_REJECTED;
goto fail;
-- 
2.1.4



[PATCH 07/15] IB/srpt: Simplify channel state management

2016-01-05 Thread Bart Van Assche
The only allowed channel state changes are those that change
the channel state into a state with a higher numerical value.
This allows merging the functions srpt_set_ch_state() and
srpt_test_and_set_ch_state() into a single function.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 42 +--
 1 file changed, 10 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 9cb1a14..a27414d 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -96,37 +96,21 @@ static int srpt_queue_status(struct se_cmd *cmd);
 static void srpt_recv_done(struct ib_cq *cq, struct ib_wc *wc);
 static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc);
 
-static enum rdma_ch_state
-srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new_state)
+static bool srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new)
 {
unsigned long flags;
enum rdma_ch_state prev;
+   bool changed = false;
 
spin_lock_irqsave(&ch->spinlock, flags);
prev = ch->state;
-   ch->state = new_state;
-   spin_unlock_irqrestore(&ch->spinlock, flags);
-   return prev;
-}
-
-/**
- * srpt_test_and_set_ch_state() - Test and set the channel state.
- *
- * Returns true if and only if the channel state has been set to the new state.
- */
-static bool
-srpt_test_and_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state old,
-  enum rdma_ch_state new)
-{
-   unsigned long flags;
-   enum rdma_ch_state prev;
-
-   spin_lock_irqsave(&ch->spinlock, flags);
-   prev = ch->state;
-   if (prev == old)
+   if (new > prev) {
ch->state = new;
+   changed = true;
+   }
spin_unlock_irqrestore(&ch->spinlock, flags);
-   return prev == old;
+
+   return changed;
 }
 
 /**
@@ -199,8 +183,7 @@ static void srpt_qp_event(struct ib_event *event, struct 
srpt_rdma_ch *ch)
ib_cm_notify(ch->cm_id, event->event);
break;
case IB_EVENT_QP_LAST_WQE_REACHED:
-   if (srpt_test_and_set_ch_state(ch, CH_DRAINING,
-  CH_RELEASING))
+   if (srpt_set_ch_state(ch, CH_RELEASING))
srpt_release_channel(ch);
else
pr_debug("%s: state %d - ignored LAST_WQE.\n",
@@ -1946,12 +1929,7 @@ static void srpt_drain_channel(struct ib_cm_id *cm_id)
spin_lock_irq(&sdev->spinlock);
list_for_each_entry(ch, &sdev->rch_list, list) {
if (ch->cm_id == cm_id) {
-   do_reset = srpt_test_and_set_ch_state(ch,
-   CH_CONNECTING, CH_DRAINING) ||
-  srpt_test_and_set_ch_state(ch,
-   CH_LIVE, CH_DRAINING) ||
-  srpt_test_and_set_ch_state(ch,
-   CH_DISCONNECTING, CH_DRAINING);
+   do_reset = srpt_set_ch_state(ch, CH_DRAINING);
break;
}
}
@@ -2368,7 +2346,7 @@ static void srpt_cm_rtu_recv(struct ib_cm_id *cm_id)
ch = srpt_find_channel(cm_id->context, cm_id);
BUG_ON(!ch);
 
-   if (srpt_test_and_set_ch_state(ch, CH_CONNECTING, CH_LIVE)) {
+   if (srpt_set_ch_state(ch, CH_LIVE)) {
struct srpt_recv_ioctx *ioctx, *ioctx_tmp;
 
ret = srpt_ch_qp_rts(ch, ch->qp);
-- 
2.1.4



[PATCH 08/15] IB/srpt: Simplify srpt_shutdown_session()

2016-01-05 Thread Bart Van Assche
The target core guarantees that shutdown_session() is only invoked
once per session. This means that the ib_srpt target driver doesn't
have to track whether or not shutdown_session() has been called.
Additionally, ensure that target_sess_cmd_list_set_waiting() is
called before target_wait_for_sess_cmds() by moving it into
srpt_release_channel_work().

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 14 +-
 drivers/infiniband/ulp/srpt/ib_srpt.h |  1 -
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index a27414d..0ff4ed6 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1887,19 +1887,6 @@ static void srpt_close_ch(struct srpt_rdma_ch *ch)
  */
 static int srpt_shutdown_session(struct se_session *se_sess)
 {
-   struct srpt_rdma_ch *ch = se_sess->fabric_sess_ptr;
-   unsigned long flags;
-
-   spin_lock_irqsave(&ch->spinlock, flags);
-   if (ch->in_shutdown) {
-   spin_unlock_irqrestore(&ch->spinlock, flags);
-   return true;
-   }
-
-   ch->in_shutdown = true;
-   target_sess_cmd_list_set_waiting(se_sess);
-   spin_unlock_irqrestore(&ch->spinlock, flags);
-
return true;
 }
 
@@ -2003,6 +1990,7 @@ static void srpt_release_channel_work(struct work_struct 
*w)
se_sess = ch->sess;
BUG_ON(!se_sess);
 
+   target_sess_cmd_list_set_waiting(se_sess);
target_wait_for_sess_cmds(se_sess);
 
transport_deregister_session_configfs(se_sess);
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h 
b/drivers/infiniband/ulp/srpt/ib_srpt.h
index a98b86b..d0de468 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -288,7 +288,6 @@ struct srpt_rdma_ch {
u8  sess_name[36];
struct work_struct  release_work;
struct completion   *release_done;
-   boolin_shutdown;
 };
 
 /**
-- 
2.1.4



[PATCH 01/15] IB/srpt: Add parentheses around sizeof argument

2016-01-05 Thread Bart Van Assche
Although sizeof is an operator and hence in many cases parentheses can
be left out, the recommended kernel coding style is to surround the
sizeof argument with parentheses. This patch does not change any
functionality. This patch has been generated by running the following
shell command:

sed -i 's/sizeof \([^ );,]*\)/sizeof(\1)/g' drivers/infiniband/ulp/srpt/*.[ch]
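For instance, the sed expression above rewrites a bare `sizeof *cif` into the parenthesized form (a quick demo on a temporary file; the path is illustrative):

```shell
# Demonstrate the sizeof-parenthesizing rewrite on a one-line sample.
printf 'memset(cif, 0, sizeof *cif);\n' > /tmp/sizeof_demo.c
sed -i 's/sizeof \([^ );,]*\)/sizeof(\1)/g' /tmp/sizeof_demo.c
cat /tmp/sizeof_demo.c   # prints: memset(cif, 0, sizeof(*cif));
```

The bracket expression `[^ );,]*` stops the capture at the first space, parenthesis, semicolon, or comma, so the argument is wrapped without swallowing the rest of the statement.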

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 42 +--
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 0c3ed7c..fe9474f 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -281,7 +281,7 @@ static void srpt_get_class_port_info(struct ib_dm_mad *mad)
struct ib_class_port_info *cif;
 
cif = (struct ib_class_port_info *)mad->data;
-   memset(cif, 0, sizeof *cif);
+   memset(cif, 0, sizeof(*cif));
cif->base_version = 1;
cif->class_version = 1;
cif->resp_time_value = 20;
@@ -340,7 +340,7 @@ static void srpt_get_ioc(struct srpt_port *sport, u32 slot,
return;
}
 
-   memset(iocp, 0, sizeof *iocp);
+   memset(iocp, 0, sizeof(*iocp));
strcpy(iocp->id_string, SRPT_ID_STRING);
iocp->guid = cpu_to_be64(srpt_service_guid);
iocp->vendor_id = cpu_to_be32(sdev->dev_attr.vendor_id);
@@ -390,7 +390,7 @@ static void srpt_get_svc_entries(u64 ioc_guid,
}
 
svc_entries = (struct ib_dm_svc_entries *)mad->data;
-   memset(svc_entries, 0, sizeof *svc_entries);
+   memset(svc_entries, 0, sizeof(*svc_entries));
svc_entries->service_entries[0].id = cpu_to_be64(ioc_guid);
snprintf(svc_entries->service_entries[0].name,
 sizeof(svc_entries->service_entries[0].name),
@@ -483,7 +483,7 @@ static void srpt_mad_recv_handler(struct ib_mad_agent 
*mad_agent,
rsp->ah = ah;
 
dm_mad = rsp->mad;
-   memcpy(dm_mad, mad_wc->recv_buf.mad, sizeof *dm_mad);
+   memcpy(dm_mad, mad_wc->recv_buf.mad, sizeof(*dm_mad));
dm_mad->mad_hdr.method = IB_MGMT_METHOD_GET_RESP;
dm_mad->mad_hdr.status = 0;
 
@@ -531,7 +531,7 @@ static int srpt_refresh_port(struct srpt_port *sport)
struct ib_port_attr port_attr;
int ret;
 
-   memset(&port_modify, 0, sizeof port_modify);
+   memset(&port_modify, 0, sizeof(port_modify));
port_modify.set_port_cap_mask = IB_PORT_DEVICE_MGMT_SUP;
port_modify.clr_port_cap_mask = 0;
 
@@ -552,7 +552,7 @@ static int srpt_refresh_port(struct srpt_port *sport)
goto err_query_port;
 
if (!sport->mad_agent) {
-   memset(&reg_req, 0, sizeof reg_req);
+   memset(&reg_req, 0, sizeof(reg_req));
reg_req.mgmt_class = IB_MGMT_CLASS_DEVICE_MGMT;
reg_req.mgmt_class_version = IB_MGMT_BASE_VERSION;
set_bit(IB_MGMT_METHOD_GET, reg_req.method_mask);
@@ -902,14 +902,14 @@ static int srpt_get_desc_tbl(struct srpt_send_ioctx 
*ioctx,
 
db = (struct srp_direct_buf *)(srp_cmd->add_data
   + add_cdb_offset);
-   memcpy(ioctx->rbufs, db, sizeof *db);
+   memcpy(ioctx->rbufs, db, sizeof(*db));
*data_len = be32_to_cpu(db->len);
} else if (((srp_cmd->buf_fmt & 0xf) == SRP_DATA_DESC_INDIRECT) ||
   ((srp_cmd->buf_fmt >> 4) == SRP_DATA_DESC_INDIRECT)) {
idb = (struct srp_indirect_buf *)(srp_cmd->add_data
  + add_cdb_offset);
 
-   ioctx->n_rbuf = be32_to_cpu(idb->table_desc.len) / sizeof *db;
+   ioctx->n_rbuf = be32_to_cpu(idb->table_desc.len) / sizeof(*db);
 
if (ioctx->n_rbuf >
(srp_cmd->data_out_desc_cnt + srp_cmd->data_in_desc_cnt)) {
@@ -928,7 +928,7 @@ static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx,
ioctx->rbufs = &ioctx->single_rbuf;
else {
ioctx->rbufs =
-   kmalloc(ioctx->n_rbuf * sizeof *db, GFP_ATOMIC);
+   kmalloc(ioctx->n_rbuf * sizeof(*db), 
GFP_ATOMIC);
if (!ioctx->rbufs) {
ioctx->n_rbuf = 0;
ret = -ENOMEM;
@@ -937,7 +937,7 @@ static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx,
}
 
db = idb->desc_list;
-   memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof *db);
+   memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof(*db));
*data_len = be32_to_cpu(idb->len);
}
 out:
@@ -955,7 +955,7 @@ static int srpt_init_ch_qp(struct srpt_rdma_ch *ch, struct 
ib_qp *qp)
struct 

[PATCH 02/15] IB/srpt: Inline srpt_sdev_name()

2016-01-05 Thread Bart Van Assche
srpt_sdev_name() is too trivial to keep as a separate function.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index fe9474f..7abad26 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -109,16 +109,6 @@ enum dma_data_direction opposite_dma_dir(enum 
dma_data_direction dir)
}
 }
 
-/**
- * srpt_sdev_name() - Return the name associated with the HCA.
- *
- * Examples are ib0, ib1, ...
- */
-static inline const char *srpt_sdev_name(struct srpt_device *sdev)
-{
-   return sdev->device->name;
-}
-
 static enum rdma_ch_state srpt_get_ch_state(struct srpt_rdma_ch *ch)
 {
unsigned long flags;
@@ -182,7 +172,7 @@ static void srpt_event_handler(struct ib_event_handler 
*handler,
return;
 
pr_debug("ASYNC event= %d on device= %s\n", event->event,
-srpt_sdev_name(sdev));
+sdev->device->name);
 
switch (event->event) {
case IB_EVENT_PORT_ERR:
@@ -3089,7 +3079,7 @@ static void srpt_add_one(struct ib_device *device)
 
if (srpt_refresh_port(sport)) {
pr_err("MAD registration failed for %s-%d.\n",
-  srpt_sdev_name(sdev), i);
+  sdev->device->name, i);
goto err_ring;
}
snprintf(sport->port_guid, sizeof(sport->port_guid),
-- 
2.1.4



[PATCH 15/15] IB/srpt: Fix a rare crash in srpt_close_session()

2016-01-05 Thread Bart Van Assche
Keep the ib_srpt session as long as srpt_close_session() may
access it.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 669ae5c..7548bd5 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2228,6 +2228,8 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
pr_debug("Establish connection sess=%p name=%s cm_id=%p\n", ch->sess,
 ch->sess_name, ch->cm_id);
 
+   kref_get(&ch->kref);
+
/* create srp_login_response */
rsp->opcode = SRP_LOGIN_RSP;
rsp->tag = req->tag;
@@ -2991,6 +2993,8 @@ static void srpt_close_session(struct se_session *se_sess)
 
srpt_disconnect_ch(ch);
 
+   kref_put(&ch->kref, srpt_free_ch);
+
if (!wait)
return;
 
-- 
2.1.4



[PATCH 11/15] IB/srpt: Fix how aborted commands are processed

2016-01-05 Thread Bart Van Assche
srpt_abort_cmd() must not be called in state SRPT_STATE_DATA_IN. Issue
a warning if this occurs.

srpt_abort_cmd() must not invoke target_put_sess_cmd() for commands
in state SRPT_STATE_DONE because the srpt_abort_cmd() callers already
do this when necessary. Hence remove this call.

If an RDMA read fails the corresponding SCSI command must fail. Hence
add a transport_generic_request_failure() call.

Remove an incorrect srpt_abort_cmd() call from srpt_rdma_write_done().

Avoid that srpt_send_done() calls srpt_abort_cmd() for finished SCSI
commands.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 39 ++-
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 395bc1b..2ae3c1b 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1273,25 +1273,17 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx)
case SRPT_STATE_NEED_DATA:
ioctx->state = SRPT_STATE_DATA_IN;
break;
-   case SRPT_STATE_DATA_IN:
case SRPT_STATE_CMD_RSP_SENT:
case SRPT_STATE_MGMT_RSP_SENT:
ioctx->state = SRPT_STATE_DONE;
break;
default:
+   WARN_ONCE(true, "%s: unexpected I/O context state %d\n",
+ __func__, state);
break;
}
spin_unlock_irqrestore(>spinlock, flags);
 
-   if (state == SRPT_STATE_DONE) {
-   struct srpt_rdma_ch *ch = ioctx->ch;
-
-   BUG_ON(ch->sess == NULL);
-
-   target_put_sess_cmd(&ioctx->cmd);
-   goto out;
-   }
-
pr_debug("Aborting cmd with state %d and tag %lld\n", state,
 ioctx->cmd.tag);
 
@@ -1299,14 +1291,16 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx)
case SRPT_STATE_NEW:
case SRPT_STATE_DATA_IN:
case SRPT_STATE_MGMT:
+   case SRPT_STATE_DONE:
/*
 * Do nothing - defer abort processing until
 * srpt_queue_response() is invoked.
 */
-   WARN_ON(!(ioctx->cmd.transport_state & CMD_T_ABORTED));
break;
case SRPT_STATE_NEED_DATA:
-   /* DMA_TO_DEVICE (write) - RDMA read error. */
+   pr_debug("tag %#llx: RDMA read error\n", ioctx->cmd.tag);
+   transport_generic_request_failure(&ioctx->cmd,
+   TCM_CHECK_CONDITION_ABORT_CMD);
break;
case SRPT_STATE_CMD_RSP_SENT:
/*
@@ -1314,18 +1308,16 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx)
 * not been received in time.
 */
srpt_unmap_sg_to_ib_sge(ioctx->ch, ioctx);
-   target_put_sess_cmd(&ioctx->cmd);
+   transport_generic_free_cmd(&ioctx->cmd, 0);
break;
case SRPT_STATE_MGMT_RSP_SENT:
-   srpt_set_cmd_state(ioctx, SRPT_STATE_DONE);
-   target_put_sess_cmd(&ioctx->cmd);
+   transport_generic_free_cmd(&ioctx->cmd, 0);
break;
default:
WARN(1, "Unexpected command state (%d)", state);
break;
}
 
-out:
return state;
 }
 
@@ -1365,9 +1357,14 @@ static void srpt_rdma_write_done(struct ib_cq *cq, 
struct ib_wc *wc)
container_of(wc->wr_cqe, struct srpt_send_ioctx, rdma_cqe);
 
if (unlikely(wc->status != IB_WC_SUCCESS)) {
+   /*
+* Note: if an RDMA write error completion is received that
+* means that a SEND also has been posted. Defer further
+* processing of the associated command until the send error
+* completion has been received.
+*/
pr_info("RDMA_WRITE for ioctx 0x%p failed with status %d\n",
ioctx, wc->status);
-   srpt_abort_cmd(ioctx);
}
 }
 
@@ -1716,15 +1713,10 @@ static void srpt_send_done(struct ib_cq *cq, struct 
ib_wc *wc)
 
atomic_inc(&ch->sq_wr_avail);
 
-   if (wc->status != IB_WC_SUCCESS) {
+   if (wc->status != IB_WC_SUCCESS)
pr_info("sending response for ioctx 0x%p failed"
" with status %d\n", ioctx, wc->status);
 
-   atomic_dec(&ch->req_lim);
-   srpt_abort_cmd(ioctx);
-   goto out;
-   }
-
if (state != SRPT_STATE_DONE) {
srpt_unmap_sg_to_ib_sge(ch, ioctx);
transport_generic_free_cmd(&ioctx->cmd, 0);
@@ -1733,7 +1725,6 @@ static void srpt_send_done(struct ib_cq *cq, struct ib_wc 
*wc)
   " wr_id = %u.\n", ioctx->ioctx.index);
}
 
-out:
while (!list_empty(&ch->cmd_wait_list) &&
   ch->state == 

[PATCH 12/15] IB/srpt: Eliminate srpt_find_channel()

2016-01-05 Thread Bart Van Assche
In the CM REQ message handler, store the channel pointer in
cm_id->context such that the function srpt_find_channel() is no
longer needed. Additionally, make the CM event messages more
informative.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 113 +-
 1 file changed, 43 insertions(+), 70 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 2ae3c1b..6ec130d 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1893,25 +1893,14 @@ static int srpt_shutdown_session(struct se_session 
*se_sess)
  * ib_destroy_cm_id(), which locks the cm_id spinlock and hence waits until
  * this function has finished).
  */
-static void srpt_drain_channel(struct ib_cm_id *cm_id)
+static void srpt_drain_channel(struct srpt_rdma_ch *ch)
 {
-   struct srpt_device *sdev;
-   struct srpt_rdma_ch *ch;
int ret;
bool do_reset = false;
 
WARN_ON_ONCE(irqs_disabled());
 
-   sdev = cm_id->context;
-   BUG_ON(!sdev);
-   spin_lock_irq(&sdev->spinlock);
-   list_for_each_entry(ch, &sdev->rch_list, list) {
-   if (ch->cm_id == cm_id) {
-   do_reset = srpt_set_ch_state(ch, CH_DRAINING);
-   break;
-   }
-   }
-   spin_unlock_irq(&sdev->spinlock);
+   do_reset = srpt_set_ch_state(ch, CH_DRAINING);
 
if (do_reset) {
if (ch->sess)
@@ -1925,34 +1914,6 @@ static void srpt_drain_channel(struct ib_cm_id *cm_id)
 }
 
 /**
- * srpt_find_channel() - Look up an RDMA channel.
- * @cm_id: Pointer to the CM ID of the channel to be looked up.
- *
- * Return NULL if no matching RDMA channel has been found.
- */
-static struct srpt_rdma_ch *srpt_find_channel(struct srpt_device *sdev,
- struct ib_cm_id *cm_id)
-{
-   struct srpt_rdma_ch *ch;
-   bool found;
-
-   WARN_ON_ONCE(irqs_disabled());
-   BUG_ON(!sdev);
-
-   found = false;
-   spin_lock_irq(&sdev->spinlock);
-   list_for_each_entry(ch, &sdev->rch_list, list) {
-   if (ch->cm_id == cm_id) {
-   found = true;
-   break;
-   }
-   }
-   spin_unlock_irq(&sdev->spinlock);
-
-   return found ? ch : NULL;
-}
-
-/**
  * srpt_release_channel() - Release channel resources.
  *
  * Schedules the actual release because:
@@ -2160,6 +2121,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
memcpy(ch->t_port_id, req->target_port_id, 16);
ch->sport = &sdev->port[param->port - 1];
ch->cm_id = cm_id;
+   cm_id->context = ch;
/*
 * Avoid QUEUE_FULL conditions by limiting the number of buffers used
 * for the SRP protocol to the command queue size.
@@ -2304,10 +2266,23 @@ out:
return ret;
 }
 
-static void srpt_cm_rej_recv(struct ib_cm_id *cm_id)
+static void srpt_cm_rej_recv(struct srpt_rdma_ch *ch,
+enum ib_cm_rej_reason reason,
+const u8 *private_data,
+u8 private_data_len)
 {
-   pr_info("Received IB REJ for cm_id %p.\n", cm_id);
-   srpt_drain_channel(cm_id);
+   char *priv = kmalloc(private_data_len * 3 + 1, GFP_KERNEL);
+   int i;
+
+   if (priv) {
+   priv[0] = '\0';
+   for (i = 0; i < private_data_len; i++)
+   sprintf(priv + 3 * i, "%02x ", private_data[i]);
+   }
+   pr_info("Received CM REJ for ch %s-%d; reason %d; private data %s.\n",
+   ch->sess_name, ch->qp->qp_num, reason, priv ? : "(?)");
+   kfree(priv);
+   srpt_drain_channel(ch);
 }
 
 /**
@@ -2316,14 +2291,10 @@ static void srpt_cm_rej_recv(struct ib_cm_id *cm_id)
  * An IB_CM_RTU_RECEIVED message indicates that the connection is established
  * and that the recipient may begin transmitting (RTU = ready to use).
  */
-static void srpt_cm_rtu_recv(struct ib_cm_id *cm_id)
+static void srpt_cm_rtu_recv(struct srpt_rdma_ch *ch)
 {
-   struct srpt_rdma_ch *ch;
int ret;
 
-   ch = srpt_find_channel(cm_id->context, cm_id);
-   BUG_ON(!ch);
-
if (srpt_set_ch_state(ch, CH_LIVE)) {
struct srpt_recv_ioctx *ioctx, *ioctx_tmp;
 
@@ -2339,31 +2310,30 @@ static void srpt_cm_rtu_recv(struct ib_cm_id *cm_id)
}
 }
 
-static void srpt_cm_timewait_exit(struct ib_cm_id *cm_id)
+static void srpt_cm_timewait_exit(struct srpt_rdma_ch *ch)
 {
-   pr_info("Received IB TimeWait exit for cm_id %p.\n", cm_id);
-   srpt_drain_channel(cm_id);
+   pr_info("Received CM TimeWait exit for ch %s-%d.\n", ch->sess_name,
+   ch->qp->qp_num);
+   srpt_drain_channel(ch);
 }
 
-static void srpt_cm_rep_error(struct ib_cm_id *cm_id)
+static void srpt_cm_rep_error(struct srpt_rdma_ch *ch)
 {
-  

[PATCH 10/15] IB/srpt: Fix srpt_handle_cmd() error paths

2016-01-05 Thread Bart Van Assche
The target core function that should be called if target_submit_cmd()
fails is target_put_sess_cmd(). Additionally, change the return type
of srpt_handle_cmd() from int into void.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 1857d17..395bc1b 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1482,15 +1482,14 @@ static int srpt_check_stop_free(struct se_cmd *cmd)
 /**
  * srpt_handle_cmd() - Process SRP_CMD.
  */
-static int srpt_handle_cmd(struct srpt_rdma_ch *ch,
-  struct srpt_recv_ioctx *recv_ioctx,
-  struct srpt_send_ioctx *send_ioctx)
+static void srpt_handle_cmd(struct srpt_rdma_ch *ch,
+   struct srpt_recv_ioctx *recv_ioctx,
+   struct srpt_send_ioctx *send_ioctx)
 {
struct se_cmd *cmd;
struct srp_cmd *srp_cmd;
u64 data_len;
enum dma_data_direction dir;
-   sense_reason_t ret;
int rc;
 
BUG_ON(!send_ioctx);
@@ -1518,8 +1517,7 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch,
if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &data_len)) {
pr_err("0x%llx: parsing SRP descriptor table failed.\n",
   srp_cmd->tag);
-   ret = TCM_INVALID_CDB_FIELD;
-   goto send_sense;
+   goto put;
}
 
rc = target_submit_cmd(cmd, ch->sess, srp_cmd->cdb,
@@ -1527,14 +1525,16 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch,
   scsilun_to_int(&srp_cmd->lun), data_len,
   TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF);
if (rc != 0) {
-   ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
-   goto send_sense;
+   pr_debug("target_submit_cmd() returned %d for tag %#llx\n", rc,
+srp_cmd->tag);
+   goto put;
}
-   return 0;
+   return;
 
-send_sense:
-   transport_send_check_condition_and_sense(cmd, ret, 0);
-   return -1;
+put:
+   send_ioctx->state = SRPT_STATE_DONE;
+   target_put_sess_cmd(cmd);
+   return;
 }
 
 static int srp_tmr_to_tcm(int fn)
-- 
2.1.4



[PATCH 09/15] IB/srpt: Fix srpt_close_session()

2016-01-05 Thread Bart Van Assche
Avoid that srpt_close_session() waits when it doesn't have to.
Additionally, increase the time srpt_close_session() waits for
session shutdown to finish. This makes it easier to detect
session shutdown bugs.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 0ff4ed6..1857d17 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1981,8 +1981,8 @@ static void srpt_release_channel_work(struct work_struct 
*w)
struct se_session *se_sess;
 
ch = container_of(w, struct srpt_rdma_ch, release_work);
-   pr_debug("ch = %p; ch->sess = %p; release_done = %p\n", ch, ch->sess,
-ch->release_done);
+   pr_debug("%s: %s-%d; release_done = %p\n", __func__, ch->sess_name,
+ch->qp->qp_num, ch->release_done);
 
sdev = ch->sport->sdev;
BUG_ON(!sdev);
@@ -2006,11 +2006,10 @@ static void srpt_release_channel_work(struct 
work_struct *w)
 ch->rsp_size, DMA_TO_DEVICE);
 
	spin_lock_irq(&sdev->spinlock);
-	list_del(&ch->list);
-	spin_unlock_irq(&sdev->spinlock);
-
+	list_del_init(&ch->list);
	if (ch->release_done)
		complete(ch->release_done);
+	spin_unlock_irq(&sdev->spinlock);
 
	wake_up(&sdev->ch_releaseQ);
 
@@ -3033,24 +3032,26 @@ static void srpt_release_cmd(struct se_cmd *se_cmd)
 static void srpt_close_session(struct se_session *se_sess)
 {
DECLARE_COMPLETION_ONSTACK(release_done);
-   struct srpt_rdma_ch *ch;
-   struct srpt_device *sdev;
-   unsigned long res;
-
-   ch = se_sess->fabric_sess_ptr;
-   WARN_ON(ch->sess != se_sess);
+   struct srpt_rdma_ch *ch = se_sess->fabric_sess_ptr;
+   struct srpt_device *sdev = ch->sport->sdev;
+   bool wait;
 
-   pr_debug("ch %p state %d\n", ch, ch->state);
+   pr_debug("ch %s-%d state %d\n", ch->sess_name, ch->qp->qp_num,
+ch->state);
 
-   sdev = ch->sport->sdev;
	spin_lock_irq(&sdev->spinlock);
	BUG_ON(ch->release_done);
	ch->release_done = &release_done;
+	wait = !list_empty(&ch->list);
	__srpt_close_ch(ch);
	spin_unlock_irq(&sdev->spinlock);
 
-	res = wait_for_completion_timeout(&release_done, 60 * HZ);
-	WARN_ON(res == 0);
+	if (!wait)
+		return;
+
+	while (wait_for_completion_timeout(&release_done, 180 * HZ) == 0)
+   pr_info("%s(%s-%d state %d): still waiting ...\n", __func__,
+   ch->sess_name, ch->qp->qp_num, ch->state);
 }
 
 /**
-- 
2.1.4



[PATCH 03/15] IB/srpt: Inline srpt_get_ch_state()

2016-01-05 Thread Bart Van Assche
The callers of srpt_get_ch_state() can access ch->state safely without
using locking. Hence inline this function.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 42 ++-
 1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 7abad26..cbcc73e 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -109,17 +109,6 @@ enum dma_data_direction opposite_dma_dir(enum 
dma_data_direction dir)
}
 }
 
-static enum rdma_ch_state srpt_get_ch_state(struct srpt_rdma_ch *ch)
-{
-   unsigned long flags;
-   enum rdma_ch_state state;
-
-	spin_lock_irqsave(&ch->spinlock, flags);
-	state = ch->state;
-	spin_unlock_irqrestore(&ch->spinlock, flags);
-   return state;
-}
-
 static enum rdma_ch_state
 srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new_state)
 {
@@ -216,7 +205,7 @@ static void srpt_srq_event(struct ib_event *event, void 
*ctx)
 static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch)
 {
pr_debug("QP event %d on cm_id=%p sess_name=%s state=%d\n",
-event->event, ch->cm_id, ch->sess_name, srpt_get_ch_state(ch));
+event->event, ch->cm_id, ch->sess_name, ch->state);
 
switch (event->event) {
case IB_EVENT_COMM_EST:
@@ -228,7 +217,7 @@ static void srpt_qp_event(struct ib_event *event, struct 
srpt_rdma_ch *ch)
srpt_release_channel(ch);
else
pr_debug("%s: state %d - ignored LAST_WQE.\n",
-ch->sess_name, srpt_get_ch_state(ch));
+ch->sess_name, ch->state);
break;
default:
pr_err("received unrecognized IB QP event %d\n", event->event);
@@ -1784,7 +1773,6 @@ static void srpt_handle_new_iu(struct srpt_rdma_ch *ch,
   struct srpt_send_ioctx *send_ioctx)
 {
struct srp_cmd *srp_cmd;
-   enum rdma_ch_state ch_state;
 
BUG_ON(!ch);
BUG_ON(!recv_ioctx);
@@ -1793,13 +1781,12 @@ static void srpt_handle_new_iu(struct srpt_rdma_ch *ch,
   recv_ioctx->ioctx.dma, srp_max_req_size,
   DMA_FROM_DEVICE);
 
-   ch_state = srpt_get_ch_state(ch);
-   if (unlikely(ch_state == CH_CONNECTING)) {
+   if (unlikely(ch->state == CH_CONNECTING)) {
		list_add_tail(&recv_ioctx->wait_list, &ch->cmd_wait_list);
goto out;
}
 
-   if (unlikely(ch_state != CH_LIVE))
+   if (unlikely(ch->state != CH_LIVE))
goto out;
 
srp_cmd = recv_ioctx->ioctx.buf;
@@ -1908,7 +1895,7 @@ static void srpt_send_done(struct ib_cq *cq, struct ib_wc 
*wc)
 
 out:
	while (!list_empty(&ch->cmd_wait_list) &&
-  srpt_get_ch_state(ch) == CH_LIVE &&
+  ch->state == CH_LIVE &&
   (ioctx = srpt_get_send_ioctx(ch)) != NULL) {
struct srpt_recv_ioctx *recv_ioctx;
 
@@ -2314,17 +2301,14 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
&& param->port == ch->sport->port
&& param->listen_id == ch->sport->sdev->cm_id
&& ch->cm_id) {
-   enum rdma_ch_state ch_state;
-
-   ch_state = srpt_get_ch_state(ch);
-   if (ch_state != CH_CONNECTING
-   && ch_state != CH_LIVE)
+   if (ch->state != CH_CONNECTING
+   && ch->state != CH_LIVE)
continue;
 
/* found an existing channel */
pr_debug("Found existing channel %s"
 " cm_id= %p state= %d\n",
-ch->sess_name, ch->cm_id, ch_state);
+ch->sess_name, ch->cm_id, ch->state);
 
__srpt_close_ch(ch);
 
@@ -2566,7 +2550,7 @@ static void srpt_cm_dreq_recv(struct ib_cm_id *cm_id)
ch = srpt_find_channel(cm_id->context, cm_id);
BUG_ON(!ch);
 
-   pr_debug("cm_id= %p ch->state= %d\n", cm_id, srpt_get_ch_state(ch));
+   pr_debug("cm_id= %p ch->state= %d\n", cm_id, ch->state);
 
	spin_lock_irqsave(&ch->spinlock, flags);
switch (ch->state) {
@@ -2750,7 +2734,6 @@ static int srpt_write_pending(struct se_cmd *se_cmd)
struct srpt_rdma_ch *ch;
struct srpt_send_ioctx *ioctx;
enum srpt_command_state new_state;
-   enum rdma_ch_state ch_state;
int ret;
 
ioctx = container_of(se_cmd, struct srpt_send_ioctx, 

[PATCH 13/15] IB/srpt: Detect session shutdown reliably

2016-01-05 Thread Bart Van Assche
The Last WQE Reached event is only generated after one or more work
requests have been queued on the QP associated with a session. Since
session shutdown can start before any work requests have been queued,
use a zero-length RDMA write to wait until a QP has been drained.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 274 +-
 drivers/infiniband/ulp/srpt/ib_srpt.h |  18 ++-
 2 files changed, 150 insertions(+), 142 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 6ec130d..cacb697 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -91,10 +91,11 @@ MODULE_PARM_DESC(srpt_service_guid,
 " instead of using the node_guid of the first HCA.");
 
 static struct ib_client srpt_client;
-static void srpt_release_channel(struct srpt_rdma_ch *ch);
+static void srpt_free_ch(struct kref *kref);
 static int srpt_queue_status(struct se_cmd *cmd);
 static void srpt_recv_done(struct ib_cq *cq, struct ib_wc *wc);
 static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc);
+static void srpt_zerolength_write_done(struct ib_cq *cq, struct ib_wc *wc);
 
 static bool srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new)
 {
@@ -170,6 +171,23 @@ static void srpt_srq_event(struct ib_event *event, void 
*ctx)
pr_info("SRQ event %d\n", event->event);
 }
 
+static const char *get_ch_state_name(enum rdma_ch_state s)
+{
+   switch (s) {
+   case CH_CONNECTING:
+   return "connecting";
+   case CH_LIVE:
+   return "live";
+   case CH_DISCONNECTING:
+   return "disconnecting";
+   case CH_DRAINING:
+   return "draining";
+   case CH_DISCONNECTED:
+   return "disconnected";
+   }
+   return "???";
+}
+
 /**
  * srpt_qp_event() - QP event callback function.
  */
@@ -183,11 +201,9 @@ static void srpt_qp_event(struct ib_event *event, struct 
srpt_rdma_ch *ch)
ib_cm_notify(ch->cm_id, event->event);
break;
case IB_EVENT_QP_LAST_WQE_REACHED:
-   if (srpt_set_ch_state(ch, CH_RELEASING))
-   srpt_release_channel(ch);
-   else
-   pr_debug("%s: state %d - ignored LAST_WQE.\n",
-ch->sess_name, ch->state);
+   pr_debug("%s-%d, state %s: received Last WQE event.\n",
+ch->sess_name, ch->qp->qp_num,
+get_ch_state_name(ch->state));
break;
default:
pr_err("received unrecognized IB QP event %d\n", event->event);
@@ -789,6 +805,37 @@ out:
 }
 
 /**
+ * srpt_zerolength_write() - Perform a zero-length RDMA write.
+ *
+ * A quote from the InfiniBand specification: C9-88: For an HCA responder
+ * using Reliable Connection service, for each zero-length RDMA READ or WRITE
+ * request, the R_Key shall not be validated, even if the request includes
+ * Immediate data.
+ */
+static int srpt_zerolength_write(struct srpt_rdma_ch *ch)
+{
+   struct ib_send_wr wr, *bad_wr;
+
+	memset(&wr, 0, sizeof(wr));
+	wr.opcode = IB_WR_RDMA_WRITE;
+	wr.wr_cqe = &ch->zw_cqe;
+	wr.send_flags = IB_SEND_SIGNALED;
+	return ib_post_send(ch->qp, &wr, &bad_wr);
+}
+
+static void srpt_zerolength_write_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+   struct srpt_rdma_ch *ch = cq->cq_context;
+
+   WARN(wc->status == IB_WC_SUCCESS, "%s-%d: QP not in error state\n",
+ch->sess_name, ch->qp->qp_num);
+   if (srpt_set_ch_state(ch, CH_DISCONNECTED))
+		schedule_work(&ch->release_work);
+   else
+   WARN_ONCE("%s-%d\n", ch->sess_name, ch->qp->qp_num);
+}
+
+/**
  * srpt_get_desc_tbl() - Parse the data descriptors of an SRP_CMD request.
  * @ioctx: Pointer to the I/O context associated with the request.
  * @srp_cmd: Pointer to the SRP_CMD request data.
@@ -1819,111 +1866,102 @@ static void srpt_destroy_ch_ib(struct srpt_rdma_ch 
*ch)
 }
 
 /**
- * __srpt_close_ch() - Close an RDMA channel by setting the QP error state.
+ * srpt_close_ch() - Close an RDMA channel.
  *
- * Reset the QP and make sure all resources associated with the channel will
- * be deallocated at an appropriate time.
+ * Make sure all resources associated with the channel will be deallocated at
+ * an appropriate time.
  *
- * Note: The caller must hold ch->sport->sdev->spinlock.
+ * Returns true if and only if the channel state has been modified into
+ * CH_DRAINING.
  */
-static void __srpt_close_ch(struct srpt_rdma_ch *ch)
+static bool srpt_close_ch(struct srpt_rdma_ch *ch)
 {
-   enum rdma_ch_state prev_state;
-   unsigned long flags;
+   int ret;
 
-	spin_lock_irqsave(&ch->spinlock, flags);
-   prev_state = ch->state;
-   switch (prev_state) {
- 

[PATCH 14/15] IB/srpt: Fix srpt_write_pending()

2016-01-05 Thread Bart Van Assche
The only allowed return values for the write_pending() callback
function are 0, -EAGAIN and -ENOMEM. Since attempting to perform
RDMA over a disconnecting channel will result in an IB error
completion anyway, remove the code that checks the channel state
from srpt_write_pending().

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 33 -
 1 file changed, 4 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index cacb697..669ae5c 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2522,39 +2522,14 @@ out_unmap:
  */
 static int srpt_write_pending(struct se_cmd *se_cmd)
 {
-   struct srpt_rdma_ch *ch;
-   struct srpt_send_ioctx *ioctx;
+   struct srpt_send_ioctx *ioctx =
+   container_of(se_cmd, struct srpt_send_ioctx, cmd);
+   struct srpt_rdma_ch *ch = ioctx->ch;
enum srpt_command_state new_state;
-   int ret;
-
-   ioctx = container_of(se_cmd, struct srpt_send_ioctx, cmd);
 
new_state = srpt_set_cmd_state(ioctx, SRPT_STATE_NEED_DATA);
WARN_ON(new_state == SRPT_STATE_DONE);
-
-   ch = ioctx->ch;
-   BUG_ON(!ch);
-
-   switch (ch->state) {
-   case CH_CONNECTING:
-   WARN(true, "unexpected channel state %d\n", ch->state);
-   ret = -EINVAL;
-   goto out;
-   case CH_LIVE:
-   break;
-   case CH_DISCONNECTING:
-   case CH_DRAINING:
-   case CH_DISCONNECTED:
-   pr_debug("cmd with tag %lld: channel disconnecting\n",
-ioctx->cmd.tag);
-   srpt_set_cmd_state(ioctx, SRPT_STATE_DATA_IN);
-   ret = -EINVAL;
-   goto out;
-   }
-   ret = srpt_xfer_data(ch, ioctx);
-
-out:
-   return ret;
+   return srpt_xfer_data(ch, ioctx);
 }
 
 static u8 tcm_to_srp_tsk_mgmt_status(const int tcm_mgmt_status)
-- 
2.1.4



[PATCH 04/15] IB/srpt: Introduce target_reverse_dma_direction()

2016-01-05 Thread Bart Van Assche
Use the function target_reverse_dma_direction() instead of
reimplementing it.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index cbcc73e..fd94780 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -96,19 +96,6 @@ static int srpt_queue_status(struct se_cmd *cmd);
 static void srpt_recv_done(struct ib_cq *cq, struct ib_wc *wc);
 static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc);
 
-/**
- * opposite_dma_dir() - Swap DMA_TO_DEVICE and DMA_FROM_DEVICE.
- */
-static inline
-enum dma_data_direction opposite_dma_dir(enum dma_data_direction dir)
-{
-   switch (dir) {
-   case DMA_TO_DEVICE: return DMA_FROM_DEVICE;
-   case DMA_FROM_DEVICE:   return DMA_TO_DEVICE;
-   default:return dir;
-   }
-}
-
 static enum rdma_ch_state
 srpt_set_ch_state(struct srpt_rdma_ch *ch, enum rdma_ch_state new_state)
 {
@@ -1048,7 +1035,7 @@ static void srpt_unmap_sg_to_ib_sge(struct srpt_rdma_ch 
*ch,
dir = ioctx->cmd.data_direction;
BUG_ON(dir == DMA_NONE);
ib_dma_unmap_sg(ch->sport->sdev->device, sg, ioctx->sg_cnt,
-   opposite_dma_dir(dir));
+			target_reverse_dma_direction(&ioctx->cmd));
ioctx->mapped_sg_count = 0;
}
 }
@@ -1085,7 +1072,7 @@ static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch,
ioctx->sg_cnt = sg_cnt = cmd->t_data_nents;
 
count = ib_dma_map_sg(ch->sport->sdev->device, sg, sg_cnt,
- opposite_dma_dir(dir));
+ target_reverse_dma_direction(cmd));
if (unlikely(!count))
return -EAGAIN;
 
-- 
2.1.4



[PATCH 05/15] IB/srpt: Use scsilun_to_int()

2016-01-05 Thread Bart Van Assche
Just like other target drivers, use scsilun_to_int() to unpack SCSI
LUN numbers. This patch only changes the behavior of ib_srpt for LUN
numbers >= 16384.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 92 +++
 1 file changed, 6 insertions(+), 86 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c 
b/drivers/infiniband/ulp/srpt/ib_srpt.c
index fd94780..fc19203 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1488,80 +1488,6 @@ static int srpt_build_tskmgmt_rsp(struct srpt_rdma_ch 
*ch,
return resp_len;
 }
 
-#define NO_SUCH_LUN ((uint64_t)-1LL)
-
-/*
- * SCSI LUN addressing method. See also SAM-2 and the section about
- * eight byte LUNs.
- */
-enum scsi_lun_addr_method {
-   SCSI_LUN_ADDR_METHOD_PERIPHERAL   = 0,
-   SCSI_LUN_ADDR_METHOD_FLAT = 1,
-   SCSI_LUN_ADDR_METHOD_LUN  = 2,
-   SCSI_LUN_ADDR_METHOD_EXTENDED_LUN = 3,
-};
-
-/*
- * srpt_unpack_lun() - Convert from network LUN to linear LUN.
- *
- * Convert an 2-byte, 4-byte, 6-byte or 8-byte LUN structure in network byte
- * order (big endian) to a linear LUN. Supports three LUN addressing methods:
- * peripheral, flat and logical unit. See also SAM-2, section 4.9.4 (page 40).
- */
-static uint64_t srpt_unpack_lun(const uint8_t *lun, int len)
-{
-   uint64_t res = NO_SUCH_LUN;
-   int addressing_method;
-
-   if (unlikely(len < 2)) {
-   pr_err("Illegal LUN length %d, expected 2 bytes or more\n",
-  len);
-   goto out;
-   }
-
-   switch (len) {
-   case 8:
-	case 8:
-		if ((*((__be64 *)lun) &
-		     cpu_to_be64(0x3fffffff00000000LL)) != 0)
-			goto out_err;
-		break;
-	case 4:
-		if (*((__be16 *)&lun[2]) != 0)
-			goto out_err;
-		break;
-	case 6:
-		if (*((__be32 *)&lun[2]) != 0)
-   goto out_err;
-   break;
-   case 2:
-   break;
-   default:
-   goto out_err;
-   }
-
-   addressing_method = (*lun) >> 6; /* highest two bits of byte 0 */
-   switch (addressing_method) {
-   case SCSI_LUN_ADDR_METHOD_PERIPHERAL:
-   case SCSI_LUN_ADDR_METHOD_FLAT:
-   case SCSI_LUN_ADDR_METHOD_LUN:
-   res = *(lun + 1) | (((*lun) & 0x3f) << 8);
-   break;
-
-   case SCSI_LUN_ADDR_METHOD_EXTENDED_LUN:
-   default:
-   pr_err("Unimplemented LUN addressing method %u\n",
-  addressing_method);
-   break;
-   }
-
-out:
-   return res;
-
-out_err:
-   pr_err("Support for multi-level LUNs has not yet been implemented\n");
-   goto out;
-}
-
 static int srpt_check_stop_free(struct se_cmd *cmd)
 {
struct srpt_send_ioctx *ioctx = container_of(cmd,
@@ -1579,7 +1505,6 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch,
 {
struct se_cmd *cmd;
struct srp_cmd *srp_cmd;
-   uint64_t unpacked_lun;
u64 data_len;
enum dma_data_direction dir;
sense_reason_t ret;
@@ -1614,11 +1539,10 @@ static int srpt_handle_cmd(struct srpt_rdma_ch *ch,
goto send_sense;
}
 
-	unpacked_lun = srpt_unpack_lun((uint8_t *)&srp_cmd->lun,
-				       sizeof(srp_cmd->lun));
	rc = target_submit_cmd(cmd, ch->sess, srp_cmd->cdb,
-			&send_ioctx->sense_data[0], unpacked_lun, data_len,
-			TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF);
+			       &send_ioctx->sense_data[0],
+			       scsilun_to_int(&srp_cmd->lun), data_len,
+			       TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF);
if (rc != 0) {
ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
goto send_sense;
@@ -1704,7 +1628,6 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch,
struct srp_tsk_mgmt *srp_tsk;
struct se_cmd *cmd;
struct se_session *sess = ch->sess;
-   uint64_t unpacked_lun;
uint32_t tag = 0;
int tcm_tmr;
int rc;
@@ -1726,9 +1649,6 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch,
TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED;
goto fail;
}
-	unpacked_lun = srpt_unpack_lun((uint8_t *)&srp_tsk->lun,
-				       sizeof(srp_tsk->lun));
-
if (srp_tsk->tsk_mgmt_func == SRP_TSK_ABORT_TASK) {
rc = srpt_rx_mgmt_fn_tag(send_ioctx, srp_tsk->task_tag);
if (rc < 0) {
@@ -1738,9 +1658,9 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch,
}
tag = srp_tsk->task_tag;
}
	rc = target_submit_tmr(&send_ioctx->cmd, sess, NULL, 

RE: [PATCHv3 TRIVIAL] IB/core: Documentation fix to ib_mad_snoop_handler in the MAD header file

2016-01-05 Thread Hefty, Sean
> > There are no in tree users and the only attempt at adding one was
> rejected.
> 
> There are no in tree users of this but there is your madeye tool (which
> is out of tree). This is still a useful debug tool for MADs and there
> are people who still use that.

It's an out of tree tool.  Maintain the snooping hooks into the mad layer out 
of tree as well.


[PATCHv1 0/6] rdma controller support

2016-01-05 Thread Parav Pandit
This patchset adds support for an RDMA cgroup, addressing the review
comments on [1] and implementing the published RFC [2].

Overview:
Currently, user space applications can easily take away all the rdma
device specific resources such as AH, CQ, QP, MR etc. Because of this,
applications in other cgroups or kernel space ULPs may not even get a
chance to allocate any rdma resources, resulting in service
unavailability.

RDMA cgroup addresses this issue by allowing resource accounting,
limit enforcement on per cgroup, per rdma device basis.

Resources are not defined by the RDMA cgroup. Resources are defined
by the RDMA/IB stack and, optionally, by HCA vendor device drivers.
This allows the rdma cgroup to remain constant while the RDMA/IB
stack evolves, without requiring rdma cgroup updates. A new
resource can easily be added by the RDMA/IB stack without touching
the rdma cgroup.

The RDMA uverbs layer will enforce limits on well defined RDMA verb
resources without any HCA vendor device driver involvement.

The RDMA uverbs layer will not do accounting of hw vendor specific
resources. Instead, the rdma cgroup provides a set of APIs through
which vendor specific drivers can define their own resources (up to
64) that can be accounted by the rdma cgroup.

Resource limit enforcement is hierarchical.

When a process is migrated with active RDMA resources, the rdma cgroup
continues to charge the original cgroup.

Changes from v0:
(To address comments from Haggai, Doug, Liran, Tejun, Sean, Jason)
 * Redesigned to support per device per cgroup limit settings by bringing
   concept of resource pool.
 * Redesigned to let IB stack define the resources instead of rdma controller
   using resource template.
 * Redesigned to support hw vendor specific limits setting (optional to 
drivers).
 * Created new rdma controller instead of piggyback on device cgroup.
 * Fixed race conditions for multiple tasks sharing rdma resources.
 * Removed dependency on the task_struct.

[1] https://lkml.org/lkml/2015/9/7/476
[2] https://lkml.org/lkml/2015/10/28/144

This patchset is for Tejun's for-4.5 branch.
It has not been attempted on Doug's rdma tree yet, which I will do once
I receive comments on this patchset.

Parav Pandit (6):
  rdmacg: Added rdma cgroup header file
  IB/core: Added members to support rdma cgroup
  rdmacg: implements rdma cgroup
  IB/core: rdmacg support infrastructure APIs
  IB/core: use rdma cgroup for resource accounting
  rdmacg: Added documentation for rdma controller.

 Documentation/cgroup-legacy/rdma.txt  |  129 
 Documentation/cgroup.txt  |   79 +++
 drivers/infiniband/core/Makefile  |1 +
 drivers/infiniband/core/cgroup.c  |   80 +++
 drivers/infiniband/core/core_priv.h   |5 +
 drivers/infiniband/core/device.c  |8 +
 drivers/infiniband/core/uverbs_cmd.c  |  244 ++-
 drivers/infiniband/core/uverbs_main.c |   30 +
 include/linux/cgroup_rdma.h   |   91 +++
 include/linux/cgroup_subsys.h |4 +
 include/rdma/ib_verbs.h   |   20 +
 init/Kconfig  |   12 +
 kernel/Makefile   |1 +
 kernel/cgroup_rdma.c  | 1220 +
 14 files changed, 1907 insertions(+), 17 deletions(-)
 create mode 100644 Documentation/cgroup-legacy/rdma.txt
 create mode 100644 drivers/infiniband/core/cgroup.c
 create mode 100644 include/linux/cgroup_rdma.h
 create mode 100644 kernel/cgroup_rdma.c

-- 
1.8.3.1



[PATCHv1 3/6] rdmacg: implements rdma cgroup

2016-01-05 Thread Parav Pandit
Adds an RDMA controller to limit the number of RDMA resources that can
be consumed by the processes of an rdma cgroup.

RDMA resources are global resources that can be exhausted without
reaching any kmemcg or other policy limit. The RDMA cgroup
implementation allows well defined RDMA/IB resources to be limited
per cgroup.

RDMA resources are tracked using resource pools. A resource pool is a
per device, per cgroup, per pool_type entity, which allows setting up
accounting limits on a per device basis.

The RDMA cgroup returns an error when user space applications try to
allocate more resources than the configured limit.

Rdma cgroup implements resource accounting for two types of resource
pools.
(a) RDMA IB specification level verb resources defined by IB stack
(b) HCA vendor device specific resources defined by vendor device driver

Resources are not defined by the RDMA cgroup; instead they are defined
by an external module, typically the IB stack, and optionally by HCA
drivers for those RDMA devices which don't have a one-to-one mapping
of IB verb resources to hardware resources.

Signed-off-by: Parav Pandit 
---
 include/linux/cgroup_subsys.h |4 +
 init/Kconfig  |   12 +
 kernel/Makefile   |1 +
 kernel/cgroup_rdma.c  | 1220 +
 4 files changed, 1237 insertions(+)
 create mode 100644 kernel/cgroup_rdma.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 0df0336a..d0e597c 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -56,6 +56,10 @@ SUBSYS(hugetlb)
 SUBSYS(pids)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_RDMA)
+SUBSYS(rdma)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index f8754f5..f8055f5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1070,6 +1070,18 @@ config CGROUP_PIDS
  since the PIDs limit only affects a process's ability to fork, not to
  attach to a cgroup.
 
+config CGROUP_RDMA
+   bool "RDMA controller"
+   help
+	  Provides enforcement of RDMA resources at the RDMA/IB verb level
+	  and enforcement of any resources advertised by RDMA/IB capable
+	  hardware. It is fairly easy for applications to exhaust RDMA
+	  resources, which can leave kernel consumers or other application
+	  consumers of RDMA resources with no resources at all. The RDMA
+	  controller is designed to stop this from happening.
+	  Attaching existing processes with active RDMA resources to the
+	  cgroup hierarchy will be allowed even if it crosses the
+	  hierarchy's limit.
+
 config CGROUP_FREEZER
bool "Freezer controller"
help
diff --git a/kernel/Makefile b/kernel/Makefile
index 53abf00..26e413c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
 obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += cgroup_pids.o
+obj-$(CONFIG_CGROUP_RDMA) += cgroup_rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_USER_NS) += user_namespace.o
diff --git a/kernel/cgroup_rdma.c b/kernel/cgroup_rdma.c
new file mode 100644
index 000..14c6fab
--- /dev/null
+++ b/kernel/cgroup_rdma.c
@@ -0,0 +1,1220 @@
+/*
+ * This file is subject to the terms and conditions of version 2 of the GNU
+ * General Public License.  See the file COPYING in the main directory of the
+ * Linux distribution for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+enum rdmacg_file_type {
+   RDMACG_VERB_RESOURCE_LIMIT,
+   RDMACG_VERB_RESOURCE_USAGE,
+   RDMACG_VERB_RESOURCE_FAILCNT,
+   RDMACG_VERB_RESOURCE_LIST,
+   RDMACG_HW_RESOURCE_LIMIT,
+   RDMACG_HW_RESOURCE_USAGE,
+   RDMACG_HW_RESOURCE_FAILCNT,
+   RDMACG_HW_RESOURCE_LIST,
+};
+
+#define RDMACG_USR_CMD_REMOVE "remove"
+
+/* resource tracker per resource for rdma cgroup */
+struct cg_resource {
+   atomic_t usage;
+   int limit;
+   atomic_t failcnt;
+};
+
+/**
+ * Pool type indicating whether the pool was created as part of the
+ * default operation or explicitly configured by the user.
+ * Depending on who created the pool, it is either freed up later
+ * or kept.
+ */
+enum rpool_creator {
+   RDMACG_RPOOL_CREATOR_DEFAULT,
+   RDMACG_RPOOL_CREATOR_USR,
+};
+
+/**
+ * Resource pool object representing the resources of one pool_type,
+ * per cgroup and per device.
+ */
+struct cg_resource_pool {
+   struct list_head cg_list;
+   struct ib_device *device;
+   enum rdmacg_resource_pool_type type;
+
+   struct cg_resource *resources;
+
+   atomic_t refcnt;/* count active user tasks of this pool */
+   atomic_t creator;   /* user created or default type */
+};
+
+struct 

[PATCHv1 2/6] IB/core: Added members to support rdma cgroup

2016-01-05 Thread Parav Pandit
Added a function pointer table to store resource pool specific
operations for each resource type (verb and hw).
Added a list node to link the device to the rdma cgroup so that it
can participate in resource accounting and limit configuration.

Signed-off-by: Parav Pandit 
---
 include/rdma/ib_verbs.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9a68a19..1a17249 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -51,6 +51,7 @@
 #include 
 #include 
 
+#include <linux/cgroup_rdma.h>
 #include 
 #include 
 #include 
@@ -1823,6 +1824,12 @@ struct ib_device {
u8   node_type;
u8   phys_port_cnt;
 
+#ifdef CONFIG_CGROUP_RDMA
+   struct rdmacg_resource_pool_ops
+   *rpool_ops[RDMACG_RESOURCE_POOL_TYPE_MAX];
+   struct list_head rdmacg_list;
+#endif
+
/**
 * The following mandatory functions are used only at device
 * registration.  Keep functions such as these at the end of this
-- 
1.8.3.1



[PATCHv4 TRIVIAL] IB/core: Documentation fix to the snoop handler in the MAD header file

2016-01-05 Thread Hal Rosenstock
In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr

Signed-off-by: Hal Rosenstock 
---
Change since v3:
Fixed title to not include function name

Change since v2:
Changed title to use "higher" language

Change since v1:
Fixed typo in patch description

diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index ec9b44d..2b3573d 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
*mad_agent,
 /**
  * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
  * @mad_agent: MAD agent that snooped the MAD.
- * @send_wr: Work request information on the sent MAD.
+ * @send_buf: send MAD data buffer.
  * @mad_send_wc: Work completion information on the sent MAD.  Valid
  *   only for snooping that occurs on a send completion.
  *
- * Clients snooping MADs should not modify data referenced by the @send_wr
+ * Clients snooping MADs should not modify data referenced by the @send_buf
  * or @mad_send_wc.
  */
 typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
--


[PATCHv1 1/6] rdmacg: Added rdma cgroup header file

2016-01-05 Thread Parav Pandit
Added the rdma cgroup header file, which defines the APIs used to
perform charging and uncharging of resources.

Signed-off-by: Parav Pandit 
---
 include/linux/cgroup_rdma.h | 91 +
 1 file changed, 91 insertions(+)
 create mode 100644 include/linux/cgroup_rdma.h

diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
new file mode 100644
index 000..01d220f
--- /dev/null
+++ b/include/linux/cgroup_rdma.h
@@ -0,0 +1,91 @@
+#ifndef _CGROUP_RDMA_H
+#define _CGROUP_RDMA_H
+
+/*
+ * This file is subject to the terms and conditions of version 2 of the GNU
+ * General Public License.  See the file COPYING in the main directory of the
+ * Linux distribution for more details.
+ */
+
+enum rdmacg_resource_pool_type {
+   RDMACG_RESOURCE_POOL_VERB,
+   RDMACG_RESOURCE_POOL_HW,
+   RDMACG_RESOURCE_POOL_TYPE_MAX,
+};
+
+struct ib_device;
+struct pid;
+struct match_token;
+
+#ifdef CONFIG_CGROUP_RDMA
+#define RDMACG_MAX_RESOURCE_INDEX (64)
+
+struct rdmacg_pool_info {
+   struct match_token *resource_table;
+   int resource_count;
+};
+
+struct rdmacg_resource_pool_ops {
+   struct rdmacg_pool_info*
+   (*get_resource_pool_tokens)(struct ib_device *);
+};
+
+/* APIs for RDMA/IB subsystem to publish when a device wants to
+ * participate in resource accounting
+ */
+void rdmacg_register_ib_device(struct ib_device *device);
+void rdmacg_unregister_ib_device(struct ib_device *device);
+
+/* APIs for RDMA/IB subsystem to charge/uncharge pool specific resources */
+int rdmacg_try_charge_resource(struct ib_device *device,
+  struct pid *pid,
+  enum rdmacg_resource_pool_type type,
+  int resource_index,
+  int num);
+void rdmacg_uncharge_resource(struct ib_device *device,
+ struct pid *pid,
+ enum rdmacg_resource_pool_type type,
+ int resource_index,
+ int num);
+
+void rdmacg_set_rpool_ops(struct ib_device *device,
+ enum rdmacg_resource_pool_type pool_type,
+ struct rdmacg_resource_pool_ops *ops);
+void rdmacg_clear_rpool_ops(struct ib_device *device,
+   enum rdmacg_resource_pool_type pool_type);
+int rdmacg_query_resource_limit(struct ib_device *device,
+   struct pid *pid,
+   enum rdmacg_resource_pool_type type,
+   int *limits, int max_count);
+#else
+/* APIs for RDMA/IB subsystem to charge/uncharge device specific resources */
+static inline
+int rdmacg_try_charge_resource(struct ib_device *device,
+  struct pid *pid,
+  enum rdmacg_resource_pool_type type,
+  int resource_index,
+  int num)
+{ return 0; }
+
+static inline void rdmacg_uncharge_resource(struct ib_device *device,
+   struct pid *pid,
+   enum rdmacg_resource_pool_type type,
+   int resource_index,
+   int num)
+{ }
+
+static inline
+int rdmacg_query_resource_limit(struct ib_device *device,
+   struct pid *pid,
+   enum rdmacg_resource_pool_type type,
+   int *limits, int max_count)
+{
+   int i;
+
+   for (i = 0; i < max_count; i++)
+   limits[i] = S32_MAX;
+
+   return 0;
+}
+#endif /* CONFIG_CGROUP_RDMA */
+#endif /* _CGROUP_RDMA_H */
-- 
1.8.3.1



Re: [PATCHv3 TRIVIAL] IB/core: Documentation fix to ib_mad_snoop_handler in the MAD header file

2016-01-05 Thread Hal Rosenstock
On 1/5/2016 12:38 PM, Hefty, Sean wrote:
>> In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr
> 
> The MAD snooping should be removed from the mad stack.  

This was last discussed on the linux-rdma list back in late September when Ira
posted a partial RFC patch to do this.

> There are no in tree users and the only attempt at adding one was rejected.

There are no in tree users of this but there is your madeye tool (which
is out of tree). This is still a useful debug tool for MADs and there
are people who still use that.

Ira posted the start of MAD stack tracing but AFAICT it was not
equivalent and that thread died out.


Re: [PATCH v2] staging/rdma/hfi1: check for ARMED->ACTIVE transition in receive interrupt

2016-01-05 Thread Jubin John
On Mon, Jan 04, 2016 at 10:23:46PM +0200, Leon Romanovsky wrote:
> On Mon, Jan 04, 2016 at 11:21:19AM -0500, Jubin John wrote:
> > From: Jim Snow 
> > 
> > } else {
> > +   /* Auto activate link on non-SC15 packet receive */
> > +   if (unlikely(rcd->ppd->host_link_state ==
> > +HLS_UP_ARMED))
> > +   if (set_armed_to_active(rcd, packet, dd))
> > +   goto bail;
> 
> What is the advantage of double "if" over one "if"?
> Something like that
> + if (unlikely(rcd->ppd->host_link_state == HLS_UP_ARMED) && 
> (set_armed_to_active(rcd, packet, dd))
> + goto bail;

I don't think there is an advantage to the double if, so I will change
it to the single if in v3.

> 
> > last = process_rcv_packet(, thread);
> > }
> >  
> > @@ -984,6 +1020,42 @@ bail:
> >  }
> >  


[PATCHv1 5/6] IB/core: use rdma cgroup for resource accounting

2016-01-05 Thread Parav Pandit
It uses the charge API to perform resource charging before allocating the
low level resource. It continues to link the resource to the owning
thread group leader task.
It uncharges the resource after successful deallocation of the resource.

Signed-off-by: Parav Pandit 
---
 drivers/infiniband/core/device.c  |   8 ++
 drivers/infiniband/core/uverbs_cmd.c  | 244 +++---
 drivers/infiniband/core/uverbs_main.c |  30 +
 3 files changed, 265 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 179e813..59cab6b 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -352,6 +352,10 @@ int ib_register_device(struct ib_device *device,
goto out;
}
 
+#ifdef CONFIG_CGROUP_RDMA
+   ib_device_register_rdmacg(device);
+#endif
+
ret = ib_device_register_sysfs(device, port_callback);
if (ret) {
	printk(KERN_WARNING "Couldn't register device %s with driver model\n",
@@ -405,6 +409,10 @@ void ib_unregister_device(struct ib_device *device)
 
	mutex_unlock(&device_mutex);
 
+#ifdef CONFIG_CGROUP_RDMA
+   ib_device_unregister_rdmacg(device);
+#endif
+
ib_device_unregister_sysfs(device);
ib_cache_cleanup_one(device);
 
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 94816ae..1b3d60b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -294,6 +294,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 #endif
struct ib_ucontext   *ucontext;
struct file  *filp;
+   struct pid   *tgid;
int ret;
 
if (out_len < sizeof resp)
@@ -313,10 +314,20 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
   (unsigned long) cmd.response + sizeof resp,
   in_len - sizeof cmd, out_len - sizeof resp);
 
+   rcu_read_lock();
+   tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
+   rcu_read_unlock();
+
+   ret = rdmacg_try_charge_resource(ib_dev, tgid,
+RDMACG_RESOURCE_POOL_VERB,
+RDMA_VERB_RESOURCE_UCTX, 1);
+   if (ret)
+   goto err_charge;
+
ucontext = ib_dev->alloc_ucontext(ib_dev, );
if (IS_ERR(ucontext)) {
ret = PTR_ERR(ucontext);
-   goto err;
+   goto err_alloc;
}
 
ucontext->device = ib_dev;
@@ -330,7 +341,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
	INIT_LIST_HEAD(&ucontext->xrcd_list);
	INIT_LIST_HEAD(&ucontext->rule_list);
rcu_read_lock();
-   ucontext->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
+   ucontext->tgid = tgid;
rcu_read_unlock();
ucontext->closing = 0;
 
@@ -383,9 +394,15 @@ err_fd:
put_unused_fd(resp.async_fd);
 
 err_free:
-   put_pid(ucontext->tgid);
ib_dev->dealloc_ucontext(ucontext);
 
+err_alloc:
+   rdmacg_uncharge_resource(ib_dev, tgid, RDMACG_RESOURCE_POOL_VERB,
+RDMA_VERB_RESOURCE_UCTX, 1);
+
+err_charge:
+   put_pid(tgid);
+
 err:
	mutex_unlock(&file->mutex);
return ret;
@@ -394,7 +411,8 @@ err:
 static void copy_query_dev_fields(struct ib_uverbs_file *file,
  struct ib_device *ib_dev,
  struct ib_uverbs_query_device_resp *resp,
- struct ib_device_attr *attr)
+ struct ib_device_attr *attr,
+ int *limits)
 {
resp->fw_ver= attr->fw_ver;
resp->node_guid = ib_dev->node_guid;
@@ -405,14 +423,19 @@ static void copy_query_dev_fields(struct ib_uverbs_file *file,
resp->vendor_part_id= attr->vendor_part_id;
resp->hw_ver= attr->hw_ver;
resp->max_qp= attr->max_qp;
+   resp->max_qp= min_t(int, attr->max_qp,
+   limits[RDMA_VERB_RESOURCE_QP]);
resp->max_qp_wr = attr->max_qp_wr;
resp->device_cap_flags  = attr->device_cap_flags;
resp->max_sge   = attr->max_sge;
resp->max_sge_rd= attr->max_sge_rd;
-   resp->max_cq= attr->max_cq;
+   resp->max_cq= min_t(int, attr->max_cq,
+   limits[RDMA_VERB_RESOURCE_CQ]);
resp->max_cqe   = attr->max_cqe;
-   resp->max_mr= attr->max_mr;
-   resp->max_pd= attr->max_pd;
+   resp->max_mr= min_t(int, attr->max_mr,
+   limits[RDMA_VERB_RESOURCE_MR]);
+   resp->max_pd= min_t(int, attr->max_pd,
+   

[PATCHv1 4/6] IB/core: rdmacg support infrastructure APIs

2016-01-05 Thread Parav Pandit
It defines verb RDMA resources that will be registered with
RDMA cgroup. It defines new APIs to register device with
RDMA cgroup and defines resource token table access interface.

Signed-off-by: Parav Pandit 
---
 drivers/infiniband/core/Makefile|  1 +
 drivers/infiniband/core/cgroup.c| 80 +
 drivers/infiniband/core/core_priv.h |  5 +++
 include/rdma/ib_verbs.h | 13 ++
 4 files changed, 99 insertions(+)
 create mode 100644 drivers/infiniband/core/cgroup.c

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index d43a899..df40cee 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -13,6 +13,7 @@ ib_core-y :=	packer.o ud_header.o verbs.o sysfs.o \
roce_gid_mgmt.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
+ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
 
 ib_mad-y :=mad.o smi.o agent.o mad_rmpp.o
 
diff --git a/drivers/infiniband/core/cgroup.c b/drivers/infiniband/core/cgroup.c
new file mode 100644
index 000..8d80add
--- /dev/null
+++ b/drivers/infiniband/core/cgroup.c
@@ -0,0 +1,80 @@
+#include 
+#include 
+#include 
+
+#include "core_priv.h"
+
+/*
+ * Resource table definition as seen by the user.
+ * Entries need to be added to it when more resources are
+ * added/defined at the IB verb/core layer.
+ */
+static match_table_t resource_tokens = {
+   {RDMA_VERB_RESOURCE_UCTX, "uctx=%d"},
+   {RDMA_VERB_RESOURCE_AH, "ah=%d"},
+   {RDMA_VERB_RESOURCE_PD, "pd=%d"},
+   {RDMA_VERB_RESOURCE_CQ, "cq=%d"},
+   {RDMA_VERB_RESOURCE_MR, "mr=%d"},
+   {RDMA_VERB_RESOURCE_MW, "mw=%d"},
+   {RDMA_VERB_RESOURCE_SRQ, "srq=%d"},
+   {RDMA_VERB_RESOURCE_QP, "qp=%d"},
+   {RDMA_VERB_RESOURCE_FLOW, "flow=%d"},
+   {-1, NULL}
+};
+
+/**
+ * setup table pointers for RDMA cgroup to access.
+ */
+static struct rdmacg_pool_info verbs_token_info = {
+   .resource_table = resource_tokens,
+   .resource_count =
+   (sizeof(resource_tokens) / sizeof(struct match_token)) - 1,
+};
+
+static struct rdmacg_pool_info*
+   rdmacg_get_resource_pool_tokens(struct ib_device *device)
+{
+   return &verbs_token_info;
+}
+
+static struct rdmacg_resource_pool_ops verbs_pool_ops = {
+   .get_resource_pool_tokens = &rdmacg_get_resource_pool_tokens,
+};
+
+/**
+ * ib_device_register_rdmacg - register with rdma cgroup.
+ * @device: device to register to participate in resource
+ *  accounting by rdma cgroup.
+ *
+ * Register with the rdma cgroup. Should be called before
+ * exposing rdma device to user space applications to avoid
+ * resource accounting leak.
+ * HCA drivers should set resource pool ops first if they wish
+ * to support hw specific resource accounting before IB core
+ * registers with rdma cgroup.
+ */
+void ib_device_register_rdmacg(struct ib_device *device)
+{
+   rdmacg_set_rpool_ops(device,
+RDMACG_RESOURCE_POOL_VERB,
+&verbs_pool_ops);
+   rdmacg_register_ib_device(device);
+}
+
+/**
+ * ib_device_unregister_rdmacg - unregister with rdma cgroup.
+ * @device: device to unregister.
+ *
+ * Unregister with the rdma cgroup. Should be called after
+ * all the resources are deallocated, and after a stage when any
+ * other resource allocation of user application cannot be done
+ * for this device to avoid any leak in accounting.
+ * HCA drivers should clear resource pool ops after ib stack
+ * unregisters with rdma cgroup.
+ */
+void ib_device_unregister_rdmacg(struct ib_device *device)
+{
+   rdmacg_unregister_ib_device(device);
+   rdmacg_clear_rpool_ops(device,
+  RDMACG_RESOURCE_POOL_VERB);
+}
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 5cf6eb7..29bdfe2 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -92,4 +92,9 @@ int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
 void ib_cache_release_one(struct ib_device *device);
 
+#ifdef CONFIG_CGROUP_RDMA
+void ib_device_register_rdmacg(struct ib_device *device);
+void ib_device_unregister_rdmacg(struct ib_device *device);
+#endif
+
 #endif /* _CORE_PRIV_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1a17249..f44b884 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -96,6 +96,19 @@ enum rdma_protocol_type {
RDMA_PROTOCOL_USNIC_UDP
 };
 
+enum rdma_resource_type {
+   RDMA_VERB_RESOURCE_UCTX,
+   RDMA_VERB_RESOURCE_AH,
+   RDMA_VERB_RESOURCE_PD,
+   RDMA_VERB_RESOURCE_CQ,
+   RDMA_VERB_RESOURCE_MR,
+   RDMA_VERB_RESOURCE_MW,
+   RDMA_VERB_RESOURCE_SRQ,
+   RDMA_VERB_RESOURCE_QP,
+   

[PATCHv1 6/6] rdmacg: Added documentation for rdma controller.

2016-01-05 Thread Parav Pandit
Added documentation for the rdma controller for use in legacy mode and
with the new unified hierarchy.

Signed-off-by: Parav Pandit 
---
 Documentation/cgroup-legacy/rdma.txt | 129 +++
 Documentation/cgroup.txt |  79 +
 2 files changed, 208 insertions(+)
 create mode 100644 Documentation/cgroup-legacy/rdma.txt

diff --git a/Documentation/cgroup-legacy/rdma.txt b/Documentation/cgroup-legacy/rdma.txt
new file mode 100644
index 000..70626c5
--- /dev/null
+++ b/Documentation/cgroup-legacy/rdma.txt
@@ -0,0 +1,129 @@
+   RDMA Resource Controller
+   
+
+Contents
+
+
+1. Overview
+  1-1. What is RDMA resource controller?
+  1-2. Why is RDMA resource controller needed?
+  1-3. How is RDMA resource controller implemented?
+2. Usage Examples
+
+1. Overview
+
+1-1. What is RDMA resource controller?
+-
+
+The RDMA resource controller allows the user to limit the RDMA/IB specific
+resources that a given set of processes can use. These processes are
+grouped using the RDMA resource controller.
+
+RDMA resource controller currently allows two different type of resource
+pools.
+(a) RDMA IB specification level verb resources defined by IB stack
+(b) HCA vendor device specific resources
+
+The RDMA resource controller allows a maximum of up to 64 resources in
+a resource pool, which is an internal construct of the rdma cgroup
+explained later in this document.
+
+1-2. Why is RDMA resource controller needed?
+
+
+Currently user space applications can easily take away all the rdma
+device specific resources such as AH, CQ, QP, MR etc., so other
+applications in another cgroup or kernel space ULPs may not even get a
+chance to allocate any rdma resources. This leads to service
+unavailability.
+
+Therefore the RDMA resource controller is needed, through which the
+resource consumption of processes can be limited. Through this controller
+the various rdma resources described by the IB uverbs layer and any HCA
+vendor driver can be accounted.
+
+1-3. How is RDMA resource controller implemented?
+
+
+The rdma cgroup allows limit configuration of resources. These resources
+are not defined by the rdma controller. Instead they are defined by the
+IB stack and (optionally) by HCA device drivers.
+This provides great flexibility, allowing the IB stack to define new
+resources without any changes to the rdma cgroup.
+Rdma cgroup maintains resource accounting per cgroup, per device, per resource
+type using resource pool structure. Each such resource pool is limited up to
+64 resources in given resource pool by rdma cgroup, which can be extended
+later if required.
+
+This resource pool object is linked to the cgroup css. Typically there
+are 0 to 4 resource pool instances per cgroup, per device in most use
+cases, but nothing limits it to more. At present hundreds of RDMA
+devices per single cgroup may not be handled optimally; however, there
+is no known use case for such a configuration either.
+
+Since RDMA resources can be allocated from any process and can be freed
+by any of the child processes which share the address space, rdma
+resources are always owned by the creator cgroup css. This allows
+process migration from one cgroup to another without the major
+complexity of transferring resource ownership, because such ownership is
+not really present due to the shared nature of rdma resources. Linking
+resources to the css also ensures that cgroups can be deleted after
+processes have migrated. This allows process migration with active
+resources as well, even though that is not the primary use case.
+
+Finally, the mapping of the resource owner pid to its cgroup is
+maintained using a simple hash table to perform quick look-ups at
+resource charging/uncharging time.
+
+A resource pool object is created in the following situations.
+(a) The user sets a limit and no previous resource pool exists for the
+device of interest for the cgroup.
+(b) No resource limits were configured, but the IB/RDMA stack tries to
+charge a resource. A default resource pool is created so that resources
+are correctly uncharged when applications run without limits and limits
+are enforced later; otherwise the usage count would drop negative during
+uncharging. Instead of implementing any sort of time markers, the
+default pool simplifies the design.
+
+A resource pool is destroyed if it was of the default type (not created
+by an administrative operation) and its last resource is being
+deallocated. A resource pool created by an administrative operation is
+not deleted, as it is expected to be used in the near future.
+
+If a user setting tries to delete all the resource limits while the
+device still has active resources, the RDMA cgroup just marks the pool
+as a default pool with maximum limits for each resource; otherwise it
+deletes the default 

[PATCH v3] staging/rdma/hfi1: check for ARMED->ACTIVE transition in receive interrupt

2016-01-05 Thread Jubin John
From: Jim Snow 

The link state will transition from ARMED to ACTIVE when a non-SC15
packet arrives, but the driver might not notice the change.  With this
fix, if the slowpath receive interrupt handler sees a non-SC15 packet
while in the ARMED state, we queue work to call linkstate_active_work
from process context to promote it to ACTIVE.

Reviewed-by: Dean Luick 
Reviewed-by: Ira Weiny 
Reviewed-by: Mike Marciniszyn 
Signed-off-by: Jim Snow 
Signed-off-by: Brendan Cunningham 
Signed-off-by: Jubin John 
---
Changes in v2:
- Fixed whitespace
- Converted armed->active transition to inline function
- Added comment to document reason for skipping HFI1_CTRL_CTXT
  in set_all_slowpath()

Changes in v3:
- Changed from double if to single if statement in
  handle_receive_interrupt()

 drivers/staging/rdma/hfi1/chip.c   |  5 +--
 drivers/staging/rdma/hfi1/chip.h   |  2 ++
 drivers/staging/rdma/hfi1/driver.c | 72 ++
 drivers/staging/rdma/hfi1/hfi.h| 11 ++
 drivers/staging/rdma/hfi1/init.c   |  1 +
 5 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
index f7bf902..63d5d71 100644
--- a/drivers/staging/rdma/hfi1/chip.c
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -7878,7 +7878,7 @@ static inline void clear_recv_intr(struct hfi1_ctxtdata *rcd)
 }
 
 /* force the receive interrupt */
-static inline void force_recv_intr(struct hfi1_ctxtdata *rcd)
+void force_recv_intr(struct hfi1_ctxtdata *rcd)
 {
write_csr(rcd->dd, CCE_INT_FORCE + (8 * rcd->ireg), rcd->imask);
 }
@@ -7977,7 +7977,7 @@ u32 read_physical_state(struct hfi1_devdata *dd)
& DC_DC8051_STS_CUR_STATE_PORT_MASK;
 }
 
-static u32 read_logical_state(struct hfi1_devdata *dd)
+u32 read_logical_state(struct hfi1_devdata *dd)
 {
u64 reg;
 
@@ -9952,6 +9952,7 @@ int set_link_state(struct hfi1_pportdata *ppd, u32 state)
ppd->link_enabled = 1;
}
 
+   set_all_slowpath(ppd->dd);
ret = set_local_link_attributes(ppd);
if (ret)
break;
diff --git a/drivers/staging/rdma/hfi1/chip.h b/drivers/staging/rdma/hfi1/chip.h
index b46ef66..78ba425 100644
--- a/drivers/staging/rdma/hfi1/chip.h
+++ b/drivers/staging/rdma/hfi1/chip.h
@@ -690,6 +690,8 @@ u64 read_dev_cntr(struct hfi1_devdata *dd, int index, int vl);
 u64 write_dev_cntr(struct hfi1_devdata *dd, int index, int vl, u64 data);
 u64 read_port_cntr(struct hfi1_pportdata *ppd, int index, int vl);
 u64 write_port_cntr(struct hfi1_pportdata *ppd, int index, int vl, u64 data);
+u32 read_logical_state(struct hfi1_devdata *dd);
+void force_recv_intr(struct hfi1_ctxtdata *rcd);
 
 /* Per VL indexes */
 enum {
diff --git a/drivers/staging/rdma/hfi1/driver.c b/drivers/staging/rdma/hfi1/driver.c
index 3218520..d096f11 100644
--- a/drivers/staging/rdma/hfi1/driver.c
+++ b/drivers/staging/rdma/hfi1/driver.c
@@ -862,6 +862,37 @@ static inline void set_all_dma_rtail(struct hfi1_devdata *dd)
			&handle_receive_interrupt_dma_rtail;
 }
 
+void set_all_slowpath(struct hfi1_devdata *dd)
+{
+   int i;
+
+   /* HFI1_CTRL_CTXT must always use the slow path interrupt handler */
+   for (i = HFI1_CTRL_CTXT + 1; i < dd->first_user_ctxt; i++)
+   dd->rcd[i]->do_interrupt = &handle_receive_interrupt;
+}
+
+static inline int set_armed_to_active(struct hfi1_ctxtdata *rcd,
+ struct hfi1_packet packet,
+ struct hfi1_devdata *dd)
+{
+   struct work_struct *lsaw = &rcd->ppd->linkstate_active_work;
+   struct hfi1_message_header *hdr = hfi1_get_msgheader(packet.rcd->dd,
+packet.rhf_addr);
+
+   if (hdr2sc(hdr, packet.rhf) != 0xf) {
+   int hwstate = read_logical_state(dd);
+
+   if (hwstate != LSTATE_ACTIVE) {
+   dd_dev_info(dd, "Unexpected link state %d\n", hwstate);
+   return 0;
+   }
+
+   queue_work(rcd->ppd->hfi1_wq, lsaw);
+   return 1;
+   }
+   return 0;
+}
+
 /*
  * handle_receive_interrupt - receive a packet
  * @rcd: the context
@@ -929,6 +960,11 @@ int handle_receive_interrupt(struct hfi1_ctxtdata *rcd, int thread)
last = skip_rcv_packet(, thread);
skip_pkt = 0;
} else {
+   /* Auto activate link on non-SC15 packet receive */
+   if (unlikely(rcd->ppd->host_link_state ==
+HLS_UP_ARMED) &&
+   set_armed_to_active(rcd, 

Re: [PATCHv3 TRIVIAL] IB/core: Documentation fix to ib_mad_snoop_handler in the MAD header file

2016-01-05 Thread Or Gerlitz
On Tue, Jan 5, 2016 at 9:00 PM, Hefty, Sean  wrote:
>> > There are no in tree users and the only attempt at adding one was
>> rejected.
>>
>> There are no in tree users of this but there is your madeye tool (which
>> is out of tree). This is still a useful debug tool for MADs and there
>> are people who still use that.
>
> It's an out of tree tool.  Maintain the snooping hooks into the mad layer out 
> of tree as well.

Any real reason not to pick this good tool into the kernel code?

Or.


[PATCH 00/15] Various ib_srpt patches

2016-01-05 Thread Bart Van Assche
The following series of patches is what I came up with while testing the 
most recent version of my SCSI target patch series (see also 
http://thread.gmane.org/gmane.linux.scsi.target.devel/10905):


0001-IB-srpt-Add-parentheses-around-sizeof-argument.patch
0002-IB-srpt-Inline-srpt_sdev_name.patch
0003-IB-srpt-Inline-srpt_get_ch_state.patch
0004-IB-srpt-Introduce-target_reverse_dma_direction.patch
0005-IB-srpt-Use-scsilun_to_int.patch
0006-IB-srpt-Simplify-srpt_handle_tsk_mgmt.patch
0007-IB-srpt-Simplify-channel-state-management.patch
0008-IB-srpt-Simplify-srpt_shutdown_session.patch
0009-IB-srpt-Fix-srpt_close_session.patch
0010-IB-srpt-Fix-srpt_handle_cmd-error-paths.patch
0011-IB-srpt-Fix-how-aborted-commands-are-processed.patch
0012-IB-srpt-Eliminate-srpt_find_channel.patch
0013-IB-srpt-Detect-session-shutdown-reliably.patch
0014-IB-srpt-Fix-srpt_write_pending.patch
0015-IB-srpt-Fix-a-rare-crash-in-srpt_close_session.patch


Re: [PATCH 04/15] IB/srpt: Introduce target_reverse_dma_direction()

2016-01-05 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 05/15] IB/srpt: Use scsilun_to_int()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:22:46PM +0100, Bart Van Assche wrote:
> Just like other target drivers, use scsilun_to_int() to unpack SCSI
> LUN numbers. This patch only changes the behavior of ib_srpt for LUN
> numbers >= 16384.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Christoph Hellwig 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 06/15] IB/srpt: Simplify srpt_handle_tsk_mgmt()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:23:14PM +0100, Bart Van Assche wrote:
> Let the target core check task existence instead of the SRP target
> driver.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 01/15] IB/srpt: Add parentheses around sizeof argument

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:20:50PM +0100, Bart Van Assche wrote:
> Although sizeof is an operator and hence in many cases parentheses can
> be left out, the recommended kernel coding style is to surround the
> sizeof argument with parentheses. This patch does not change any
> functionality. This patch has been generated by running the following
> shell command:

I don't really care about this formatting, but the patch looks fine:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 03/15] IB/srpt: Inline srpt_get_ch_state()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:21:53PM +0100, Bart Van Assche wrote:
> The callers of srpt_get_ch_state() can access ch->state safely without
> using locking. Hence inline this function.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 11/15] IB/srpt: Fix how aborted commands are processed

2016-01-05 Thread Christoph Hellwig
>   pr_debug("Aborting cmd with state %d and tag %lld\n", state,
>ioctx->cmd.tag);
>  
> @@ -1299,14 +1291,16 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx)
>   case SRPT_STATE_NEW:
>   case SRPT_STATE_DATA_IN:
>   case SRPT_STATE_MGMT:
> + case SRPT_STATE_DONE:
>   /*
>* Do nothing - defer abort processing until
>* srpt_queue_response() is invoked.
>*/
> - WARN_ON(!(ioctx->cmd.transport_state & CMD_T_ABORTED));

Seems like this depends on your target core changes?  Maybe it would be better
to respin the series to go just on top of Doug's RDMA tree, as I think
we're more likely to get this series merged for 4.5 than the target core
changes.

Otherwise these changes look fine to me.


Re: [PATCH 09/15] IB/srpt: Fix srpt_close_session()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:24:49PM +0100, Bart Van Assche wrote:
> Avoid that srpt_close_session() waits if it doesn't have to wait.
> Additionally, increase the time during which srpt_close_session()
> waits until closing a session has finished. This makes it easier
> to detect session shutdown bugs.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 08/15] IB/srpt: Simplify srpt_shutdown_session()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:24:15PM +0100, Bart Van Assche wrote:
> The target core guarantees that shutdown_session() is only invoked
> once per session. This means that the ib_srpt target driver doesn't
> have to track whether or not shutdown_session() has been called.
> Additionally, ensure that target_sess_cmd_list_set_waiting() is
> called before target_wait_for_sess_cmds() by moving it into
> srpt_release_channel_work().

Looks good,

Reviewed-by: Christoph Hellwig 

Note that most drivers now always return either 0 or 1 from
shutdown_session, so it might be worth investigating if we can get
rid of this method in the future.

Minor nitpick below:

>  static int srpt_shutdown_session(struct se_session *se_sess)
>  {
>   return true;
>  }


Given that the function returns an int, this really should be 1 instead of
true, but it's not really worth respinning the patch just for this.


Re: [PATCH 10/15] IB/srpt: Fix srpt_handle_cmd() error paths

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:25:13PM +0100, Bart Van Assche wrote:
> The target core function that should be called if target_submit_cmd()
> fails is target_put_sess_cmd(). Additionally, change the return type
> of srpt_handle_cmd() from int into void.

I actually ran into this bug a long time ago with a modified srpt driver
and forgot to send a similar fix..

Looks good:

Reviewed-by: Christoph Hellwig 

Minor nitpick below:

> + send_ioctx->state = SRPT_STATE_DONE;
> + target_put_sess_cmd(cmd);
> + return;
>  }

no need for that return statement.


Re: [PATCH 07/15] IB/srpt: Simplify channel state management

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:23:45PM +0100, Bart Van Assche wrote:
> The only allowed channel state changes are those that change
> the channel state into a state with a higher numerical value.
> This allows to merge the functions srpt_set_ch_state() and
> srpt_test_and_set_ch_state() into a single function.

It would be great to have a little comment in srpt_set_ch_state explaining
why only changing to a numerically greater state is fine.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 12/15] IB/srpt: Eliminate srpt_find_channel()

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:26:22PM +0100, Bart Van Assche wrote:
> In the CM REQ message handler, store the channel pointer in
> cm_id->context such that the function srpt_find_channel() is no
> longer needed. Additionally, make the CM event messages more
> informative.

Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 10/11] IB: only keep a single key in struct ib_mr

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 10:46:36AM -0700, Jason Gunthorpe wrote:
> > ULPs are *already* using the same registrations for both local and
> > remote access.
> 
> Where? Out of tree?

I haven't found anything in-tree for sure.


[PATCH 2/2 v2] IB/mad: use CQ abstraction

2016-01-05 Thread Christoph Hellwig
Remove the local workqueue to process mad completions and use the CQ API
instead.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hal Rosenstock 
Reviewed-by: Ira Weiny 
---
 drivers/infiniband/core/mad.c  | 162 +
 drivers/infiniband/core/mad_priv.h |   2 +-
 2 files changed, 59 insertions(+), 105 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index cbe232a..9fa5bf3 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -61,18 +61,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests
 module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
 MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work 
requests");
 
-/*
- * Define a limit on the number of completions which will be processed by the
- * worker thread in a single work item.  This ensures that other work items
- * (potentially from other users) are processed fairly.
- *
- * The number of completions was derived from the default queue sizes above.
- * We use a value which is double the larger of the 2 queues (receive @ 512)
- * but keep it fixed such that an increase in that value does not introduce
- * unfairness.
- */
-#define MAD_COMPLETION_PROC_LIMIT 1024
-
 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;
 
@@ -96,6 +84,9 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req *mad_reg_req,
  u8 mgmt_class);
 static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req,
   struct ib_mad_agent_private *agent_priv);
+static bool ib_mad_send_error(struct ib_mad_port_private *port_priv,
+ struct ib_wc *wc);
+static void ib_mad_send_done(struct ib_cq *cq, struct ib_wc *wc);
 
 /*
  * Returns a ib_mad_port_private structure or NULL for a device/port
@@ -701,12 +692,11 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info,
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
 }
 
-static void build_smp_wc(struct ib_qp *qp,
-u64 wr_id, u16 slid, u16 pkey_index, u8 port_num,
-struct ib_wc *wc)
+static void build_smp_wc(struct ib_qp *qp, struct ib_cqe *cqe, u16 slid,
+   u16 pkey_index, u8 port_num, struct ib_wc *wc)
 {
memset(wc, 0, sizeof *wc);
-   wc->wr_id = wr_id;
+   wc->wr_cqe = cqe;
wc->status = IB_WC_SUCCESS;
wc->opcode = IB_WC_RECV;
wc->pkey_index = pkey_index;
@@ -844,7 +834,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
}
 
build_smp_wc(mad_agent_priv->agent.qp,
-send_wr->wr.wr_id, drslid,
+send_wr->wr.wr_cqe, drslid,
 send_wr->pkey_index,
send_wr->port_num, &mad_wc);
 
@@ -1051,7 +1041,9 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
 
mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
 
-   mad_send_wr->send_wr.wr.wr_id = (unsigned long) mad_send_wr;
+   mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
+
+   mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
mad_send_wr->send_wr.wr.num_sge = 2;
mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
@@ -1163,8 +1155,9 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
 
/* Set WR ID to find mad_send_wr upon completion */
qp_info = mad_send_wr->mad_agent_priv->qp_info;
-   mad_send_wr->send_wr.wr.wr_id = (unsigned long)&mad_send_wr->mad_list;
    mad_send_wr->mad_list.mad_queue = &qp_info->send_queue;
+   mad_send_wr->mad_list.cqe.done = ib_mad_send_done;
+   mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
 
mad_agent = mad_send_wr->send_buf.mad_agent;
sge = mad_send_wr->sg_list;
@@ -2185,13 +2178,14 @@ handle_smi(struct ib_mad_port_private *port_priv,
return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response);
 }
 
-static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
-struct ib_wc *wc)
+static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 {
+   struct ib_mad_port_private *port_priv = cq->cq_context;
+   struct ib_mad_list_head *mad_list =
+   container_of(wc->wr_cqe, struct ib_mad_list_head, cqe);
struct ib_mad_qp_info *qp_info;
struct ib_mad_private_header *mad_priv_hdr;
struct ib_mad_private *recv, *response = NULL;
-   struct ib_mad_list_head *mad_list;
struct ib_mad_agent_private *mad_agent;
int port_num;
int ret = IB_MAD_RESULT_SUCCESS;
@@ -2199,7 +2193,17 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv,
u16 

Unable to establish rdma connection, breaks rdma basic functionality

2016-01-05 Thread Hariprasad S

Hi Doug,

I am trying to run an rping server, but it fails when bound to any address other than IF_ANY.
# rping -s -a 102.1.1.129 -C1 -p  -vd
created cm_id 0x23d7800
rdma_bind_addr: No such file or directory
destroy cm_id 0x23d7800

If bound to IF_ANY address, server starts but client fails to establish 
connection.
# rping -s -C1 -p  -vvvd
created cm_id 0xc34800
rdma_bind_addr successful
rdma_listen

And the commit which introduced this regression is

commit abae1b71dd37bab506b14a6cf6ba7148f4d57232
Author: Matan Barak 
Date:   Thu Oct 15 18:38:49 2015 +0300

IB/cma: cma_validate_port should verify the port and netdevice

Previously, cma_validate_port searched for GIDs in IB cache and then
tried to verify the found port. This could fail when there are
identical GIDs on both ports. In addition, netdevice should be taken
into account when searching the GID table.
Fixing cma_validate_port to search only the relevant port's cache
and netdevice.

Signed-off-by: Matan Barak 
Signed-off-by: Doug Ledford 

Re: [PATCH 2/2] IB/mad: use CQ abstraction

2016-01-05 Thread Christoph Hellwig
On Mon, Jan 04, 2016 at 07:04:03PM -0500, ira.weiny wrote:
> Sorry I did not catch this before but rather than void * wouldn't it be better
> to use struct ib_cqe?

Sure, I'll fix it up.


Re: [PATCH 14/15] IB/srpt: Fix srpt_write_pending()

2016-01-05 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 15/15] IB/srpt: Fix a rare crash in srpt_close_session()

2016-01-05 Thread Christoph Hellwig
>   srpt_disconnect_ch(ch);
>  
> + kref_put(&ch->kref, srpt_free_ch);

At some point it might be a good idea to have a srpt_put_ch helper to wrap
this pattern.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 13/15] IB/srpt: Detect session shutdown reliably

2016-01-05 Thread Christoph Hellwig
On Tue, Jan 05, 2016 at 03:26:49PM +0100, Bart Van Assche wrote:
> The Last WQE Reached event is only generated after one or more work
> requests have been queued on the QP associated with a session. Since
> session shutdown can start before any work requests have been queued,
> use a zero-length RDMA write to wait until a QP has been drained.

We actually ran into the same issue with a SRPT-derived work in progress
driver recently..

> @@ -2314,14 +2346,13 @@ static void srpt_cm_timewait_exit(struct srpt_rdma_ch *ch)
>  {
>   pr_info("Received CM TimeWait exit for ch %s-%d.\n", ch->sess_name,
>   ch->qp->qp_num);
> + srpt_close_ch(ch);
>  }
>  
>  static void srpt_cm_rep_error(struct srpt_rdma_ch *ch)
>  {
>   pr_info("Received CM REP error for ch %s-%d.\n", ch->sess_name,
>   ch->qp->qp_num);
>  }
>  
>  /**
> @@ -2329,33 +2360,7 @@ static void srpt_cm_rep_error(struct srpt_rdma_ch *ch)
>   */
>  static void srpt_cm_dreq_recv(struct srpt_rdma_ch *ch)
>  {
> + srpt_disconnect_ch(ch);
>  }
>  
>  /**
> @@ -2364,7 +2369,7 @@ static void srpt_cm_dreq_recv(struct srpt_rdma_ch *ch)
>  static void srpt_cm_drep_recv(struct srpt_rdma_ch *ch)
>  {
>   pr_info("Received InfiniBand DREP message for cm_id %p.\n", ch->cm_id);
> + srpt_close_ch(ch);
>  }


Is there any good reson to keep these one-liner helpers around?

Otherwise looks good,

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 00/15] staging/rdma/hfi1: Initial patches to add rdmavt support in HFI1

2016-01-05 Thread ira.weiny
> > > > > 
> > > > > If Doug accepts the library changes, let me know that public git 
> > > > > commit
> > > > > and I can pull it into the staging-next branch and you can continue to
> > > > > send me staging patches that way.
> > > > 
> > > > Won't this cause a conflict during the merge window?
> > > 
> > > No, git is good :)
> > > 
> > > > How do we handle changes which affect both qib and hfi1?
> > > 
> > > I don't know, now this gets messy...
> > > 
> > 
> > Agreed and this is what we are worried about.
> > 
> > Can we do what Dan and Doug have proposed in the past and have Doug take 
> > over
> > the staging/rdma sub-tree?
> > 
> > http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-November/081922.html
> > 
> > I think the upcoming merge window is a reasonable time for him to do that.
> 
> Ok, but keeping on top of all of the generic staging patches that come
> in is a tough thing to do, that's up to Doug, if he is ready for it...
> 

Greg,

Forgive me for not knowing how multiple maintainers deal with hand offs like
this.  I'm hoping you, and/or Doug, can answer some questions for me.

Am I correct in assuming the merge window will be open on Monday?  If so, when
will Linus pull the staging tree?  Then at what point will Doug get the hfi1
changes which have already been accepted?

Are you going to be able to review the outstanding patches for
staging/rdma/hfi1 before the merge window?  Or should we consider them dropped
and resubmit to Doug to apply after he has merged the latest hfi1 code from
Linus?


Doug,

At what point should we start submitting additional patches to you which we
have queued up but are not yet submitted to Greg?

So far we have been cross posting to linux-rdma for feedback and I see that the
patches have been dropped from patchworks.  I assume you dropped them from
patchworks because you knew that Greg was handling them?

Thanks,
Ira



Re: [PATCH rdma-RC] IB/cm: Fix sleeping while atomic when creating AH from WC

2016-01-05 Thread Matan Barak
On Thu, Dec 24, 2015 at 9:46 AM, Matan Barak  wrote:
> On Wed, Dec 23, 2015 at 10:04 PM, Doug Ledford  wrote:
>> On 10/15/2015 12:58 PM, Hefty, Sean wrote:
>> ib_create_ah_from_wc needs to resolve the DMAC in order to create the
>> AH (this may result sending an ARP and waiting for response).
>> CM uses this function (which is now sleepable).
>
> This is a significant change to the CM.  The CM calls are invoked
 assuming that they return relatively quickly.  They're invoked from
 callbacks and internally.  Having the calls now wait for an ARP response
 requires that this be re-architected, so the calling thread doesn't go out
 to lunch for several seconds.

 Agree - this is a significant change, but it was done a long time ago
 (at v4.3 if I recall). When we need to send a message we need to
>>>
>>> We're at 4.3-rc5?
>>>
 figure out the destination MAC. Even the passive side needs to do that
 as some vendors don't report the source MAC of the packet in their wc.
 Even if they did, since IP-based addressing is routable by its
 nature, it should follow the networking stack rules. Some crazy
 configurations could force sending responses to packets that came from
 router1 to router2 - so we have no choice than resolving the DMAC at
 every side.
>>>
>>> Ib_create_ah_from_wc is broken.   It is now an asynchronous operation, only 
>>> the call itself was left as synchronous.  We can't block kernel threads for 
>>> a minute, or however long ARP takes to resolve.  The call itself must 
>>> change to be async, and all users of it updated to allocate some request, 
>>> queue it, and handle all race conditions that result -- such as state 
>>> changes or destruction of the work that caused the request to be initiated.
>>>
>>
>> I don't know who had intended to address this, but it got left out of
>> the 4.4 work.  We need to not let this drop through the cracks (for
>> another release).  Can someone please put fixing this properly on their
>> TODO list?
>>
>
> IMHO, the proposed patch makes things better. Not applying the current
> patch means we have a "sleeping while atomic" error (in addition to
> the fact that kernel threads could wait until the ARP process
> finishes), which is pretty bad. I tend to agree that adding another CM
> state is probably a better approach, but unless someone steps up and
> add this for v4.5, I think that's the best thing we have.
>
>> --
>> Doug Ledford 
>>   GPG KeyID: 0E572FDD
>>
>>
>
> Matan

Yishai has found a double free bug in the error flow of this patch.
The fix is pretty simple.
Thanks Yishai for catching and testing this fix.

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 07a3bbf..832674f 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -296,10 +296,9 @@ static int _cm_alloc_response_msg(struct cm_port *port,
   0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
   GFP_ATOMIC,
   IB_MGMT_BASE_VERSION);
-   if (IS_ERR(m)) {
-   ib_destroy_ah(ah);
+   if (IS_ERR(m))
return PTR_ERR(m);
-   }
+
m->ah = ah;
*msg = m;
return 0;
@@ -310,13 +309,18 @@ static int cm_alloc_response_msg(struct cm_port *port,
 struct ib_mad_send_buf **msg)
 {
struct ib_ah *ah;
+   int ret;

ah = ib_create_ah_from_wc(port->mad_agent->qp->pd, mad_recv_wc->wc,
  mad_recv_wc->recv_buf.grh, port->port_num);
if (IS_ERR(ah))
return PTR_ERR(ah);

-   return _cm_alloc_response_msg(port, mad_recv_wc, ah, msg);
+   ret = _cm_alloc_response_msg(port, mad_recv_wc, ah, msg);
+   if (ret)
+   ib_destroy_ah(ah);
+
+   return ret;
 }

 static void cm_free_msg(struct ib_mad_send_buf *msg)


Doug, if you intend to take this patch. I can squash this fix and respin it.

Thanks,
Matan


[PATCH][v4.2.y-ckt] net/mlx5e: Max mtu comparison fix

2016-01-05 Thread Joseph Salisbury
Hello,

Please consider including mainline commit
50a9eea694ab8e0779069e0a4e0b12e145521468 in the next v4.2.y-ckt
release.  It was
included in the mainline tree as of v4.4-rc2.  It has been tested and
confirmed to resolve:
http://bugs.launchpad.net/bugs/1528466

commit 50a9eea694ab8e0779069e0a4e0b12e145521468
Author: Doron Tsur 
Date:   Thu Nov 12 19:35:27 2015 +0200

net/mlx5e: Max mtu comparison fix


Sincerely,

Joseph Salisbury




Re: [PATCHv1 2/6] IB/core: Added members to support rdma cgroup

2016-01-05 Thread Tejun Heo
On Wed, Jan 06, 2016 at 12:28:02AM +0530, Parav Pandit wrote:
> Added function pointer table to store resource pool specific
> operation for each resource type (verb and hw).
> Added list node to link device to rdma cgroup so that it can
> participate in resource accounting and limit configuration.

Is there any point in splitting patches 1 and 2 from 3?

-- 
tejun


Re: [PATCHv4 TRIVIAL] IB/core: Documentation fix to the snoop handler in the MAD header file

2016-01-05 Thread ira.weiny
On Tue, Jan 05, 2016 at 01:52:55PM -0500, Hal Rosenstock wrote:
> In ib_mad.h, ib_mad_snoop_handler uses send_buf rather than send_wr
> 
> Signed-off-by: Hal Rosenstock 

First off I have to say: this comment is wrong and should be fixed.

Reviewed-by: Ira Weiny 

That said.

I agree with Sean that the snoop interface should be removed.

I don't know the specific reason madeye was not accepted upstream back in the
day.  However, I _think_ it was because it implements a non-standard tracing
mechanism.

For this reason I have explored the use of the standard tracing infrastructure
within the mad stack.  The series I sent for general comment [1] implemented
tracing at a number of levels (umad and mad) as well as tracing of registered
agents.  This IMO give more insight into what is going on within the MAD stack
than the madeye module.

I'll see what I can do to update the tracing code.  In the mean time if others
want to look at the tracing code I have so far I pushed a branch to my github.

https://github.com/weiny2/linux-kernel  doug-fn-mad-trace

Ira

[1] https://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg28188.html

> ---
> Change since v3:
> Fixed title to not include function name
> 
> Change since v2:
> Changed title to use "higher" language
> 
> Change since v1:
> Fixed typo in patch description
> 
> diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
> index ec9b44d..2b3573d 100644
> --- a/include/rdma/ib_mad.h
> +++ b/include/rdma/ib_mad.h
> @@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent 
> *mad_agent,
>  /**
>   * ib_mad_snoop_handler - Callback handler for snooping sent MADs.
>   * @mad_agent: MAD agent that snooped the MAD.
> - * @send_wr: Work request information on the sent MAD.
> + * @send_buf: send MAD data buffer.
>   * @mad_send_wc: Work completion information on the sent MAD.  Valid
>   *   only for snooping that occurs on a send completion.
>   *
> - * Clients snooping MADs should not modify data referenced by the @send_wr
> + * Clients snooping MADs should not modify data referenced by the @send_buf
>   * or @mad_send_wc.
>   */
>  typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,


Re: [PATCH][v4.2.y-ckt] net/mlx5e: Max mtu comparison fix

2016-01-05 Thread Kamal Mostafa
On Tue, 2016-01-05 at 16:01 -0500, Joseph Salisbury wrote:
> Hello,
> 
> Please consider including mainline commit
> 50a9eea694ab8e0779069e0a4e0b12e145521468 in the next v4.2.y-ckt
> release.  It was
> included in the mainline tree as of v4.4-rc2.  It has been tested and
> confirmed to resolve:
> http://bugs.launchpad.net/bugs/1528466


Yes, I'll queue this up for 4.2.8-ckt1.  Thanks, Joseph!

 -Kamal

> commit 50a9eea694ab8e0779069e0a4e0b12e145521468
> Author: Doron Tsur 
> Date:   Thu Nov 12 19:35:27 2015 +0200
> 
> net/mlx5e: Max mtu comparison fix
> 
> 
> Sincerely,
> 
> Joseph Salisbury
> 
> 




Re: [PATCHv1 0/6] rdma controller support

2016-01-05 Thread Tejun Heo
Hello,

On Wed, Jan 06, 2016 at 12:28:00AM +0530, Parav Pandit wrote:
> Resources are not defined by the RDMA cgroup. Resources are defined
> by RDMA/IB stack & optionally by HCA vendor device drivers.

As I wrote before, I don't think this is a good idea.  Drivers will
inevitably add non-sensical "resources" which don't make any sense
without much scrutiny.  If different controllers can't agree upon the
same set of resources, which probably is a pretty good sign that this
isn't too well thought out to begin with, at least make all resource
types defined by the controller itself and let the controllers enable
them selectively.

Thanks.

-- 
tejun


Re: [PATCHv1 6/6] rdmacg: Added documentation for rdma controller.

2016-01-05 Thread Tejun Heo
Hello,

On Wed, Jan 06, 2016 at 12:28:06AM +0530, Parav Pandit wrote:
> +5-4-1. RDMA Interface Files
> +
> +  rdma.resource.verb.list
> +  rdma.resource.verb.limit
> +  rdma.resource.verb.usage
> +  rdma.resource.verb.failcnt
> +  rdma.resource.hw.list
> +  rdma.resource.hw.limit
> +  rdma.resource.hw.usage
> +  rdma.resource.hw.failcnt

Can you please read the rest of cgroup.txt and put the interface in
line with the common conventions followed by other controllers?

Thanks.

-- 
tejun


Re: [PATCHv1 3/6] rdmacg: implements rdma cgroup

2016-01-05 Thread Tejun Heo
Hello,

On Wed, Jan 06, 2016 at 12:28:03AM +0530, Parav Pandit wrote:
> +/* hash table to keep map of tgid to owner cgroup */
> +DEFINE_HASHTABLE(pid_cg_map_tbl, 7);
> +DEFINE_SPINLOCK(pid_cg_map_lock);/* lock to protect hash table access */
> +
> +/* Keeps mapping of pid to its owning cgroup at rdma level,
> + * This mapping doesn't change, even if process migrates from one to other
> + * rdma cgroup.
> + */
> +struct pid_cg_map {
> + struct pid *pid;/* hash key */
> + struct rdma_cgroup *cg;
> +
> + struct hlist_node hlist;/* pid to cgroup hash table link */
> + atomic_t refcnt;/* count active user tasks to figure out
> +  * when to free the memory
> +  */
> +};

Ugh, there's something clearly wrong here.  Why does the rdma
controller need to keep track of pid cgroup membership?

> +static void _dealloc_cg_rpool(struct rdma_cgroup *cg,
> +   struct cg_resource_pool *rpool)
> +{
> + spin_lock(&cg->cg_list_lock);
> +
> + /* if its started getting used by other task,
> +  * before we take the spin lock, then skip,
> +  * freeing it.
> +  */

Please follow CodingStyle.

> + if (atomic_read(&rpool->refcnt) == 0) {
> + list_del_init(&rpool->cg_list);
> + spin_unlock(&cg->cg_list_lock);
> +
> + _free_cg_rpool(rpool);
> + return;
> + }
> + spin_unlock(&cg->cg_list_lock);
> +}
> +
> +static void dealloc_cg_rpool(struct rdma_cgroup *cg,
> +  struct cg_resource_pool *rpool)
> +{
> + /* Don't free the resource pool which is created by the
> +  * user, otherwise we miss the configured limits. We don't
> +  * gain much either by splitting storage of limit and usage.
> +  * So keep it around until user deletes the limits.
> +  */
> + if (atomic_read(&rpool->creator) == RDMACG_RPOOL_CREATOR_DEFAULT)
> + _dealloc_cg_rpool(cg, rpool);

I'm pretty sure you can get away with a fixed-length array of
counters.  Please keep it simple.  It's a simple hard limit enforcer.
There's no need to create a massive dynamic infrastructure.

Thanks.

-- 
tejun