Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1

2011-10-25 Thread Bart Van Assche
On Mon, Oct 24, 2011 at 8:33 PM, Nicholas A. Bellinger
n...@linux-iscsi.org wrote:
 It would have been nice if you bothered to mention these in any of the
 pre merge window reviews, but given your delay responsed on these items
 they will have to be something we'll fix post merge.

Sorry for that - it took some time though before I remembered myself
about all this.

If you are looking for a test case: pulling an IB cable during I/O a
few times should allow to trigger this.

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1

2011-10-25 Thread Bart Van Assche
On Mon, Oct 24, 2011 at 11:01 PM, Roland Dreier rol...@purestorage.com wrote:
 On Mon, Oct 24, 2011 at 11:33 AM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
 - Handle IB completion timeouts. Although the InfiniBand Architecture
 Manual specifies that a HCA must generate an error completion for each
 pending work item when a queue pair reset is issued, an important
 class of HCAs doesn't do this (that includes the HCA in your test
 setup). In the SCST version of this driver such timeouts are handled
 by the function srpt_pending_cmd_timeout().

 Hold on... not sure what you mean by a queue pair reset but the IB
 spec does not say that a flush error should be generated when a QP is
 transitioned to the reset state.  The flush error completions are
 required when a QP is transitioned to the error state, and in fact
 completion entries may be lost when a QP transitions to reset.

I was referring to transitioning a QP to the error state -- see also
srpt_ch_qp_err().

 As far as I know every HCA supported by Linux does implement this
 correctly.  Which class did you have in mind as not doing that?

At least QDR ConnectX 2 HCAs with fairly recent firmware. This
behavior can be reproduced easily with the SCST version of ib_srpt as
follows:
- Log in from an SRP initiator to ib_srpt.
- Start a direct I/O read test, e.g. with fio.
- Issue the command rmmod ib_srpt on the target during I/O.

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] ib_srpt: Convert srp_max_rdma_size into per port configfs attribute

2011-10-25 Thread Nicholas A. Bellinger
On Mon, 2011-10-24 at 13:29 -0700, Nicholas A. Bellinger wrote:
 On Mon, 2011-10-24 at 05:33 +, Nicholas A. Bellinger wrote:
  From: Nicholas Bellinger n...@linux-iscsi.org
  
  This patch converts the srp_max_rdma_size module parameter into a per
  endpoint configfs attribute.  This includes adding the necessary bits
  for show + store attributes w/ min/max bounds checking, and updating
  srpt_get_ioc() to accept a struct srpt_port parameter.
  
  Reported-by: Roland Dreier rol...@purestorage.com
  Cc: Roland Dreier rol...@purestorage.com
  Cc: Bart Van Assche bvanass...@acm.org
  Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org
  ---
   drivers/infiniband/ulp/srpt/ib_srpt.c |   60 
  -
   drivers/infiniband/ulp/srpt/ib_srpt.h |   10 +
   2 files changed, 61 insertions(+), 9 deletions(-)
  
 
 SNIP
 
  diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h 
  b/drivers/infiniband/ulp/srpt/ib_srpt.h
  index 59ee2d7..c9caa90 100644
  --- a/drivers/infiniband/ulp/srpt/ib_srpt.h
  +++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
  @@ -114,6 +114,7 @@ enum {
  MIN_SRPT_SRQ_SIZE = 4,
  DEFAULT_SRPT_SRQ_SIZE = 4095,
  MAX_SRPT_SRQ_SIZE = 65535,
  +   MAX_SRPT_RDMA_SIZE = 256U,
   
 
 Hey btw, what should the proper MAX_SRPT_RDMA_SIZE be here..?
 

Ping on this Bart..?

Thanks,

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] rdma/cma: fixed resource leak in case of an error in udaddy

2011-10-25 Thread Dotan Barak
Fixed a resource leak in case of an error level in the udaddy example.

Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
---
 examples/udaddy.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/examples/udaddy.c b/examples/udaddy.c
index 637306a..1534df5 100644
--- a/examples/udaddy.c
+++ b/examples/udaddy.c
@@ -547,7 +547,7 @@ static int run_server(void)
ret = rdma_bind_addr(listen_id, test.src_addr);
if (ret) {
perror(udaddy: bind address failed);
-   return ret;
+   goto out;
}
 
ret = rdma_listen(listen_id, 0);
-- 
1.7.4.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] ib_srpt: Convert srp_max_rdma_size into per port configfs attribute

2011-10-25 Thread Bart Van Assche
On Mon, Oct 24, 2011 at 10:29 PM, Nicholas A. Bellinger
n...@linux-iscsi.org wrote:
 @@ -114,6 +114,7 @@ enum {
       MIN_SRPT_SRQ_SIZE = 4,
       DEFAULT_SRPT_SRQ_SIZE = 4095,
       MAX_SRPT_SRQ_SIZE = 65535,
 +     MAX_SRPT_RDMA_SIZE = 256U,


 Hey btw, what should the proper MAX_SRPT_RDMA_SIZE be here..?

As you can see in the ib_srpt source code, the largest value filled in
for this parameter in the IOC profile is 1U  24.

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] ib_srpt: Convert srp_max_rdma_size into per port configfs attribute

2011-10-25 Thread Nicholas A. Bellinger
On Tue, 2011-10-25 at 12:32 +0200, Bart Van Assche wrote:
 On Mon, Oct 24, 2011 at 10:29 PM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
  @@ -114,6 +114,7 @@ enum {
MIN_SRPT_SRQ_SIZE = 4,
DEFAULT_SRPT_SRQ_SIZE = 4095,
MAX_SRPT_SRQ_SIZE = 65535,
  + MAX_SRPT_RDMA_SIZE = 256U,
 
 
  Hey btw, what should the proper MAX_SRPT_RDMA_SIZE be here..?
 
 As you can see in the ib_srpt source code, the largest value filled in
 for this parameter in the IOC profile is 1U  24.

Thanks Bart.  Fixing this up now.

--nab


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] RDMA/cxgb4: Mark the QP in error before disabling the Q in the firmware

2011-10-25 Thread Kumar Sanghvi
From: Tom Tucker t...@ogc.us

The QP needs to be moved to error before telling the firwmare
to shutdown the queue. Otherwise, the application can submit
WR that will never get fetched by the hardware and never
flushed by the driver.

Signed-off-by: Kumar Sanghvi kuma...@chelsio.com
---
 drivers/infiniband/hw/cxgb4/qp.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index a391a4a..d6ccc7e 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1226,6 +1226,8 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp 
*qhp,
disconnect = 1;
c4iw_get_ep(qhp-ep-com);
}
+   if (qhp-ibqp.uobject)
+   t4_set_wq_in_error(qhp-wq);
ret = rdma_fini(rhp, qhp, ep);
if (ret)
goto err;
@@ -1244,6 +1246,8 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp 
*qhp,
break;
case C4IW_QP_STATE_ERROR:
set_state(qhp, C4IW_QP_STATE_ERROR);
+   if (qhp-ibqp.uobject)
+   t4_set_wq_in_error(qhp-wq);
if (!internal) {
abort = 1;
disconnect = 1;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)

2011-10-25 Thread Roland Dreier
On Mon, Oct 24, 2011 at 11:07 PM, Bart Van Assche bvanass...@acm.org wrote:
 As far as I know every HCA supported by Linux does implement this
 correctly.  Which class did you have in mind as not doing that?

 At least QDR ConnectX 2 HCAs with fairly recent firmware. This
 behavior can be reproduced easily with the SCST version of ib_srpt as
 follows:
 - Log in from an SRP initiator to ib_srpt.
 - Start a direct I/O read test, e.g. with fio.
 - Issue the command rmmod ib_srpt on the target during I/O.

OK, this is a pretty serious bug in mlx4_ib if true.  Are
you sure that you really are seeing some pending work
requests not generating a flush error when the QP transitions
to the error state?

Keep in mind that transitioning the QP to the reset state
can cause completion entries to disappear after they
are generated.

 - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/10] RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues.

2011-10-25 Thread Roland Dreier
On Mon, Oct 24, 2011 at 2:12 PM, David Miller da...@davemloft.net wrote:
 1. We would like to recommend that all the patches get included in
 Roland's infiniband tree since it has build dependencies.

 This is fine with me.  It just means that Roland's tree has to have
 net-next included in it already, because otherwise the paths to
 the cxgb4 driver under drivers/net won't be right.

OK, let's do that...

 - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IB/qib: Fix kernel panic on QME7342 boards

2011-10-25 Thread Mike Marciniszyn
From: Mitko Haralanov mi...@qlogic.com

Fix a kernel panic on QME7342 (and any Mezz boards without
QSFP connectors) caused by scheduling work on uninitialized
workqueue introduced by 62066fc8df4632c772d723813ce7af456d62ddf7.

Signed-off-by: Mitko Haralanov mi...@qlogic.com
Signed-off-by: Mike Marciniszyn mike.marcinis...@qlogic.com
---
 drivers/infiniband/hw/qib/qib_iba7322.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c 
b/drivers/infiniband/hw/qib/qib_iba7322.c
index 62005c7..3e40cff 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -5285,8 +5285,10 @@ static int qib_7322_ib_updown(struct qib_pportdata *ppd, 
int ibup, u64 ibcs)
qib_7322_mini_pcs_reset(ppd);
/* schedule the qsfp refresh which should turn the link
   off */
-   qd-t_insert = get_jiffies_64();
-   schedule_work(qd-work);
+   if (ppd-dd-flags  QIB_HAS_QSFP) {
+   qd-t_insert = get_jiffies_64();
+   schedule_work(qd-work);
+   }
spin_lock_irqsave(ppd-sdma_lock, flags);
if (__qib_sdma_running(ppd))
__qib_sdma_process_event(ppd,


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)

2011-10-25 Thread Bart Van Assche
On Tue, Oct 25, 2011 at 1:17 PM, Roland Dreier rol...@purestorage.com wrote:
 On Mon, Oct 24, 2011 at 11:07 PM, Bart Van Assche bvanass...@acm.org wrote:
  As far as I know every HCA supported by Linux does implement this
  correctly.  Which class did you have in mind as not doing that?
 
  At least QDR ConnectX 2 HCAs with fairly recent firmware. This
  behavior can be reproduced easily with the SCST version of ib_srpt as
  follows:
  1. Log in from an SRP initiator to ib_srpt.
  2. Start a direct I/O read test, e.g. with fio.
  3. Issue the command rmmod ib_srpt on the target during I/O.

 OK, this is a pretty serious bug in mlx4_ib if true.  Are
 you sure that you really are seeing some pending work
 requests not generating a flush error when the QP transitions
 to the error state?

It's a little more complex than that. The original version of ib_srpt
stops polling for completions as soon as the last WQE event has been
received and after that the queue has been drained. So I don't know
whether these flush errors were not delivered or whether these were
delivered too late.

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)

2011-10-25 Thread Roland Dreier
On Tue, Oct 25, 2011 at 11:09 AM, Bart Van Assche bvanass...@acm.org wrote:
 It's a little more complex than that. The original version of ib_srpt
 stops polling for completions as soon as the last WQE event has been
 received and after that the queue has been drained. So I don't know
 whether these flush errors were not delivered or whether these were
 delivered too late.

Sorry, but now I confused about what the bug is.  You have a QP
associated with an SRQ, and you transition the QP to error.  At some
point you get a last WQE received event for that QP (which means
all receive requests for that QP have completed), and you drain the
CQ after that, which means you are guaranteed to have processed
all the receive requests for that QP.  (At least this is how the verbs
are supposed to work).

So what goes wrong with ConnectX?

I don't think there is any way to guarantee that every request posted
to the SRQ is taken by a QP.  You just have to make sure that you
tear down every QP as above, and then you know when there are
no more QPs associated to the SRQ that any receive requests that
haven't completed are still on the SRQ, and you can free their
resources after you destroy the SRQ.

 - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html