[PATCH 00/11] Critical bug fixes for RDMA/cxgb4

2013-01-07 Thread Vipul Pandya
Hi Roland, This patch series fixes critical bugs for RDMA/cxgb4. It fixes bugs in the following areas: - Aborts connection in error scenarios - Logs only critical errors - Holds the reference of the QP until TID is released - Avoids race condition in endpoint timeout - Fixes reconnect and version

[PATCH 02/11] RDMA/cxgb4: abort connections when moving to ERROR state.

2013-01-07 Thread Vipul Pandya
If a FINI operation fails, then we need to ABORT instead of CLOSE. Also, if we ABORT due to unexpected STREAMING data, then wake up anybody blocked in FINI... Signed-off-by: Vipul Pandya vi...@chelsio.com --- drivers/infiniband/hw/cxgb4/cm.c |1 + drivers/infiniband/hw/cxgb4/qp.c |1 +

[PATCH 03/11] RDMA/cxgb4: Display streaming mode error only if detected in RTS.

2013-01-07 Thread Vipul Pandya
With later firmware, we are likely to get streaming mode data after we exit RTS, so we don't need to warn about it. The only real case where we don't expect it is when the QP is in RTS. Move the QP to ERROR when streaming mode data is received. Signed-off-by: Vipul Pandya vi...@chelsio.com

[PATCH 05/11] RDMA/cxgb4: Always log async errors.

2013-01-07 Thread Vipul Pandya
Log AEs even if the QP isn't in RTS. It is useful information. Signed-off-by: Vipul Pandya vi...@chelsio.com --- drivers/infiniband/hw/cxgb4/cm.c |6 +++--- drivers/infiniband/hw/cxgb4/ev.c |8 +--- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git

[PATCH 06/11] RDMA/cxgb4: only log rx_data warnings if cpl status is non zero.

2013-01-07 Thread Vipul Pandya
With newer firmware, we can get streaming data due to connection errors before the driver moves the QP out of RTS. Signed-off-by: Vipul Pandya vi...@chelsio.com --- drivers/infiniband/hw/cxgb4/cm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git

[PATCH 07/11] RDMA/cxgb4: endpoint timeout race condition

2013-01-07 Thread Vipul Pandya
The endpoint timeout logic had a race that could cause an endpoint object to be freed while it was still on the timedout list. This can happen if the timer is stopped after it had fired, but before the timedout thread processed the endpoint timeout. Signed-off-by: Vipul Pandya vi...@chelsio.com
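
The race described above can be modelled in userspace as follows. This is only a sketch under stated assumptions, not the driver's code: the struct, the function names and the single "claimed" flag are all hypothetical. The idea illustrated is that whichever path (timer callback or timer stop) claims the timeout first becomes responsible for the timer's reference, so an endpoint that is already on the timedout list cannot lose that reference underneath the timeout thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of an endpoint; not the cxgb4 structure. */
struct endpoint {
    atomic_int refcnt;        /* references held on the endpoint */
    atomic_bool claimed;      /* set by whichever path handles the timer first */
    bool on_timedout_list;    /* queued for the timeout worker */
};

static void ep_init(struct endpoint *ep)
{
    atomic_init(&ep->refcnt, 1);      /* the owner's reference */
    atomic_init(&ep->claimed, false);
    ep->on_timedout_list = false;
}

/* Arming the timer takes a reference on behalf of the timer. */
static void ep_start_timer(struct endpoint *ep)
{
    atomic_store(&ep->claimed, false);
    atomic_fetch_add(&ep->refcnt, 1);
}

/* Timer callback: queue the endpoint only if nobody stopped the timer first. */
static void ep_timer_fired(struct endpoint *ep)
{
    if (!atomic_exchange(&ep->claimed, true))
        ep->on_timedout_list = true;
}

/*
 * Stop the timer. The timer's reference is dropped only if the callback has
 * not already claimed the timeout; otherwise the worker will drop it later.
 */
static void ep_stop_timer(struct endpoint *ep)
{
    if (!atomic_exchange(&ep->claimed, true))
        atomic_fetch_sub(&ep->refcnt, 1);
}

/* Timeout worker: dequeue the endpoint and drop the timer's reference. */
static void ep_process_timeout(struct endpoint *ep)
{
    assert(ep->on_timedout_list);
    ep->on_timedout_list = false;
    atomic_fetch_sub(&ep->refcnt, 1);
}
```

In this model, calling ep_stop_timer() after ep_timer_fired() leaves the reference count untouched, so the endpoint stays alive until ep_process_timeout() has run.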

[PATCH 08/11] RDMA/cxgb4: don't reconnect on abort for mpa_rev 1

2013-01-07 Thread Vipul Pandya
Only reconnect if the endpoint wasn't freed. peer_abort() should only attempt to reconnect if the endpoint wasn't freed. Also remove hwtid from the debugfs idr. Add missing check for peer2peer in MPAv2 code. Use correct MPA version on reject. Signed-off-by: Vipul Pandya vi...@chelsio.com ---

[PATCH 09/11] RDMA/cxgb4: Don't wakeup threads for MPAv2

2013-01-07 Thread Vipul Pandya
Don't wake up threads blocked in rdma_init/rdma_fini if we are on MPAv2 and want to retry the connection with MPAv1. Stop ep-timer on getting an MPA version mismatch, before doing the abort_connection, in process_mpa_request. Take care to stop ep-timer in error paths for process_mpa_request.

[PATCH 10/11] RDMA/cxgb4: Insert hwtid in pass_accept_req instead in pass_establish

2013-01-07 Thread Vipul Pandya
CPL_ABORT_REQ_RSS can come before the TCP connection is established. In that case peer_abort was trying to remove a hwtid which had not been inserted. To avoid this, we insert the hwtid once we are sure that we are going to send the passive accept request. Signed-off-by: Vipul Pandya vi...@chelsio.com

[PATCH 11/11] RDMA/cxgb4: Address sparse warnings

2013-01-07 Thread Vipul Pandya
Fixes the following types of sparse warnings: - cast to pointer from integer of different size - cast from pointer to integer of different size - incorrect type in assignment (different base types) - incorrect type in argument 1 (different base types) - cast from restricted __be64 - cast from
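
As a rough userspace illustration of two of the warning classes listed above (not the patch itself; every function name here is hypothetical), pointer/integer size mismatches are usually avoided by casting through uintptr_t, and big-endian fields are converted explicitly instead of being cast. Kernel code would use its own helpers such as be64_to_cpu() for the latter; the byte loop below is just a portable stand-in.

```c
#include <assert.h>
#include <stdint.h>

/* Store a pointer in a 64-bit field: go through uintptr_t so the cast is
 * well-sized on both 32-bit and 64-bit builds. */
static uint64_t ptr_to_cookie(void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Recover the pointer from the 64-bit field, again via uintptr_t. */
static void *cookie_to_ptr(uint64_t cookie)
{
    return (void *)(uintptr_t)cookie;
}

/* Decode a big-endian 64-bit value byte by byte rather than casting it. */
static uint64_t load_be64(const unsigned char bytes[8])
{
    uint64_t value = 0;

    assert(bytes != 0);
    for (int i = 0; i < 8; i++)
        value = (value << 8) | bytes[i];
    return value;
}
```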

Re: [PATCH 00/11] Critical bug fixes for RDMA/cxgb4

2013-01-07 Thread Steve Wise
Reviewed-by: Steve Wise sw...@opengridcomputing.com -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/11] Critical bug fixes for RDMA/cxgb4

2013-01-07 Thread Steve Wise
And: Acked-by: Steve Wise sw...@opengridcomputing.com Not sure which one I should be using :) On 1/7/2013 9:44 AM, Steve Wise wrote: Reviewed-by: Steve Wise sw...@opengridcomputing.com

Re: [PATCH] IB/srp: disconnect to SRP target before removing SCSI host

2013-01-07 Thread David Dillow
On Mon, 2013-01-07 at 06:34 -0500, Bart Van Assche wrote: Sorry but this patch looks wrong to me, for the following reasons: - A root cause analysis is missing. It has been mentioned in the patch description that device_del() did hang but an analysis of why that hang

bug in ucma_accept()?

2013-01-07 Thread Steve Wise
Hey Sean, Is this a bug? I think it is... diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index 2709ff5..fb24f05 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -806,8 +806,13 @@ static ssize_t ucma_accept(struct ucma_file *file,

RE: bug in ucma_accept()?

2013-01-07 Thread Hefty, Sean
Is this a bug? I think it is... diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index 2709ff5..fb24f05 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -806,8 +806,13 @@ static ssize_t ucma_accept(struct ucma_file *file,

Re: bug in ucma_accept()?

2013-01-07 Thread Steve Wise
On 1/7/2013 1:22 PM, Hefty, Sean wrote: Is this a bug? I think it is... diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index 2709ff5..fb24f05 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -806,8 +806,13 @@ static ssize_t

[PATCH 1/3] opensm/osm_torus.c: Consolidate some parsing with parse_unsigned

2013-01-07 Thread Hal Rosenstock
Signed-off-by: Jim Schutt jasc...@sandia.gov Signed-off-by: Hal Rosenstock h...@mellanox.com --- diff --git a/opensm/osm_torus.c b/opensm/osm_torus.c index 1d847b3..ff83edb 100644 --- a/opensm/osm_torus.c +++ b/opensm/osm_torus.c @@ -853,14 +853,14 @@ out: } static -bool parse_port(unsigned

[PATCH 2/3] opensm/torus: Add configuration for max_changes to report

2013-01-07 Thread Hal Rosenstock
Rather than using a hard-coded constant of 32 for the maximum number of torus changes reported, allow this to be configured with the max_changes parameter in the torus conf file. The default for max_changes is the same as the hard-coded constant (32). Also, update the torus conf documentation for this new parameter.
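
The behaviour described above (a configurable value that falls back to the old constant) might look roughly like this. It is a minimal sketch, not opensm's parser: DEFAULT_MAX_CHANGES, parse_unsigned_value() and torus_max_changes() are hypothetical names, and only the keyword max_changes and the default of 32 come from the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_MAX_CHANGES 32u   /* the former hard-coded constant */

/* Parse an unsigned value; reject empty, negative, trailing-junk and
 * out-of-range input. */
static bool parse_unsigned_value(const char *s, unsigned *out)
{
    char *end;
    unsigned long v;

    assert(out != NULL);
    if (s == NULL || *s == '\0' || strchr(s, '-') != NULL)
        return false;
    errno = 0;
    v = strtoul(s, &end, 0);
    if (errno != 0 || end == s || *end != '\0' || v > UINT_MAX)
        return false;
    *out = (unsigned)v;
    return true;
}

/* Return the configured limit, or the default when the keyword is absent
 * or its value does not parse. */
static unsigned torus_max_changes(const char *keyword, const char *value)
{
    unsigned v;

    if (keyword != NULL && strcmp(keyword, "max_changes") == 0 &&
        parse_unsigned_value(value, &v))
        return v;
    return DEFAULT_MAX_CHANGES;
}
```

Falling back to the default on a malformed value is a design choice of this sketch; a real parser may prefer to report the error instead.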

[PATCH 3/3] opensm/osm_torus.c: Dump torus when OSM_LOG_ROUTING specified

2013-01-07 Thread Hal Rosenstock
Useful feature for torus debug. Also, in report_torus_changes, there is no need for a NULL pointer check on nt. Reviewed-by: Jim Schutt jasc...@sandia.gov Signed-off-by: Hal Rosenstock h...@mellanox.com --- diff --git a/opensm/osm_torus.c b/opensm/osm_torus.c index 1d847b3..4e5688f 100644 ---

[PATCH][TRIVIAL] opensm/include/complib/cl_packon.h: Fix some commentary typos

2013-01-07 Thread Hal Rosenstock
Signed-off-by: Hal Rosenstock h...@mellanox.com --- diff --git a/include/complib/cl_packon.h b/include/complib/cl_packon.h index ffc8e11..e2e45b4 100644 --- a/include/complib/cl_packon.h +++ b/include/complib/cl_packon.h @@ -55,14 +55,14 @@ * not align properly for some platforms. Care must

Re: [PATCH 00/11] Critical bug fixes for RDMA/cxgb4

2013-01-07 Thread Roland Dreier
On Mon, Jan 7, 2013 at 5:11 AM, Vipul Pandya vi...@chelsio.com wrote: This patch series fixes critical bugs for RDMA/cxgb4. It fixes bugs in the following areas: - Aborts connection in error scenarios - Logs only critical errors - Holds the reference of the QP until TID is released - Avoids

Re: [PATCH 00/11] Critical bug fixes for RDMA/cxgb4

2013-01-07 Thread Vipul Pandya
On 08-01-2013 06:03, Roland Dreier wrote: On Mon, Jan 7, 2013 at 5:11 AM, Vipul Pandya vi...@chelsio.com wrote: This patch series fixes critical bugs for RDMA/cxgb4. It fixes bugs in the following areas: - Aborts connection in error scenarios - Logs only critical errors - Holds the reference