Re: [PATCH v1 1/3] IB/srp: Fix crash when unmapping data loop

2014-03-07 Thread Bart Van Assche
On 03/06/14 17:10, Sagi Grimberg wrote: So I took Roland's latest 3.14-rc1 and tried to reproduce this issue using an HCA with no FMR support and was *NOT* able to reproduce it. The issue reproduced for me on RH6 backported srp and I can't tell where the delta is at the moment. Perhaps

[PATCHv4 net-next 00/32] Misc. fixes for cxgb4 and iw_cxgb4

2014-03-07 Thread Hariprasad Shenai
Hi All, This patch series provides miscellaneous fixes for Chelsio T4/T5 adapters in cxgb4, related to SGE and MTU. It also includes DB Drop avoidance and other misc. fixes on iw-cxgb4. The patch series is created against David Miller's 'net-next' tree. And includes patches on cxgb4 and

[PATCHv4 net-next 06/32] cxgb4: Calculate len properly for LSO path

2014-03-07 Thread Hariprasad Shenai
From: Kumar Sanghvi kuma...@chelsio.com Commit 0034b29 (cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()) introduced a regression wherein the length was calculated wrongly for the LSO path, causing chip hangs. So, correct the calculation of len. Fixes: 0034b29 (cxgb4: Don't assume LSO only

[PATCHv4 net-next 02/32] cxgb4: Add code to dump SGE registers when hitting idma hangs

2014-03-07 Thread Hariprasad Shenai
From: Kumar Sanghvi kuma...@chelsio.com Based on original work by Casey Leedom lee...@chelsio.com Signed-off-by: Kumar Sanghvi kuma...@chelsio.com --- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 1 + drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 106 +++

[PATCHv4 net-next 10/32] iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Based on original work by Anand Priyadarshee ana...@chelsio.com. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cm.c| 24 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h | 1 +

[PATCHv4 net-next 09/32] iw_cxgb4: release neigh entry in error paths

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Always release the neigh entry in rx_pkt(). Based on original work by Santosh Rastapur sant...@chelsio.com. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cm.c | 5 +++-- 1 file changed, 3 insertions(+), 2

[PATCHv4 net-next 05/32] cxgb4: use spinlock_irqsave/spinlock_irqrestore for db lock

2014-03-07 Thread Hariprasad Shenai
From: Kumar Sanghvi kuma...@chelsio.com Currently ring_tx_db() can deadlock if a db_full interrupt fires and is run on the same CPU while ring_tx_db() has the db lock held. It needs to disable interrupts since it serializes with an interrupt handler. Based on original work by Steve Wise
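The save/restore pattern named in this patch's subject can be sketched in userspace with a toy model. This is not the driver code: a single flag stands in for the CPU's interrupt-enable state, and every demo_* name is hypothetical. It only illustrates why the previous state is saved and restored rather than interrupts being re-enabled unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU's interrupt-enable state. */
static bool demo_irqs_enabled = true;

/* Analogue of saving flags and disabling interrupts: remember the old
 * state, then disable. */
bool demo_irq_save(void)
{
    bool flags = demo_irqs_enabled;
    demo_irqs_enabled = false;
    return flags;
}

/* Analogue of restoring flags: put back whatever state the caller saw. */
void demo_irq_restore(bool flags)
{
    demo_irqs_enabled = flags;
}

/* Hypothetical critical section that may be entered with interrupts
 * either enabled or already disabled. */
void demo_ring_db(void)
{
    bool flags = demo_irq_save();
    /* No "interrupt handler" can run on this CPU while the lock is held. */
    assert(!demo_irqs_enabled);
    /* ... the doorbell update would go here ... */
    demo_irq_restore(flags);
}
```

If demo_ring_db() is called from a context that has already disabled interrupts, they stay disabled on return; an unconditional enable at the end would turn them back on too early.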

[PATCHv4 net-next 01/32] cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size is 64KB

2014-03-07 Thread Hariprasad Shenai
From: Kumar Sanghvi kuma...@chelsio.com We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and the extant logic would flag that as an error. Based on original work by Casey Leedom lee...@chelsio.com Signed-off-by: Kumar Sanghvi kuma...@chelsio.com ---

[PATCHv4 net-next 03/32] cxgb4: Rectify emitting messages about SGE Ingress DMA channels being potentially stuck

2014-03-07 Thread Hariprasad Shenai
From: Kumar Sanghvi kuma...@chelsio.com Based on original work by Casey Leedom lee...@chelsio.com Signed-off-by: Kumar Sanghvi kuma...@chelsio.com --- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 9 ++- drivers/net/ethernet/chelsio/cxgb4/sge.c | 90 -- 2 files

[PATCHv4 net-next 27/32] iw_cxgb4: rmb() after reading valid gen bit

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Some HW platforms can reorder read operations, so we must rmb() after we see a valid gen bit in a CQE but before we read any other fields from the CQE. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/t4.h | 1 +
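The ordering requirement described here can be sketched in portable C11, with an acquire fence standing in for the kernel's rmb(). The struct layout and all demo_* names are assumptions for illustration only, not the driver's CQE definition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion entry; not the layout used by iw_cxgb4. */
struct demo_cqe {
    uint32_t payload;       /* filled in by the producer first */
    _Atomic uint8_t gen;    /* generation bit, written last */
};

/* Producer (the hardware, in the real case): payload first, then gen. */
void demo_produce(struct demo_cqe *cqe, uint32_t payload, uint8_t gen)
{
    assert(cqe != NULL);
    cqe->payload = payload;
    atomic_store_explicit(&cqe->gen, gen, memory_order_release);
}

/* Consumer: returns true and copies the payload only when the entry
 * carries the expected generation bit. */
bool demo_poll(struct demo_cqe *cqe, uint8_t expected_gen, uint32_t *out)
{
    assert(cqe != NULL && out != NULL);
    if (atomic_load_explicit(&cqe->gen, memory_order_relaxed) != expected_gen)
        return false;
    /* Userspace analogue of rmb(): reads below cannot be reordered
     * before the gen-bit read above. */
    atomic_thread_fence(memory_order_acquire);
    *out = cqe->payload;
    return true;
}
```

Without the fence, a platform that reorders reads could fetch the payload before the gen bit and return stale data for an entry that only later became valid.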

[PATCHv4 net-next 28/32] iw_cxgb4: wc_wmb() needed after DB writes

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Need to do an sfence after both the WC and regular PIDX DB write. Otherwise the host might reorder things and cause work request corruption (seen with NFSRDMA). Signed-off-by: Steve Wise sw...@opengridcomputing.com ---

[PATCHv4 net-next 04/32] cxgb4: Updates for T5 SGE's Egress Congestion Threshold

2014-03-07 Thread Hariprasad Shenai
From: Kumar Sanghvi kuma...@chelsio.com Based on original work by Casey Leedom lee...@chelsio.com Signed-off-by: Kumar Sanghvi kuma...@chelsio.com --- drivers/net/ethernet/chelsio/cxgb4/sge.c | 18 +- drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 6 ++ 2 files changed,

[PATCHv4 net-next 07/32] iw_cxgb4: cap CQ size at T4_MAX_IQ_SIZE

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c index 88de3aa..c0673ac

[PATCHv4 net-next 18/32] iw_cxgb4: fix possible memory leak in RX_PKT processing

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com If cxgb4_ofld_send() returns 0, then send_fw_pass_open_req() must free the request skb and the saved skb with the tcp header. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cm.c | 9 - 1 file changed, 8

[PATCHv4 net-next 23/32] iw_cxgb4: lock around accept/reject downcalls

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com There is a race between ULP threads doing an accept/reject, and the ingress processing thread handling close/abort for the same connection. The accept/reject path needs to hold the lock to serialize these paths. Signed-off-by: Steve Wise

[PATCHv4 net-next 24/32] iw_cxgb4: drop RX_DATA packets if the endpoint is gone

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index f4a9ebe..e7c6dfd 100644 ---

[PATCHv4 net-next 12/32] iw_cxgb4: use the BAR2/WC path for kernel QPs and T5 devices

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/device.c | 41 +- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 2 ++ drivers/infiniband/hw/cxgb4/qp.c | 59 +---

[PATCHv4 net-next 21/32] iw_cxgb4: adjust tcp snd/rcv window based on link speed

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com 40G devices need bigger windows, so default 40G devices to snd 512K rcv 1024K. Fixed a bug that shows up with recv window sizes that exceed the size of the RCV_BUFSIZ field in opt0 (= 1024K :). If the recv window exceeds this, then we specify the

[PATCHv4 net-next 17/32] iw_cxgb4: don't leak skb in c4iw_uld_rx_handler()

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/device.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c index

[PATCHv4 net-next 25/32] iw_cxgb4: rx_data() needs to hold the ep mutex

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com To avoid racing with other threads doing close/flush/whatever, rx_data() should hold the endpoint mutex. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cm.c | 16 +--- 1 file changed, 9

[PATCHv4 net-next 11/32] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com The current logic suffers from a slow response time to disable user DB usage, and also fails to avoid DB FIFO drops under heavy load. This commit fixes these deficiencies and makes the avoidance logic more optimal. This is done by more efficiently

[PATCHv4 net-next 15/32] iw_cxgb4: default peer2peer mode to 1

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index

[PATCHv4 net-next 32/32] iw_cxgb4: Use pr_warn_ratelimited

2014-03-07 Thread Hariprasad Shenai
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com --- drivers/infiniband/hw/cxgb4/resource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/resource.c b/drivers/infiniband/hw/cxgb4/resource.c index d9bc9ba..67df71a 100644 ---

[PATCHv4 net-next 22/32] iw_cxgb4: update snd_seq when sending MPA messages

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index f6891b8..95b3c01 100644 ---

[PATCHv4 net-next 26/32] iw_cxgb4: endpoint timeout fixes

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com 1) Timed-out endpoint processing can be starved. If there are continual CPL messages flowing into the driver, the endpoint timeout processing can be starved. This condition exposed the other bugs below. Solution: In process_work(), call

[PATCHv4 net-next 30/32] iw_cxgb4: minor fixes

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Added some missing debug stats. Use uninitialized_var(). Initialize reserved fields in a FW work request. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cq.c | 2 +- drivers/infiniband/hw/cxgb4/mem.c

[PATCHv4 net-next 29/32] iw_cxgb4: SQ flush fix

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com There is a race when moving a QP from RTS-CLOSING where a SQ work request could be posted after the FW receives the RDMA_RI/FINI WR. The SQ work request will never get processed, and should be completed with FLUSHED status. Function c4iw_flush_sq(),

[PATCHv4 net-next 13/32] iw_cxgb4: Fix incorrect BUG_ON conditions

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Based on original work from Jay Hernandez j...@chelsio.com Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git

[PATCHv4 net-next 08/32] iw_cxgb4: Allow loopback connections

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com find_route() must treat loopback as a valid egress interface. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git

[PATCHv4 net-next 20/32] iw_cxgb4: connect_request_upcall fixes

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com When processing an MPA Start Request, if the listening endpoint is DEAD, then abort the connection. If the IWCM returns an error, then we must abort the connection and release resources. Also abort_connection() should not post a CLOSE event, so clean

[PATCHv4 net-next 14/32] iw_cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 1 + drivers/infiniband/hw/cxgb4/qp.c | 6 -- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git

[PATCHv4 net-next 19/32] iw_cxgb4: ignore read reponse type 1 CQEs

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com These are generated by HW in some error cases and need to be silently discarded. Signed-off-by: Steve Wise sw...@opengridcomputing.com --- drivers/infiniband/hw/cxgb4/cq.c | 24 1 file changed, 20 insertions(+), 4

[PATCHv4 net-next 16/32] iw_cxgb4: save the correct map length for fast_reg_page_lists

2014-03-07 Thread Hariprasad Shenai
From: Steve Wise sw...@opengridcomputing.com We cannot save the mapped length using the rdma max_page_list_len field of the ib_fast_reg_page_list struct because the core code uses it. This results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl(). I found this with dma map

[PATCH] IB/qib: Fix debugfs ordering issue with multiple HCAs

2014-03-07 Thread Mike Marciniszyn
The debugfs init code was incorrectly called before the idr mechanism is used to get the unit number so the dd->unit hadn't been initialized. This caused the unit relative directory creation to fail after the first. This patch moves the init for the debugfs stuff until after all of the failures

[PATCH V2 0/2] qib percpu counters

2014-03-07 Thread Mike Marciniszyn
The following series implements percpu counters for high frequency counts. This version of the patch deals with conflicts in qib_init.c with stable patch per http://marc.info/?l=linux-rdma&m=139419915606504&w=2 --- Mike Marciniszyn (2): IB/qib: Add percpu counter replacing qib_devdata

[PATCH V2 1/2] IB/qib: Add percpu counter replacing qib_devdata int_counter

2014-03-07 Thread Mike Marciniszyn
This patch replaces the dd->int_counter with a percpu counter. The maintenance of qib_stats.sps_ints and int_counter are combined into the new counter. There are two new functions added to read the counter: - qib_int_counter (for a particular qib_devdata) - qib_sps_ints (for all HCAs) A

[PATCH V2 2/2] IB/qib: Modify software pma counters to use percpu variables

2014-03-07 Thread Mike Marciniszyn
The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv are now maintained as percpu variables. The mad code is modified to add a z_ latch so that the percpu counters monotonically increase with appropriate adjustments in the reset, read logic to maintain the z_ latch. This patch
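One plausible reading of the z_ latch idea, sketched in plain C: the per-CPU slots only ever increase, a reset records the current total in a latch, and a read reports the total minus the latch. The fixed CPU count and all demo_* names are assumptions for illustration; the actual qib code is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NCPUS 4   /* assumed CPU count for the sketch */

struct demo_counter {
    uint64_t percpu[DEMO_NCPUS]; /* one slot per CPU, only incremented */
    uint64_t z_latch;            /* total captured at the last reset */
};

/* Hot path: bump the slot that belongs to the current CPU. */
void demo_inc(struct demo_counter *c, unsigned cpu)
{
    assert(cpu < DEMO_NCPUS);
    c->percpu[cpu]++;
}

/* Sum over all CPUs; this raw total never decreases. */
uint64_t demo_sum(const struct demo_counter *c)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < DEMO_NCPUS; i++)
        total += c->percpu[i];
    return total;
}

/* "Reset" leaves the per-CPU slots alone and only moves the latch. */
void demo_reset(struct demo_counter *c)
{
    c->z_latch = demo_sum(c);
}

/* Value reported to the reader: counts since the last reset. */
uint64_t demo_read(const struct demo_counter *c)
{
    return demo_sum(c) - c->z_latch;
}
```

Under this reading, a reset never has to write to another CPU's slot, which is what keeps the increment path free of shared writes.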

RE: NFS over RDMA crashing

2014-03-07 Thread Steve Wise
Resurrecting an old issue :) More inline below... -Original Message- From: linux-nfs-ow...@vger.kernel.org [mailto:linux-nfs- ow...@vger.kernel.org] On Behalf Of J. Bruce Fields Sent: Thursday, February 07, 2013 10:42 AM To: Yan Burman Cc: linux-...@vger.kernel.org;

Re: linux rdma 3.14 merge plans

2014-03-07 Thread Roland Dreier
Sure, no problem. Do you have a git tree with the latest versions of all the changes you want for 3.15 in a branch? That would be helpful as I catch up on applying things, so that I don't miss anything. If you don't have one, taking a little time to set one up on github or wherever would be

Re: [PATCH v5 00/10] Introduce Signature feature

2014-03-07 Thread Roland Dreier
So I went ahead and applied this for 3.15, although I suspect the verbs API is probably the wrong one. I understand that the mlx5 microarchitecture requires some of this signature binding stuff to go through a work queue, but conceptually I don't think the IB_WR_REG_SIG_MR work request makes

RE: NFS over RDMA crashing

2014-03-07 Thread Steve Wise
Does this help? They must have added this for some reason, but I'm not seeing how it could have ever done anything --b. diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c index 0ce7552..e8f25ec 100644 ---

[PATCH] IB/mlx5_core: remove unreachable function call in module init

2014-03-07 Thread Kleber Sacilotto de Souza
The call to mlx5_health_cleanup() in the module init function can never be reached. Removing it. Signed-off-by: Kleber Sacilotto de Souza kleb...@linux.vnet.ibm.com --- drivers/net/ethernet/mellanox/mlx5/core/main.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git

Re: [PATCHv4 net-next 05/32] cxgb4: use spinlock_irqsave/spinlock_irqrestore for db lock

2014-03-07 Thread David Miller
From: Hariprasad Shenai haripra...@chelsio.com Date: Fri, 7 Mar 2014 16:03:02 +0530 @@ -3585,9 +3585,11 @@ static void disable_txq_db(struct sge_txq *q) static void enable_txq_db(struct sge_txq *q) { - spin_lock_irq(&q->db_lock); + unsigned long flags; + +