Re: kernel memory registration

2015-07-14 Thread Sagi Grimberg
Having a few schemes available in the core code that the driver can choose from seems like a much more sensible option. I think that makes sense, but several of the schemes we are working with are effectively single-vendor schemes. Indirect MR and DIX are good examples of things that only one

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-14 Thread Sagi Grimberg
On 7/13/2015 11:15 PM, Jason Gunthorpe wrote: On Mon, Jul 13, 2015 at 03:36:44PM -0400, Tom Talpey wrote: On 7/11/2015 6:25 AM, 'Christoph Hellwig' wrote: I think what we need to support for now is FRMR as the primary target, and FMR as a secondar[y]. FMR is a *very* bad choice, for several

Re: Kernel fast memory registration API proposal [RFC]

2015-07-14 Thread Sagi Grimberg
On 7/13/2015 7:30 PM, Jason Gunthorpe wrote: On Fri, Jul 10, 2015 at 12:09:37PM +0300, Sagi Grimberg wrote: Given the last discussions on our in-kernel memory registration API I thought I'd propose another approach to address this. I assume you can put your new indirect registrations under

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-14 Thread Sagi Grimberg
On 7/14/2015 10:37 AM, 'Christoph Hellwig' wrote: On Mon, Jul 13, 2015 at 03:36:44PM -0400, Tom Talpey wrote: On 7/11/2015 6:25 AM, 'Christoph Hellwig' wrote: I think what we need to support for now is FRMR as the primary target, and FMR as a secondar[y]. FMR is a *very* bad choice, for

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-14 Thread Sagi Grimberg
On 7/14/2015 10:25 AM, 'Christoph Hellwig' wrote: On Mon, Jul 13, 2015 at 10:57:48AM -0600, Jason Gunthorpe wrote: Currently various drivers are using ib_get_dma_mr with remote flags unfortunately, e.g. the SRP initiator driver uses it to optimize away memory registrations for single SGL entry

Re: Kernel fast memory registration API proposal [RFC]

2015-07-14 Thread Sagi Grimberg
On 7/13/2015 5:16 PM, Chuck Lever wrote: NFS really should be using something more similar to a scatterlist, as it maps pretty well to the sk_frags in the network layer as well. Struct scatterlist is important because it's the way the DMA mapping functions take a multi-page argument, so anyone

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-14 Thread Sagi Grimberg
On 7/14/2015 3:12 PM, Tom Talpey wrote: On 7/14/2015 5:22 AM, Sagi Grimberg wrote: On 7/14/2015 10:37 AM, 'Christoph Hellwig' wrote: On Mon, Jul 13, 2015 at 03:36:44PM -0400, Tom Talpey wrote: On 7/11/2015 6:25 AM, 'Christoph Hellwig' wrote: I think what we need to support for now is FRMR

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-14 Thread Sagi Grimberg
On 7/14/2015 3:24 PM, Tom Talpey wrote: On 7/14/2015 4:06 AM, Sagi Grimberg wrote: All protocols care about transferring data and sending messages, so it's not a good enough reason for a poor registration method choice. This just emphasizes why we need to converge to a single method. In my

Re: Kernel fast memory registration API proposal [RFC]

2015-07-14 Thread Sagi Grimberg
On 7/14/2015 6:33 PM, Christoph Hellwig wrote: On Tue, Jul 14, 2015 at 11:39:24AM +0300, Sagi Grimberg wrote: This is exactly what I don't want to do. I don't think that implicit posting is a good idea for reasons that I mentioned earlier: This is where I have a problem. Providing an API

Re: Kernel fast memory registration API proposal [RFC]

2015-07-14 Thread Sagi Grimberg
I'm really disappointed by the negative emails on this subject.. Jason, I'm really not trying to be negative. I'm hearing you out, and I agree with a lot of what you have to say. I just don't agree with all of it. You are right, ULPs do the same thing, the same wrong thing of maintaining a

Re: Kernel fast memory registration API proposal [RFC]

2015-07-14 Thread Sagi Grimberg
On 7/14/2015 7:35 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 07:12:01PM +0300, Sagi Grimberg wrote: The ULP doesn't care if it needs to reserve the slot, and it generally doesn't care about the notification either unless it needs to handle an error. That's generally correct

Re: Kernel fast memory registration API proposal [RFC]

2015-07-16 Thread Sagi Grimberg
On 7/16/2015 11:07 AM, Christoph Hellwig wrote: On Thu, Jul 16, 2015 at 09:52:44AM +0300, Sagi Grimberg wrote: I suggest to start with what I proposed. And in a later stage (if we still think it's needed) we can have a higher level API that hides the post, something like: rdma_reg_sg(struct

Re: Kernel fast memory registration API proposal [RFC]

2015-07-16 Thread Sagi Grimberg
On 7/15/2015 5:32 PM, Chuck Lever wrote: On Jul 15, 2015, at 4:01 AM, Sagi Grimberg sa...@dev.mellanox.co.il wrote: On 7/14/2015 8:09 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 07:55:39PM +0300, Sagi Grimberg wrote: But, if people think that it's better to have an API that does

Re: Kernel fast memory registration API proposal [RFC]

2015-07-16 Thread Sagi Grimberg
On 7/15/2015 8:07 PM, Jason Gunthorpe wrote: On Wed, Jul 15, 2015 at 12:32:33AM -0700, Christoph Hellwig wrote: int rdma_create_mr(struct ib_pd *pd, enum rdma_mr_type mr, u32 max_pages, int flags); * array from a SG list * @mr: memory region * @sg: sg

Re: Kernel fast memory registration API proposal [RFC]

2015-07-16 Thread Sagi Grimberg
I can drop it, unless anyone can think of a use-case where a ULP would want to register a region with a different offset from sg[0]->offset and/or ends before the sum(sg->length). What if the sg list has to be chunked up due to the device's FRWR pbl depth limits? Or is that handled underneath

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Sagi Grimberg
On 7/14/2015 8:09 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 07:55:39PM +0300, Sagi Grimberg wrote: But, if people think that it's better to have an API that does implicit posting always without notification, and then silently consume error or flush completions. I can try and look

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-15 Thread Sagi Grimberg
On 7/14/2015 11:29 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 12:55:11PM -0700, 'Christoph Hellwig' wrote: On Tue, Jul 14, 2015 at 02:32:31PM -0500, Steve Wise wrote: You mean should not, yea? Ok. I'll check for iWARP. But don't tell me to remove the transport-specific hacks in

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Sagi Grimberg
On 7/15/2015 10:32 AM, Christoph Hellwig wrote: Hi Sagi, I went over your proposal based on reviewing the ongoing MR threads and my implementation of a similar in-driver abstraction, so here are some proposed updates. struct provider_mr { u64 *page_list; // or what ever

Re: Kernel fast memory registration API proposal [RFC]

2015-07-15 Thread Sagi Grimberg
On 7/15/2015 6:05 AM, Doug Ledford wrote: On 07/14/2015 01:08 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 07:46:50PM +0300, Sagi Grimberg wrote: Which drivers don't support FRWR that we need to do other things? ipath - deprecated We have permission to move this to staging

Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

2015-07-15 Thread Sagi Grimberg
On 7/14/2015 8:26 PM, Jason Gunthorpe wrote: On Tue, Jul 14, 2015 at 12:05:53PM +0300, Sagi Grimberg wrote: iser has it too. I have a similar patch with a flag for iser (it's behind a bulk of patches that are still pending though). Do we all agree and understand that stuff like

Re: [Ksummit-discuss] [TECH TOPIC] IRQ affinity

2015-07-15 Thread Sagi Grimberg
On 7/15/2015 8:25 PM, Jens Axboe wrote: On 07/15/2015 11:19 AM, Keith Busch wrote: On Wed, 15 Jul 2015, Bart Van Assche wrote: * With blk-mq and scsi-mq optimal performance can only be achieved if the relationship between MSI-X vector and NUMA node does not change over time. This is

Re: Kernel fast memory registration API proposal [RFC]

2015-07-18 Thread Sagi Grimberg
/** * ib_mr_set_sg() - populate memory region buffers * array from a SG list * @mr: memory region * @sg: sg list * @sg_nents:number of elements in the sg * * Can fail if the HW is not able to register this * sg list. In case of failure - caller

Re: Kernel fast memory registration API proposal [RFC]

2015-07-18 Thread Sagi Grimberg
On 7/16/2015 9:08 PM, Jason Gunthorpe wrote: On Thu, Jul 16, 2015 at 03:21:04PM +0300, Sagi Grimberg wrote: I gotta say, these suggestions of bool/write or supported_ops with a convert helper seem (to me at least) to make things more complicated. Why not just set the access_flags

Re: RFC: Immediate data support for SRP

2015-07-19 Thread Sagi Grimberg
On 7/16/2015 6:25 PM, Bart Van Assche wrote: Hello, Hi Bart, I agree it would definitely help as the lack of immediate data emphasizes the additional latency of doing rdma reads. As you probably know for write requests immediate data means sending the data in the same packet as the write

Re: RFC: Immediate data support for SRP

2015-07-20 Thread Sagi Grimberg
On 7/20/2015 12:43 AM, Or Gerlitz wrote: On Sun, Jul 19, 2015 at 7:07 PM, Sagi Grimberg sa...@dev.mellanox.co.il wrote: On 7/16/2015 6:25 PM, Bart Van Assche wrote: I agree it would definitely help as the lack of immediate data emphasizes the additional latency of doing rdma reads. Sagi

Re: Kernel fast memory registration API proposal [RFC]

2015-07-12 Thread Sagi Grimberg
On 7/11/2015 1:39 PM, Christoph Hellwig wrote: On Fri, Jul 10, 2015 at 12:09:37PM +0300, Sagi Grimberg wrote: And then provide helpers to populate the MR with generic kernel structures such as struct scatterlist (for scsi and other ULPs), struct page (for NFS) or struct bio_vec (for block ULPs

[PATCH RFC] svcrdma: Fix possible over population fast_reg_page_list

2015-07-20 Thread Sagi Grimberg
When accounting the needed_pages, we need to look into the page_list->max_page_list_len and not the global context xprt->sc_frmr_pg_list_len. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions
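The accounting rule in this patch description can be sketched roughly as follows. This is a standalone illustration only, not the svcrdma code: the struct and function names are hypothetical, and the only idea taken from the message is that the bound should come from the individual page list's capacity rather than from a transport-wide value.

```c
/* Hypothetical stand-in for a per-registration page list; only its
 * capacity matters for this sketch. */
struct page_list_sketch {
	unsigned int max_page_list_len;	/* capacity of this particular list */
};

/*
 * Clamp the number of pages mapped by one registration to what this
 * page list can hold, instead of trusting a global per-transport limit.
 */
static unsigned int pages_for_this_list(const struct page_list_sketch *pl,
					unsigned int pages_needed)
{
	return pages_needed < pl->max_page_list_len ?
		pages_needed : pl->max_page_list_len;
}
```

A caller with more pages than one list can hold would loop, consuming at most the returned count per registration.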

Re: [PATCH RFC] svcrdma: Fix possible over population fast_reg_page_list

2015-07-20 Thread Sagi Grimberg
On 7/20/2015 8:13 PM, Chuck Lever wrote: On Jul 20, 2015, at 1:00 PM, Sagi Grimberg sa...@mellanox.com wrote: When accounting the needed_pages, we need to look into the page_list->max_page_list_len and not the global context xprt->sc_frmr_pg_list_len. Signed-off-by: Sagi Grimberg sa

Re: Kernel fast memory registration API proposal [RFC]

2015-07-20 Thread Sagi Grimberg
On 7/20/2015 7:23 PM, Jason Gunthorpe wrote: On Sun, Jul 19, 2015 at 08:33:24AM +0300, Sagi Grimberg wrote: I was thinking that the user won't explicitly say which key it registers and it will be decided from the registration itself. Meaning, the registration code will do: Please don't

Re: Kernel fast memory registration API proposal [RFC]

2015-07-20 Thread Sagi Grimberg
On 7/20/2015 8:00 PM, Jason Gunthorpe wrote: On Mon, Jul 20, 2015 at 07:27:52PM +0300, Sagi Grimberg wrote: I'm thinking now that this should have an input argument of block_size. Maybe in the future ULPs would want to register huge pages, it will be a shame to map it into PAGE_SIZE chunks

Re: Kernel fast memory registration API proposal [RFC]

2015-07-20 Thread Sagi Grimberg
I'm thinking now that this should have an input argument of block_size. Maybe in the future ULPs would want to register huge pages, it will be a shame to map it into PAGE_SIZE chunks... Why wouldn't it just transparently support huge pages? sg seems to have enough information. I'm not sure I
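To make the cost being discussed concrete, here is a minimal sketch of how many page-list entries a region needs for a given block size. The function name is hypothetical, and the block size is assumed to be a power of two; nothing here is taken from the proposed API itself.

```c
#include <stdint.h>

/*
 * Number of block-sized entries needed to cover [addr, addr + len),
 * assuming block_size is a power of two. Partial blocks at either end
 * still consume a full entry.
 */
static uint64_t blocks_needed(uint64_t addr, uint64_t len, uint64_t block_size)
{
	uint64_t first = addr & ~(block_size - 1);
	uint64_t last = (addr + len + block_size - 1) & ~(block_size - 1);

	return (last - first) / block_size;
}
```

Under these assumptions an aligned 2 MiB region takes 512 entries when mapped in 4 KiB chunks but a single entry when the block size is 2 MiB, which is the motivation given for a block_size argument.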

[PATCH] mlx5: Fix missing device local_dma_lkey

2015-07-20 Thread Sagi Grimberg
The mlx5 driver exposes device capability IB_DEVICE_LOCAL_DMA_LKEY but does not set the device local_dma_lkey. This breaks rpcrdma drivers. Query and set this lkey when creating the device resources. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/main.c

Re: [PATCH] mlx5: Fix missing device local_dma_lkey

2015-07-20 Thread Sagi Grimberg
On 7/20/2015 8:08 PM, Chuck Lever wrote: On Jul 20, 2015, at 12:54 PM, Sagi Grimberg sa...@mellanox.com wrote: The mlx5 driver exposes device capability IB_DEVICE_LOCAL_DMA_LKEY but does not set the device local_dma_lkey. This breaks rpcrdma drivers. Query and set this lkey when creating

[PATCH] mlx5: Expose correct page_size_cap in device attributes

2015-07-21 Thread Sagi Grimberg
Should be all the page sizes that are supported by the device. Reported-by: Jason Gunthorpe jguntho...@obsidianresearch.com Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/main.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers

Re: Kernel fast memory registration API proposal [RFC]

2015-07-21 Thread Sagi Grimberg
Bleh... seems like a great effort just to find that out. Isn't it better to just ask for a page_size arg? So who computes page_size and how? Don't just punt things to a caller without really explaining how the caller is supposed to use it correctly. I'd imagine that the ULP knows when it

Re: RFC: Immediate data support for SRP

2015-07-21 Thread Sagi Grimberg
On 7/21/2015 3:03 AM, Bart Van Assche wrote: On 07/19/2015 09:07 AM, Sagi Grimberg wrote: On 7/16/2015 6:25 PM, Bart Van Assche wrote: As you probably know for write requests immediate data means sending the data in the same packet as the write command instead of sending it as a separate

Re: RFC: Immediate data support for SRP

2015-07-21 Thread Sagi Grimberg
So you have 140% better IOPS with immediate-data vs. non immediate data?! numberz? No, the improvement was to avoid memory copy from the pre-posted receive buffer (with immediate-data) to an allocated buffer. Instead the receive buffer is handed to the backend to do IO. This shows up to 40%

Re: [PATCH v2 for-next 1/7] IB/core: Extend ib_uverbs_create_qp

2015-10-21 Thread Sagi Grimberg
On 10/21/2015 1:04 PM, Or Gerlitz wrote: On 10/21/2015 12:53 PM, Sagi Grimberg wrote: On 10/15/2015 2:44 PM, Eran Ben Elisha wrote: +struct ib_uverbs_ex_create_qp { +__u64 user_handle; +__u32 pd_handle; +__u32 send_cq_handle; +__u32 recv_cq_handle; +__u32 srq_handle

[PATCH 2/2] iser-target: Remove explicit mlx4 work-around

2015-10-27 Thread Sagi Grimberg
The driver now exposes sufficient limits so we can avoid having mlx4 specific work-around. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/ulp/isert/ib_isert.c | 10 ++ 1 files changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/ulp

[PATCH 0/2] Expose max_sge_rd correctly

2015-10-27 Thread Sagi Grimberg
This addresses a specific mlx4 issue where the max_sge_rd is actually smaller than max_sge (rdma reads with max_sge entries complete with error). The second patch removes the explicit work-around from the iser target code. This applies on top of Christoph's device attributes modification. Sagi

[PATCH 1/2] mlx4: Expose correct max_sge_rd limit

2015-10-27 Thread Sagi Grimberg
mlx4 devices (ConnectX-2, ConnectX-3) can not issue max_sge in a single RDMA_READ request (resulting in a completion error). Thus, expose lower max_sge_rd to avoid this issue. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/hw/mlx4/main.c |3 ++- 1 files chan

[PATCH v2 2/2] iser-target: Remove explicit mlx4 work-around

2015-10-28 Thread Sagi Grimberg
The driver now exposes sufficient limits so we can avoid having mlx4 specific work-around. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> Reviewed-by: Steve Wise <sw...@opengridcomputing.com> --- drivers/infiniband/ulp/isert/ib_isert.c | 13 +++-- 1 files changed,

[PATCH v5 27/26] IB/hfi1: Remove fast registration from the code

2015-10-29 Thread Sagi Grimberg
The driver does not support it anyway, and the support should be added to a generic layer shared by both hfi1, qib and softroce drivers. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/staging/rdma/hfi1/keys.c | 55 - drivers/staging/rdm

[PATCH 28/26] IB/ipath: Remove fast registration from the code

2015-10-29 Thread Sagi Grimberg
The driver does not support it anyway, and the support should be added to a generic layer shared by both hfi1, qib and softroce drivers. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/staging/rdma/ipath/ipath_verbs.c |3 --- drivers/staging/rdma/ipath/ipath_verbs.h |

Re: [PATCH v5 00/26] New fast registration API

2015-10-29 Thread Sagi Grimberg
I can provide a patch for hfi, anything else needed? It breaks all of them in staging, not just hfi1. So, hfi1, amso1100, ipath, and ehca. hfi1: Does not support FRWR at all, there are just some copy-paste sections that supposedly handle it - so I'll drop any sign of it from the code.

Re: [PATCH] IB/mlx: Expose max_fmr to ib_query_device

2015-10-29 Thread Sagi Grimberg
Hi Yuval, The title prefix should be IB/mlx4: Expose max_fmr so it will be available to ULPs. max_fmr is num_mpts minus reserved. Signed-off-by: Yuval Shaia --- drivers/infiniband/hw/mlx4/main.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git

Re: [PATCH 1/2] mlx4: Expose correct max_sge_rd limit

2015-10-27 Thread Sagi Grimberg
Hello Sagi, Is this the same issue as what has been discussed in http://www.spinics.net/lists/linux-rdma/msg21799.html ? Looks like it. I think this patch addresses this issue, but let's CC Eli to comment if I'm missing something. Thanks for digging this up... Sagi. -- To unsubscribe from

Re: [PATCH 1/2] mlx4: Expose correct max_sge_rd limit

2015-10-27 Thread Sagi Grimberg
On 27/10/2015 16:39, Or Gerlitz wrote: On 10/27/2015 11:40 AM, Sagi Grimberg wrote: mlx4 devices (ConnectX-2, ConnectX-3) can not issue max_sge in a single RDMA_READ request (resulting in a completion error). Thus, expose lower max_sge_rd to avoid this issue. Sagi, Hey Or, Still

Re: [PATCH 1/2] mlx4: Expose correct max_sge_rd limit

2015-10-27 Thread Sagi Grimberg
But AFAIR, the magic number was 28... how this goes hand in hand with your findings? mlx4 max_sge is 32, and isert does max_sge - 2 = 30. So it always used 30... and I run it reliably with this for a while now. This thing exists before I was involved so I might not be familiar with all the

[PATCH v1 0/2] Handle mlx4 max_sge_rd correctly

2015-10-27 Thread Sagi Grimberg
and added a root cause analysis to patch change log. - Fixed isert qp creation to be max_sge but construct rdma work request with the minimum of max_sge and max_sge_rd as non-rdma sends (login rsp) take 2 sges (and some devices have max_sge_rd = 1). Sagi Grimberg (2): mlx4: Expose correct
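The changelog entry above describes two different limits being applied in two places. A minimal sketch of that split follows, using hypothetical struct and function names rather than the actual isert code; only the max_sge and max_sge_rd field names come from the thread.

```c
/* Hypothetical subset of device limits relevant to the changelog. */
struct dev_limits_sketch {
	int max_sge;	/* SGEs per work request in general */
	int max_sge_rd;	/* SGEs per RDMA READ work request */
};

/* The QP is sized with max_sge, since ordinary sends (e.g. a login
 * response) may need more SGEs than an RDMA READ is allowed. */
static int qp_send_sge(const struct dev_limits_sketch *d)
{
	return d->max_sge;
}

/* RDMA work requests are built with the smaller of the two limits. */
static int rdma_wr_sge(const struct dev_limits_sketch *d)
{
	return d->max_sge < d->max_sge_rd ? d->max_sge : d->max_sge_rd;
}
```

With the mlx4 numbers mentioned elsewhere in the thread (max_sge of 32, reads limited to 30), the QP would still be created with 32 while each RDMA work request carries at most 30 entries.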

[PATCH v1 2/2] iser-target: Remove explicit mlx4 work-around

2015-10-27 Thread Sagi Grimberg
The driver now exposes sufficient limits so we can avoid having mlx4 specific work-around. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/ulp/isert/ib_isert.c | 13 +++-- 1 files changed, 3 insertions(+), 10 deletions(-) diff --git a/drivers/infiniba

[PATCH v1 1/2] mlx4: Expose correct max_sge_rd limit

2015-10-27 Thread Sagi Grimberg
= 30. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/hw/mlx4/main.c |2 +- include/linux/mlx4/device.h | 11 +++ 2 files changed, 12 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/

Re: [PATCH] IB/iser: Remove an unused variable

2015-10-27 Thread Sagi Grimberg
Detected this by compiling with W=1. Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> Cc: Sagi Grimberg <sa...@mellanox.com> FWIW, Reviewed-by: Sagi Grimberg <sa...@mellanox.com>

Re: merge struct ib_device_attr into struct ib_device V2

2015-10-27 Thread Sagi Grimberg
Did we converge on this? Just a heads up to Doug, this conflicts with [PATCH v4 11/16] xprtrdma: Pre-allocate Work Requests for backchannel but it's trivial to sort out...

Re: [PATCH 02/25] IB/mthca, net/mlx4: remove counting semaphores

2015-10-28 Thread Sagi Grimberg
Hi Arnd, Since we want to make counting semaphores go away, Why do we want to make counting semaphores go away? completely? or just for binary use cases? I have a use case in iser target code where a counting semaphore is the best suited synchronizing mechanism. I have a single thread

Re: [PATCH 0/7] Fix an infinite loop in the SRP initiator

2015-10-28 Thread Sagi Grimberg
Submitting a SCSI request through the SG_IO mechanism with a scatterlist that is longer than what is supported by the SRP initiator triggers an infinite loop. This patch series fixes that behavior. The individual patches in this series are as follows: 0001-IB-srp-Fix-a-spelling-error.patch

Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-11 Thread Sagi Grimberg
Jason, It is always acceptable to use a lkey MR instead of the local dma lkey, but ULPs should prefer to use the local dma lkey if possible, for performance reasons. I don't necessarily agree with this statement (at least with the second part of it), the world is not always perfect. For RDMA

Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-11 Thread Sagi Grimberg
On 10/11/2015 15:41, Christoph Hellwig wrote: FYI, this is the API I'd aim for (only SRP and no HW driver converted yet): This looks fine, although personally I find scope and direction flags more confusing than access_flags (but maybe it's just me). I think that the real issue here is the

Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-11 Thread Sagi Grimberg
On 11/11/2015 10:08, Christoph Hellwig wrote: On Tue, Nov 10, 2015 at 11:01:56AM -0700, Jason Gunthorpe wrote: No need to change every driver. I'd suggest something like unsigned int rdma_cap_rdma_read_mr_flags(const struct ib_pd *pd) { if (rdma_protocol_iwarp(pd->device,
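The idea being suggested, picking the access flags for an RDMA READ sink from the transport rather than hard-coding them in each ULP, can be sketched in isolation. Everything below is a hypothetical userspace illustration: the enum, the flag constants, and the helper name are invented for this sketch and are not the kernel definitions, and the rule it encodes (iWARP needs remote write access on the read sink, other transports only local write) is the assumption this series is built around.

```c
/* Hypothetical flag values for illustration; not the kernel's enum. */
#define SKETCH_ACCESS_LOCAL_WRITE	(1u << 0)
#define SKETCH_ACCESS_REMOTE_WRITE	(1u << 1)

/* Hypothetical transport tag standing in for a protocol query. */
enum sketch_transport {
	SKETCH_TRANSPORT_IB,
	SKETCH_TRANSPORT_IWARP,
};

/*
 * Access flags a ULP would request for the local buffer that receives
 * RDMA READ data, chosen per transport.
 */
static unsigned int sketch_rdma_read_access_flags(enum sketch_transport t)
{
	unsigned int flags = SKETCH_ACCESS_LOCAL_WRITE;

	if (t == SKETCH_TRANSPORT_IWARP)
		flags |= SKETCH_ACCESS_REMOTE_WRITE;
	return flags;
}
```

Centralizing the choice this way means a ULP never grants remote write on transports that do not need it.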

Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-11 Thread Sagi Grimberg
I’d like to see our NFS server use the local DMA lkey where it makes sense, to avoid the cost of registering and invalidating memory. I have to agree with Tom that once the device’s s/g limit is exceeded, the server has to post an RDMA Read WR every few pages, and appears to get expensive

Re: srp state in current mainline

2015-11-11 Thread Sagi Grimberg
On 11/11/2015 18:18, Christoph Hellwig wrote: On Wed, Nov 11, 2015 at 08:03:46AM -0800, Bart Van Assche wrote: Hello Christoph, The SRP initiator from kernel 4.3 is working fine on my test setup. I will start a test with Linus' tree and with the following SRP kernel module parameters: # cat

Re: [PATCH v2 0/2] Handle mlx4 max_sge_rd correctly

2015-11-10 Thread Sagi Grimberg
On 28/10/2015 13:28, Sagi Grimberg wrote: This addresses a specific mlx4 issue where the max_sge_rd is actually smaller than max_sge (rdma reads with max_sge entries complete with error). The second patch removes the explicit work-around from the iser target code. Changes from v1: - Fixed

Re: [PATCH] IB/srp: Fix possible send queue overflow

2015-11-10 Thread Sagi Grimberg
Hi Doug, Kind reminder for picking this up for 4.4 Doug? Are you planning to pick this up? Note that this patch is stable material as well. Doug? any plans for this patch?

Re: [PATCH] IB/mad: In validate_mad, validate CM method and attribute

2015-11-15 Thread Sagi Grimberg
Hello Hal, With which SRP target has this behavior been observed ? Has this patch been tested with the LIO SRP target ? Hi Bart, This issue was detected when testing a new array with SRP support. This does not involve LIO as the Linux CM stack does not behave in the way described in this

Re: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-15 Thread Sagi Grimberg
+ +struct ib_stop_cqe { + struct ib_cqe cqe; + struct completion done; +}; + +static void ib_stop_done(struct ib_cq *cq, struct ib_wc *wc) +{ + struct ib_stop_cqe *stop = + container_of(wc->wr_cqe, struct ib_stop_cqe, cqe); + + complete(&stop->done); +} + +/* +
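The quoted hunk relies on the container_of pattern: the completion handler receives a pointer to an embedded member and recovers the enclosing structure from it. A self-contained userspace sketch of the same pattern follows; all names carry a _sketch suffix because they are hypothetical stand-ins, and a plain int flag replaces the kernel's struct completion.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), for this sketch. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Embedded completion entry carrying its own handler. */
struct cqe_sketch {
	void (*done)(struct cqe_sketch *cqe);
};

/* Wrapper that embeds the entry next to the state the handler updates. */
struct stop_cqe_sketch {
	struct cqe_sketch cqe;
	int completed;	/* stands in for struct completion */
};

/* Handler: recover the wrapper from the embedded member, then signal. */
static void stop_done_sketch(struct cqe_sketch *cqe)
{
	struct stop_cqe_sketch *stop =
		container_of_sketch(cqe, struct stop_cqe_sketch, cqe);

	stop->completed = 1;
}
```

The caller only ever hands out a pointer to the embedded member, so no separate lookup table from work request to waiter is needed.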

Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available

2015-11-15 Thread Sagi Grimberg
On Fri, Nov 13, 2015 at 3:46 PM, Christoph Hellwig wrote: The new name is irq_poll as iopoll is already taken. Better suggestions welcome. Sagi (or Christoph if you can address that), at some point over the last 18 months there was a port done at mellanox for iser to use

Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-15 Thread Sagi Grimberg
On 15/11/2015 14:55, Christoph Hellwig wrote: On Sun, Nov 15, 2015 at 11:40:02AM +0200, Sagi Grimberg wrote: I doubt INT_MAX is useful as a budget in any use-case. it can easily hog the CPU. If the consumer is given access to poll a CQ, it must be able to provide some way to budget it. Why

Re: [PATCH 1/9] move blk_iopoll to limit and make it generally available

2015-11-15 Thread Sagi Grimberg
On 15/11/2015 11:04, Or Gerlitz wrote: On Sun, Nov 15, 2015 at 10:48 AM, Sagi Grimberg <sa...@dev.mellanox.co.il> wrote: Or is correct, I have attempted to convert iser to use blk_iopoll in the past, however I've seen inconsistent performance and latency skews (comparing to tasklet

Re: [PATCH] ib_srp: initialize dma_length in srp_map_idb

2015-11-15 Thread Sagi Grimberg
We should really get this properly map/unmap per IO at some point. Probably do it in both code paths... Having said that, Looks fine, Reviewed-by: Sagi Grimberg <sa...@mellanox.com>

[PATCH for-next 03/10] IB/iser: Don't register memory for all immediatedata writes

2015-11-16 Thread Sagi Grimberg
From: Jenny Derzhavetz <jen...@mellanox.com> When all the task data is sent as immediatedata, we are allowed to use the local_dma_lkey as it is not sent to the wire. Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com> Signed-off-by: Sagi Grimberg <sa...@mellanox.com> ---

[PATCH for-next 01/10] IB/iser: Fix module init not cleaning up on error flow

2015-11-16 Thread Sagi Grimberg
From: Roi Dayan <r...@mellanox.com> destroy workqueue on transport register error release kmem cache on workqueue alloc error Signed-off-by: Roi Dayan <r...@mellanox.com> Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/ulp/iser/iscsi_iser.c | 9 ++-

[PATCH for-next 10/10] IB/iser: Support the remote invalidation exception

2015-11-16 Thread Sagi Grimberg
sponse) completion. Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com> Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/ulp/iser/iscsi_iser.h | 3 +- drivers/infiniband/ulp/iser/iser_initiator.c | 55 +++- drivers/infiniband/ulp

[PATCH for-next 06/10] iser-target: Remove unused file iser_proto.h

2015-11-16 Thread Sagi Grimberg
We don't need iser_proto.h anymore, remove it and move (non-protocol) declarations to ib_isert.h Signed-off-by: Sagi Grimberg <sa...@mellanox.com> Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com> --- drivers/infiniband/ulp/isert/ib_isert.c| 1 - drivers/infiniband/ulp/iser

[PATCH for-next 07/10] iser-target: Declare correct flags when accepting a connection

2015-11-16 Thread Sagi Grimberg
From: Jenny Derzhavetz <jen...@mellanox.com> iser target does not support zero based virtual addresses and send with invalidate, so it should declare that it doesn't. Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com> Signed-off-by: Sagi Grimberg <sa...@mellanox.com> ---

[PATCH for-next 09/10] IB/iser: Increment the rkey when registering and not when invalidating

2015-11-16 Thread Sagi Grimberg
With remote invalidate we won't local invalidate but we still want to increment the rkey. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com> --- drivers/infiniband/ulp/iser/iser_memory.c | 20 ++-- 1 file changed, 1

[PATCH for-next 04/10] IB/iser: set intuitive values for mr_valid

2015-11-16 Thread Sagi Grimberg
ise. Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com> Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/ulp/iser/iser_memory.c | 8 drivers/infiniband/ulp/iser/iser_verbs.c | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a

[PATCH for-next 05/10] iser: Have initiator and target to share protocol structures and definitions

2015-11-16 Thread Sagi Grimberg
The iser RDMA_CM negotiation protocol is shared by the initiator and the target, so have a shared header for the defines and structure. Move relevant items from the initiator and target headers. Signed-off-by: Sagi Grimberg <sa...@mellanox.com> Signed-off-by: Jenny Derzhavetz <jen...@mel

[PATCH for-next 08/10] iser-target: Support the remote invalidation exception

2015-11-16 Thread Sagi Grimberg
sponse. Signed-off-by: Jenny Derzhavetz <jen...@mellanox.com> Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/ulp/isert/ib_isert.c | 39 +++-- drivers/infiniband/ulp/isert/ib_isert.h | 2 ++ 2 files changed, 34 insertions(+), 7

Re: [PATCH 4/9] IB: remove in-kernel support for memory windows

2015-11-16 Thread Sagi Grimberg
Remove the unused ib_alloc_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig Will the user-space drivers posting via uverbs (qib, hfi, rxe) need the post_send

Re: [PATCH 4/9] IB: remove in-kernel support for memory windows

2015-11-16 Thread Sagi Grimberg
On 16/11/2015 19:02, Christoph Hellwig wrote: On Mon, Nov 16, 2015 at 07:00:06PM +0200, Sagi Grimberg wrote: Remove the unused ib_alloc_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off

Re: [PATCH 3/9] IB: add a helper to safely drain a QP

2015-11-16 Thread Sagi Grimberg
After looking at the nes driver, I don't see any common way to support drain w/o some serious driver mods. Since SRP is the only user, perhaps we can ignore iWARP for this function... But iser/isert essentially does it too (and I think xprtrdma will have it soon)... the modify_qp is

Re: [PATCH] ib_srp: initialize dma_length in srp_map_idb

2015-11-16 Thread Sagi Grimberg
On 15/11/2015 23:10, Or Gerlitz wrote: On Sun, Nov 15, 2015, Sagi Grimberg <sa...@dev.mellanox.co.il> wrote: On 15/11/2015 19:59, Christoph Hellwig wrote: Without this sg_dma_len will return 0 on architectures that have the dma_length field. and what's wrong with that? Becaus

Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg
On 10/11/2015 14:28, Sagi Grimberg wrote: Hi Yann, Why were those hw providers not modified to enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to set it for them ? Do you mean that ULPs will set IB_ACCESS_LOCAL_WRITE and iWARP providers executing the memory

Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg
Sagi, the Windows NDKPI has an NDK_MR_FLAG_RDMA_READ_SINK attribute which the upper layer can use to convey this information, I've mentioned it here before. https://msdn.microsoft.com/en-us/library/windows/hardware/hh439908(v=vs.85).aspx Thanks for the tip Tom. When this approach is used,

Re: [PATCH for-next 02/10] IB/iser: Default to fastreg instead of fmr

2015-11-17 Thread Sagi Grimberg
Why? the invalidate is just one part of the story, we are doing a mapping on IO submission and CX3 has strong ordering on FRWRs, right? Yes, this is correct. We'll test on CX3 to see if this introduces a regression. We should make sure not to introduce performance regression for HW which

Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
On 10/11/2015 13:38, Christoph Hellwig wrote: On Tue, Nov 10, 2015 at 12:44:14PM +0200, Sagi Grimberg wrote: --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c @@ -238,7 +238,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt, read = min_t

[PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
Instead of hard-coding remote access (which is not secured issue in IB). Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/

Re: [PATCH RFC 2/3] svcrdma: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
On 10/11/2015 13:41, Christoph Hellwig wrote: Oh, and while we're at it. Can someone explain why we're even using rdma_read_chunk_frmr for IB? It seems to work around the fact that iWarp only allows a single RDMA READ SGE, but it's used whenever the device has IB_DEVICE_MEM_MGT_EXTENSIONS,

Re: [PATCH RFC 3/3] RDS_IW: Use device rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
Looks reasonable, although currently this code is only used for iWarp anyway. I know... I'm hoping this will change at some point, and when it does, it will get it right hopefully. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to

Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg
From all I can tell nes also is a iWarp driver. It is.. I don't know why I treated it as IB :)

[PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
attributes merge into struct ib_device. Sagi Grimberg (3): IB/core: Expose a device attribute for rdma_read access flags svcrdma: Use device rdma_read_access_flags RDS_IW: Use device rdma_read_access_flags drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 ++ drivers/infiniband/hw/cxgb4

[PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg
Signed-off-by: Sagi Grimberg <sa...@mellanox.com> --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 ++ drivers/infiniband/hw/cxgb4/provider.c | 2 ++ drivers/infiniband/hw/mlx4/main.c| 1 + drivers/infiniband/hw/mlx5/main.c| 1 + drivers/infiniband/hw

Re: [PATCH RFC 0/3] Introduce device attribute rdma_read_access_flags

2015-11-10 Thread Sagi Grimberg
FYI, I've updated the git branch to be based on current linus' tree which required a few bits to be fixed. I'd also like to note that while everyone but Or seemed to be generally fine with it I'd really prefer an actual reviewed-by or acked-by tag. You can add: Tested-by: Sagi Grimberg

Re: [PATCH RFC 1/3] IB/core: Expose a device attribute for rdma_read access flags

2015-11-10 Thread Sagi Grimberg
Hi Yann, Why were those hw providers not modified to enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to set it for them ? Do you mean that ULPs will set IB_ACCESS_LOCAL_WRITE and iWARP providers executing the memory registration will add IB_ACCESS_REMOTE_WRITE? That's

Re: [PATCH] IB: start documenting device capabilities

2015-11-10 Thread Sagi Grimberg
which must support FRs to comply +* to the iWarp verbs spec. iWarp devices also support the +* IB_WR_RDMA_READ_WITH_INV verb for RDMA READs that invalidate the +* stag. +*/ Kinda weird that READ_WITH_INV came in without a device cap for it. Looks good, Reviewe

Re: [PATCH 1/7] IB/srp: Fix a spelling error

2015-11-03 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <sa...@mellanox.com>

Re: [PATCH 3/7] IB/srp: Rename work request ID labels

2015-11-03 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <sa...@mellanox.com>

Re: [PATCH 2/7] IB/srp: Document srp_map_data() return value

2015-11-03 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <sa...@mellanox.com>

Re: [PATCH 7/7] IB/srp: Avoid that mapping failure triggers an infinite loop

2015-11-03 Thread Sagi Grimberg
On 03/11/2015 20:56, Bart Van Assche wrote: On 11/03/2015 09:44 AM, Sagi Grimberg wrote: Can you spare a few words on this change in the change log? Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> Cc: Sagi Grimberg <sa...@mellanox.com> Cc: Sebastian Parschauer &l

Re: [PATCH] IB/srp: Fix possible send queue overflow

2015-11-03 Thread Sagi Grimberg
On 15/10/2015 12:26, Sagi Grimberg wrote: When using work request based memory registration (fast_reg) we must reserve SQ entries for registration and invalidation in addition to send operations. Each IO consumes 3 SQ entries (registration, send, invalidation) so we need to allocate 3x larger
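
The sizing rule described in this message (three SQ entries per IO: registration, send, invalidation) can be illustrated with a small sketch. This is not the ib_srp patch itself; the constant and helper names below are hypothetical, and the non-fast_reg case assumes one send entry per IO.

```c
/*
 * Hypothetical constant (not from ib_srp): SQ entries consumed per IO
 * when work-request-based registration is used, i.e. one each for
 * registration, send and invalidation.
 */
#define SQ_ENTRIES_PER_IO_FAST_REG 3u

/*
 * Hypothetical helper: send queue depth needed for a given number of
 * in-flight IOs. Without fast_reg we assume only the send itself
 * occupies an SQ entry.
 */
static unsigned int sq_depth_for(unsigned int max_inflight_ios,
                                 int use_fast_reg)
{
    if (use_fast_reg)
        return max_inflight_ios * SQ_ENTRIES_PER_IO_FAST_REG;
    return max_inflight_ios;
}
```

Under these assumptions, a queue sized for 64 in-flight IOs would need 192 SQ entries with fast_reg, compared with 64 without it.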
