Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
On 2/8/2013 12:42 AM, Vu Pham wrote:

It is known that it takes about two to three minutes before the upstream SRP initiator fails over from a failed path to a working path. This is not only longer than acceptable but also longer than other Linux SCSI initiators (e.g. iSCSI and FC). Progress so far with improving SRP initiator failover has been slow because the discussion about candidate patches occurred at two different levels: not only the patches themselves were discussed, but also the approach that should be followed. That last aspect is easier to discuss in a meeting than over a mailing list. Hence the proposal to discuss SRP initiator failover behavior during the LSF/MM summit. The topics that need further discussion are:

* If a path fails, should the entire SCSI host be removed, or should the SCSI host be preserved and only the SCSI devices associated with that host be removed?
* Which software component should test the state of a path and reconnect to an SRP target if a path is restored? Should that be done by the user space process srp_daemon or by the SRP initiator kernel module?
* How should the SRP initiator behave after a path failure has been detected? Should the behavior be similar to the FC initiator with its fast_io_fail_tmo and dev_loss_tmo parameters?

Dave, if this topic gets accepted, I really hope you will be able to attend the LSF/MM summit.

Bart.

Hello Bart,

Thank you for taking the initiative. Mellanox thinks that this should be discussed, and we'd be happy to attend. We also would like to discuss:

* How, and how fast, does SRP detect a path failure besides an RC error?
* The role of srp_daemon: how often srp_daemon scans the fabric for new/old targets, how to scale srp_daemon discovery, traps.

-vu

Hey Bart,

I agree with Vu that this issue should be discussed. We'd be happy to attend.
-- Sagi -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline
On 2/18/2013 6:06 AM, David Dillow wrote: On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote:

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8a7eb9f..b34752d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	scsi_target_unblock(&shost->shost_gendev,
 			    ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE);
+	target->transport_offline = ret != 0;

Minor nit, that line is hard to read; I keep thinking it needs parens around the conditional... Perhaps

	target->transport_offline = !!ret;

or

	target->transport_offline = ret;

gcc should do the right conversion since we're assigning to a bool. Or, Vu, does this solve the issue you've seen? I may have time to test later this week, but not before.

Hey David,

This indeed solves the scsi_host removal issues. Vu is on vacation; I'll perform some more failover tests...

-Sagi
Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline
On 2/24/2013 10:09 AM, Bart Van Assche wrote: On 02/18/13 09:11, Sagi Grimberg wrote: On 2/18/2013 6:06 AM, David Dillow wrote: On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote:

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8a7eb9f..b34752d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	scsi_target_unblock(&shost->shost_gendev,
 			    ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE);
+	target->transport_offline = ret != 0;

Minor nit, that line is hard to read; I keep thinking it needs parens around the conditional... Perhaps target->transport_offline = !!ret; or target->transport_offline = ret; gcc should do the right conversion since we're assigning to a bool. Or, Vu, does this solve the issue you've seen? I may have time to test later this week, but not before.

This indeed solves the scsi_host removal issues. Vu is on vacation; I'll perform some more failover tests...

Hello Sagi,

Since no further feedback was posted on the list, I assume that means that all tests passed?

Bart.

Hey Bart,

Sorry for the delay, I was just about to reply... From my end, the related patch set seems to solve the scsi_host removal issue and prevents the SCSI error handling loop. Generally our tests passed; I still have some issue with a long-term failover test, but I'm not sure it's SRP (it might originate in the IB layer). So ack from me...

-Sagi
Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
On 7/15/2013 2:06 PM, Bart Van Assche wrote: On 14/07/2013 3:43, Sagi Grimberg wrote: On 7/3/2013 3:58 PM, Bart Van Assche wrote:

Several InfiniBand HCAs allow configuring the completion vector per queue pair. This allows spreading the workload created by IB completion interrupts over multiple MSI-X vectors and hence over multiple CPU cores. In other words, configuring the completion vector properly not only allows reducing latency on an initiator connected to multiple SRP targets, but also allows improving throughput.

Hey Bart,

I just wrote a small patch to allow srp_daemon to spread connections across the HCA's completion vectors. But re-thinking this, is it really a good idea to give the user control over completion vectors for CQs he doesn't really own? This way the user must retrieve the maximum number of completion vectors from the ib_device and consider this when adding a connection, and in addition will need to set proper IRQ affinity. Perhaps the driver can manage this on its own without involving the user. Take the mlx4_en driver for example: it spreads its CQs across the HCA's completion vectors without involving the user; the user that opens a socket has no influence on the underlying CQ-to-completion-vector assignment. The only use-case I can think of is where the user wants to use only a subset of the completion vectors, e.g. to reserve some completion vectors for native IB applications, but I don't know how common that is. Other than that, I think it is always better to spread the CQs across the HCA's completion vectors, so perhaps the driver should just assign connection CQs across completion vectors without taking arguments from the user, simply iterating over comp_vectors. What do you think?

Hello Sagi,

Sorry, but I do not think it is a good idea to let srp_daemon assign the completion vector. While this might work well on single-socket systems, it will produce suboptimal results on NUMA systems.
For certain workloads on NUMA systems, and when a NUMA initiator system is connected to multiple target systems, the optimal configuration is to make sure that all processing associated with a single SCSI host occurs on the same NUMA node. This means configuring the completion vector value such that IB interrupts are generated on the same NUMA node where the associated SCSI host and applications are running. More generally, performance tuning on NUMA systems requires system-wide knowledge of all applications that are running and of which interrupt is processed by which NUMA node. So choosing a proper value for the completion vector is only possible once the system topology and the IRQ affinity masks are known. I don't think we should build knowledge of all this into srp_daemon.

Bart.

Hey Bart,

Thanks for your quick attention to my question. srp_daemon is a package designed to let the customer automatically detect targets in the IB fabric. From our experience here in Mellanox, customers/users like automatic plug-and-play tools. They are reluctant to build their own scripting to enhance performance, and settle for srp_daemon, which is preferred over using ibsrpdm and manually adding new targets. Regardless, the completion vector assignment is meaningless without setting proper IRQ affinity, so in the worst case where the user didn't set his IRQ affinity, this assignment will perform like the default completion vector assignment, as all IRQs are directed without any masking, i.e. to core 0. From my experiments on NUMA systems, optimal performance is gained when all IRQs are directed to half of the cores on the NUMA node close to the HCA, and all traffic generators share the other half of the cores on the same NUMA node. So based on that knowledge, I thought that srp_daemon/the srp driver would assign its CQs across the HCA's completion vectors, and the user would be encouraged to set the IRQ affinity as described above to gain optimal performance.
Adding connections over the far NUMA node doesn't seem to benefit performance much... As I mentioned, a use-case I see that may raise a problem here is if the user would like to maintain multiple SRP connections and reserve some completion vectors for other IB applications on the system; in this case the user would be able to disable the srp_daemon/srp driver completion vector assignment. So, this was just an idea, an easy implementation that would potentially give the user a semi-automatic, performance-optimized configuration...

-Sagi
Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
On 7/15/2013 9:23 PM, Bart Van Assche wrote: On 15/07/2013 7:29, Sagi Grimberg wrote:

srp_daemon is a package designed to let the customer automatically detect targets in the IB fabric. From our experience here in Mellanox, customers/users like automatic plug-and-play tools. They are reluctant to build their own scripting to enhance performance, and settle for srp_daemon, which is preferred over using ibsrpdm and manually adding new targets. Regardless, the completion vector assignment is meaningless without setting proper IRQ affinity, so in the worst case where the user didn't set his IRQ affinity, this assignment will perform like the default completion vector assignment, as all IRQs are directed without any masking, i.e. to core 0. From my experiments on NUMA systems, optimal performance is gained when all IRQs are directed to half of the cores on the NUMA node close to the HCA, and all traffic generators share the other half of the cores on the same NUMA node. So based on that knowledge, I thought that srp_daemon/the srp driver would assign its CQs across the HCA's completion vectors, and the user would be encouraged to set the IRQ affinity as described above to gain optimal performance. Adding connections over the far NUMA node doesn't seem to benefit performance much... As I mentioned, a use-case I see that may raise a problem here is if the user would like to maintain multiple SRP connections and reserve some completion vectors for other IB applications on the system; in this case the user would be able to disable the srp_daemon/srp driver completion vector assignment. So, this was just an idea, an easy implementation that would potentially give the user a semi-automatic, performance-optimized configuration...

Hello Sagi,

I agree with you that it would help a lot if completion vector assignment could be automated such that end users do not have to care about assigning completion vector numbers.
The challenge is to find an approach that is general enough that it works for all possible use cases. One possible approach is to let a tool that has knowledge about the application fill in completion vector numbers in srp_daemon.conf, and let srp_daemon use the values generated by this tool. That approach would avoid srp_daemon having to have any knowledge about the application, but would still allow srp_daemon to assign the completion vector numbers.

Bart.

Hey Bart,

This sounds like a nice idea, but there is an inherent problem: applications come and go while the connections are (somewhat) static. How can you control pinning an arbitrary application running (over SRP devices, of course) at a certain point in time? So will you agree at least to give target->comp_vector a default of IB_CQ_VECTOR_LEAST_ATTACHED? From my point of view, for a user that doesn't have the slightest clue about completion vectors and performance optimization, this is somewhat better than doing nothing...

-Sagi
Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
On 7/16/2013 1:58 PM, Bart Van Assche wrote: On 16/07/2013 4:11, Sagi Grimberg wrote:

This sounds like a nice idea, but there is an inherent problem: applications come and go while the connections are (somewhat) static. How can you control pinning an arbitrary application running (over SRP devices, of course) at a certain point in time? So will you agree at least to give target->comp_vector a default of IB_CQ_VECTOR_LEAST_ATTACHED? From my point of view, for a user that doesn't have the slightest clue about completion vectors and performance optimization, this is somewhat better than doing nothing...

Hello Sagi,

That sounds like an interesting proposal to me. But did the patch that adds the IB_CQ_VECTOR_LEAST_ATTACHED feature ever get accepted in the upstream Linux kernel? I have tried to find that symbol in Linux kernel v3.11-rc1 but couldn't find it. Maybe I have overlooked something?

Bart.

Oh, you're right! I'll ask Vu; from git blame on old OFED I see that he wrote the code... Perhaps this should be added as well.

-Sagi
Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
On 7/16/2013 6:11 PM, Bart Van Assche wrote: On 14/07/2013 3:43, Sagi Grimberg wrote:

Just wrote a small patch to allow srp_daemon to spread connections across the HCA's completion vectors.

Hello Sagi,

How about the following approach:

- Add support for reading the completion vector from srp_daemon.conf, similar to how several other parameters are already read from that file.

Here we need to take into consideration that we are changing the functionality of srp_daemon.conf. Now, instead of simply allowing/disallowing targets with specific attributes, we are also defining configuration attributes of allowed targets. It might be uncomfortable for the user to explicitly write N target strings in srp_daemon.conf just for completion vector assignment. Perhaps srp_daemon.conf could contain a (comma-separated) list of reserved completion vectors for srp_daemon to spread CQs among. If this line doesn't exist, srp_daemon would spread the assignment over all of the HCA's completion vectors.

- If the completion vector parameter has not been set in srp_daemon.conf, let srp_daemon assign a completion vector such that IB interrupts for different SRP hosts use different completion vectors.

Bart.
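To make the proposal above concrete, a sketch of what such a srp_daemon.conf line could look like. This is purely hypothetical syntax illustrating the idea discussed in the thread; `comp_vectors` is NOT an accepted srp_daemon.conf keyword.

```
# Hypothetical sketch only -- not a real srp_daemon.conf keyword.
# A comma-separated list of completion vectors reserved for srp_daemon
# to spread new connections' CQs across, leaving the remaining vectors
# free for other IB applications. If the line is absent, srp_daemon
# would spread the assignment over all of the HCA's completion vectors.
comp_vectors=0,1,2,3
```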
Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)
On 7/22/2013 2:46 PM, Bart Van Assche wrote: On 07/18/13 15:25, Or Gerlitz wrote:

+static int iser_fast_reg_mr(struct fast_reg_descriptor *desc,
+			    struct iser_conn *ib_conn,
+			    struct iser_regd_buf *regd_buf,
+			    u32 offset, unsigned int data_size,
+			    unsigned int page_list_len)
+{
+	struct ib_send_wr fastreg_wr, inv_wr;
+	struct ib_send_wr *bad_wr, *wr = NULL;
+	u8 key;
+	int ret;
+
+	if (!desc->valid) {
+		memset(&inv_wr, 0, sizeof(inv_wr));
+		inv_wr.opcode = IB_WR_LOCAL_INV;
+		inv_wr.send_flags = IB_SEND_SIGNALED;
+		inv_wr.ex.invalidate_rkey = desc->data_mr->rkey;
+		wr = &inv_wr;
+		/* Bump the key */
+		key = (u8)(desc->data_mr->rkey & 0x00FF);
+		ib_update_fast_reg_key(desc->data_mr, ++key);
+	}
+
+	/* Prepare FASTREG WR */
+	memset(&fastreg_wr, 0, sizeof(fastreg_wr));
+	fastreg_wr.opcode = IB_WR_FAST_REG_MR;
+	fastreg_wr.send_flags = IB_SEND_SIGNALED;
+	fastreg_wr.wr.fast_reg.iova_start = desc->data_frpl->page_list[0] + offset;
+	fastreg_wr.wr.fast_reg.page_list = desc->data_frpl;
+	fastreg_wr.wr.fast_reg.page_list_len = page_list_len;
+	fastreg_wr.wr.fast_reg.page_shift = SHIFT_4K;
+	fastreg_wr.wr.fast_reg.length = data_size;
+	fastreg_wr.wr.fast_reg.rkey = desc->data_mr->rkey;
+	fastreg_wr.wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE  |
+					       IB_ACCESS_REMOTE_WRITE |
+					       IB_ACCESS_REMOTE_READ);

Hello Sagi,

If I interpret the above code correctly, the rkey used in the previous FRWR is invalidated as soon as a new FRWR is queued. Does this mean that the iSER initiator limits queue depth to one? Another question: is it on purpose that iscsi_iser_cleanup_task() does not invalidate an rkey if a command has been aborted successfully? A conforming iSER target does not send a response for aborted commands. Will successful command abortion result in the rkey not being invalidated? What will happen if a new FRWR is submitted with an rkey that is still valid?

Thanks,

Bart.
Hey Bart,

You interpret correctly: iSER will locally invalidate the rkey just before re-using it (provided it was not previously invalidated remotely by the target). This code is still missing the remote invalidate part; once that lands, the iSER initiator will advertise its remote invalidate support, the target may remotely invalidate the rkey, and the initiator will pick that up in the RSP completion and mark the associated MR as valid (ready for use again).

Not sure what you meant in your question, but this does not mean that the iSER initiator limits the queue depth to 1. The initiator manages a pool of fastreg descriptors of size == max queued commands (per connection), each containing an ib_mr. For each concurrent IOP it takes a fastreg descriptor from the pool and uses it for registration (if marked as not valid, it will locally invalidate the rkey and then use it for registration). When cleanup_task -> iser_task_rdma_finalize -> iser_unreg_rdma_mem is called, it just returns the fastreg descriptor to the pool (without a local invalidate, as that is done when it is reused). The reason I chose to do that is that if I locally invalidate the rkey upon task cleanup, then only after the completion am I allowed to return it to the pool (only then do I know it is ready for reuse), and assuming that I still want to evacuate the task and not wait in my fast path, I may in certain conditions end up in a situation where I have no resources to handle the next IOP, since all MRs are waiting for LOCAL_INV completions. A possible solution here was to heuristically use a larger pool, but I wanted to avoid that...

So just to clarify the flow:
. at connection establishment, allocate a pool of fastreg descriptors
. upon each IOP, take a fastreg descriptor from the pool
. if it is not invalidated - invalidate it
. register using FRWR
. when cleanup_task is called - just return the fastreg descriptor to the pool
. at connection teardown, free all resources

Still to come:
. upon each IOP response, check if the target used remote invalidate - if so, mark the relevant fastreg descriptor as valid

Hope this helps.

-Sagi
Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)
On 7/23/2013 2:58 PM, Bart Van Assche wrote: On 07/22/13 15:11, Sagi Grimberg wrote:

So just to clarify the flow:
. at connection establishment, allocate a pool of fastreg descriptors
. upon each IOP, take a fastreg descriptor from the pool
. if it is not invalidated - invalidate it
. register using FRWR
. when cleanup_task is called - just return the fastreg descriptor to the pool
. at connection teardown, free all resources

Still to come:
. upon each IOP response, check if the target used remote invalidate - if so, mark the relevant fastreg descriptor as valid

Hello Sagi and Or,

Thanks for the clarifications. I have one more question though. My interpretation of section 10.6 Memory Management in the IB specification is that memory registration maps a memory region that either has contiguous virtual addresses or contiguous physical addresses. However, there is no such requirement for an sg-list. As an example, for direct I/O to a block device with a sector size of 512 bytes, it is only required that I/O occurs in multiples of 512 bytes and from memory aligned on 512-byte boundaries. So the use of direct I/O can result in an sg-list where the second and subsequent sg-list elements have a non-zero offset. Do you agree with this? Are such sg-lists mapped correctly by the FRWR code?

Bart.

Hey Bart,

You are on the money with this observation: like FMRs, FRWR cannot register an arbitrary SG-list; you have the same limitations. Unlike SRP, where the initiator will use multiple FMRs to register such unaligned SG-lists, iSER uses a bounce buffer to copy the data to a physically contiguous memory area (see patch 5/7, the fall_to_bounce_buf routine), and thus will pass a single R_Key for each transaction. An equivalent FRWR implementation for SRP would also use multiple FRWRs in order to register such unaligned SG-lists and publish the R_Keys in ib_sge.
Hope this helps,

-Sagi
Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)
On 7/28/2013 11:15 AM, Or Gerlitz wrote: On 26/07/2013 20:15, Vu Pham wrote:

Hello Or/Sagi,

Just a minor comment:

+/**
+ * iser_create_frwr_pool - Creates pool of fast_reg descriptors
+ * for fast registration work requests.
+ * returns 0 on success, or errno code on failure
+ */
+int iser_create_frwr_pool(struct iser_conn *ib_conn, unsigned cmds_max)
+{
+	struct iser_device	*device = ib_conn->device;
+	struct fast_reg_descriptor	*desc;
+	int i, ret;
+
+	INIT_LIST_HEAD(&ib_conn->fastreg.frwr.pool);
+	ib_conn->fastreg.frwr.pool_size = 0;
+	for (i = 0; i < cmds_max; i++) {
+		desc = kmalloc(sizeof(*desc), GFP_KERNEL);
+		if (!desc) {
+			iser_err("Failed to allocate a new fast_reg descriptor\n");
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		desc->data_frpl = ib_alloc_fast_reg_page_list(device->ib_device,
+						ISCSI_ISER_SG_TABLESIZE + 1);
+		if (IS_ERR(desc->data_frpl)) {

ret = PTR_ERR(desc->data_frpl);

+			iser_err("Failed to allocate ib_fast_reg_page_list err=%ld\n",
+				 PTR_ERR(desc->data_frpl));

using ret

+			goto err;
+		}
+
+		desc->data_mr = ib_alloc_fast_reg_mr(device->pd,
+						ISCSI_ISER_SG_TABLESIZE + 1);
+		if (IS_ERR(desc->data_mr)) {

ret = PTR_ERR(desc->data_mr);

+			iser_err("Failed to allocate ib_fast_reg_mr err=%ld\n",
+				 PTR_ERR(desc->data_mr));

using ret

+			ib_free_fast_reg_page_list(desc->data_frpl);
+			goto err;
+		}
+		desc->valid = true;
+		list_add_tail(&desc->list, &ib_conn->fastreg.frwr.pool);
+		ib_conn->fastreg.frwr.pool_size++;
+	}
+
+	return 0;
+err:
+	iser_free_frwr_pool(ib_conn);
+	return ret;
+}

Nice catch! I see that Roland hasn't yet picked up this series, so I will re-submit it with fixes to the issues you have found here.

Or.

Nice catch indeed, thanks Vu.

-Sagi
Re: IB/iser: Generalize rdma memory registration
On 8/14/2013 10:52 PM, Dan Carpenter wrote:

Hello Sagi Grimberg,

This is a semi-automatic email about new static checker warnings.

The patch b4e155ffbbd6: IB/iser: Generalize rdma memory registration from Jul 28, 2013, leads to the following Smatch complaint:

drivers/infiniband/ulp/iser/iser_initiator.c:318 iser_free_rx_descriptors() error: we previously assumed 'device' could be null (see line 313)

drivers/infiniband/ulp/iser/iser_initiator.c
   312
   313		if (device && device->iser_free_rdma_reg_res)
                           ^^
New check.

   314			device->iser_free_rdma_reg_res(ib_conn);
   315
   316		rx_desc = ib_conn->rx_descs;
   317		for (i = 0; i < ib_conn->qp_max_recv_dtos; i++, rx_desc++)
   318			ib_dma_unmap_single(device->ib_device, rx_desc->dma_addr,
                                            ^
Old dereference.

   319				ISER_RX_PAYLOAD_SIZE, DMA_FROM_DEVICE);
   320		kfree(ib_conn->rx_descs);

Has the code changed so that we need to check now?

regards,
dan carpenter

Hey Dan,

Thanks for the input! The case here is that in some weird error flows we can end up in this function with device == NULL, but if you pass the first condition, if (!ib_conn->rx_descs), you are safe... I'll fire up a fix for that ASAP.

Cheers,

-Sagi
[PATCH] IB/iser: Fix redundant pointer check in dealloc flow
This bug was discovered by the Smatch static checker run by Dan Carpenter. If in iser_free_rx_descriptors the rx_descs are not NULL, the iser device is definitely not NULL, as it was created before; so there is no need to check it before dereferencing it.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/iser/iser_initiator.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index bdc38f4..5f01da9 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -310,7 +310,7 @@ void iser_free_rx_descriptors(struct iser_conn *ib_conn)
 	if (!ib_conn->rx_descs)
 		goto free_login_buf;
 
-	if (device && device->iser_free_rdma_reg_res)
+	if (device->iser_free_rdma_reg_res)
 		device->iser_free_rdma_reg_res(ib_conn);
 
 	rx_desc = ib_conn->rx_descs;
--
1.7.1
Re: [PATCH 8/8] IB/srp: Make queue size configurable
On 8/20/2013 3:50 PM, Bart Van Assche wrote:

Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: Konrad Grzybowski konr...@k2.pl
---
 drivers/infiniband/ulp/srp/ib_srp.c | 125 ++-
 drivers/infiniband/ulp/srp/ib_srp.h |  17 +++--
 2 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index ece1f2d..6de2323 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -299,16 +299,16 @@ static int srp_create_target_ib(struct srp_target_port *target)
 		return -ENOMEM;
 
 	recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_recv_completion, NULL, target, SRP_RQ_SIZE,
-			       target->comp_vector);
+			       srp_recv_completion, NULL, target,
+			       target->queue_size, target->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
 	send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_send_completion, NULL, target, SRP_SQ_SIZE,
-			       target->comp_vector);
+			       srp_send_completion, NULL, target,
+			       target->queue_size, target->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -317,8 +317,8 @@ static int srp_create_target_ib(struct srp_target_port *target)
 	ib_req_notify_cq(recv_cq, IB_CQ_NEXT_COMP);
 
 	init_attr->event_handler       = srp_qp_event;
-	init_attr->cap.max_send_wr     = SRP_SQ_SIZE;
-	init_attr->cap.max_recv_wr     = SRP_RQ_SIZE;
+	init_attr->cap.max_send_wr     = target->queue_size;
+	init_attr->cap.max_recv_wr     = target->queue_size;
 	init_attr->cap.max_recv_sge    = 1;
 	init_attr->cap.max_send_sge    = 1;
 	init_attr->sq_sig_type         = IB_SIGNAL_ALL_WR;
@@ -364,6 +364,10 @@ err:
 	return ret;
 }
 
+/*
+ * Note: this function may be called without srp_alloc_iu_bufs() having been
+ * invoked. Hence the target->[rt]x_ring checks.
+ */
 static void srp_free_target_ib(struct srp_target_port *target)
 {
 	int i;
@@ -375,10 +379,18 @@ static void srp_free_target_ib(struct srp_target_port *target)
 	target->qp = NULL;
 	target->send_cq = target->recv_cq = NULL;
 
-	for (i = 0; i < SRP_RQ_SIZE; ++i)
-		srp_free_iu(target->srp_host, target->rx_ring[i]);
-	for (i = 0; i < SRP_SQ_SIZE; ++i)
-		srp_free_iu(target->srp_host, target->tx_ring[i]);
+	if (target->rx_ring) {
+		for (i = 0; i < target->queue_size; ++i)
+			srp_free_iu(target->srp_host, target->rx_ring[i]);
+		kfree(target->rx_ring);
+		target->rx_ring = NULL;
+	}
+	if (target->tx_ring) {
+		for (i = 0; i < target->queue_size; ++i)
+			srp_free_iu(target->srp_host, target->tx_ring[i]);
+		kfree(target->tx_ring);
+		target->tx_ring = NULL;
+	}
 }
 
 static void srp_path_rec_completion(int status,
@@ -564,7 +576,11 @@ static void srp_free_req_data(struct srp_target_port *target)
 	struct srp_request *req;
 	int i;
 
-	for (i = 0, req = target->req_ring; i < SRP_CMD_SQ_SIZE; ++i, ++req) {
+	if (!target->req_ring)
+		return;
+
+	for (i = 0; i < target->req_ring_size; ++i) {
+		req = &target->req_ring[i];
 		kfree(req->fmr_list);
 		kfree(req->map_page);
 		if (req->indirect_dma_addr) {
@@ -574,6 +590,9 @@ static void srp_free_req_data(struct srp_target_port *target)
 		}
 		kfree(req->indirect_desc);
 	}
+
+	kfree(target->req_ring);
+	target->req_ring = NULL;
 }
 
 static int srp_alloc_req_data(struct srp_target_port *target)
@@ -586,7 +605,12 @@ static int srp_alloc_req_data(struct srp_target_port *target)
 
 	INIT_LIST_HEAD(&target->free_reqs);
 
-	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+	target->req_ring = kzalloc(target->req_ring_size *
+				   sizeof(*target->req_ring), GFP_KERNEL);
+	if (!target->req_ring)
+		goto out;
+
+	for (i = 0; i < target->req_ring_size; ++i) {
+		req = &target->req_ring[i];
 		req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
 					GFP_KERNEL);
@@ -810,7 +834,7 @@
Re: [PATCH 8/8] IB/srp: Make queue size configurable
On 8/20/2013 8:43 PM, David Dillow wrote: On Tue, 2013-08-20 at 17:55 +0200, Bart Van Assche wrote: On 08/20/13 17:34, Sagi Grimberg wrote:

Question: if srp will now allow larger queues while using a single global FMR pool of size 1024, isn't it more likely that in a stress environment srp will run out of FMRs to handle I/O commands? I mean, let's say that you have x scsi hosts with a can_queue size of 512 (+-) and all of them are running I/O stress; is it possible that all FMRs will be in use and no FMR will be available to register the next I/O SG-list? Did you try out such a scenario? I guess that in such a case the IB core will return EAGAIN and SRP will return SCSI_MLQUEUE_HOST_BUSY. I think it is a good idea to move the FMR pools to be per connection rather than a global pool; what do you think?

That makes sense to me. And as long as the above has not yet been implemented, I'm fine with dropping patch 8/8 from this patch set.

Don't drop it; most configs won't have all that many connections and shouldn't have an issue; even those that do will only see a potential slowdown when running with everything at once. We can address the FMR/BMME issues on top of this patch.

Agree.
[PATCH RFC 2/9] IB/core: Introduce Signature Verbs API
This commit introduces the verbs interface for signature related operations. A signature handover operation shall configure the layouts of the data and protection attributes in both the memory and wire domains. Once the signature handover operation is done, the HCA will offload data integrity generation/validation while performing the actual data transfer.

Additions:
1. HCA signature capabilities in device attributes
   A verbs provider supporting signature handover operations shall fill the relevant fields in the device attributes structure returned by ib_query_device.
2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
   Creating a QP that will carry signature handover operations may require some special preparations from the verbs provider, so we add the QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the created QP may carry out signature handover operations. Expose signature support to the verbs layer (no support for now).
3. New send work request IB_WR_REG_SIG_MR
   Signature handover work request. This WR will define the signature handover properties of the memory/wire domains as well as the domains' layout. Currently only the T10-DIF layout is exposed.
4. New verb ib_check_sig_status
   The check_sig_status verb shall check whether any signature errors are pending for a specific signature related ib_mr. The user should provide the ib_qp that executed the RDMA operation involving the given ib_mr.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   8 ++
 include/rdma/ib_verbs.h         | 140 ++-
 2 files changed, 147 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1d94a5c..5636d65 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1293,3 +1293,11 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+int ib_check_sig_status(struct ib_mr *sig_mr,
+			struct ib_sig_err *sig_err)
+{
+	return sig_mr->device->check_sig_status ?
+		sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_sig_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65b7e79..cf46a83 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,19 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24)
+	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
+	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<25),
+};
+
+enum ib_signature_prot_cap {
+	IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+	IB_GUARD_T10DIF_CRC	= 1,
+	IB_GUARD_T10DIF_CSUM	= 1 << 1,
 };

 enum ib_atomic_cap {
@@ -166,6 +178,8 @@ struct ib_device_attr {
 	unsigned int		max_fast_reg_page_list_len;
 	u16			max_pkeys;
 	u8			local_ca_ack_delay;
+	enum ib_signature_prot_cap sig_prot_cap;
+	enum ib_signature_guard_cap sig_guard_cap;
 };

 enum ib_mtu {
@@ -630,6 +644,7 @@ enum ib_qp_type {
 enum ib_qp_create_flags {
 	IB_QP_CREATE_IPOIB_UD_LSO		= 1 << 0,
 	IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK	= 1 << 1,
+	IB_QP_CREATE_SIGNATURE_EN		= 1 << 2,
 	/* reserve bits 26-31 for low level drivers' internal use */
 	IB_QP_CREATE_RESERVED_START		= 1 << 26,
 	IB_QP_CREATE_RESERVED_END		= 1 << 31,
@@ -780,6 +795,7 @@ enum ib_wr_opcode {
 	IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
 	IB_WR_BIND_MW,
+	IB_WR_REG_SIG_MR,
 	/* reserve values for low level drivers' internal use.
 	 * These values will not be used at all in the ib core layer.
 	 */
@@ -885,6 +901,19 @@ struct ib_send_wr {
 			u32			rkey;
 			struct ib_mw_bind_info	bind_info;
 		} bind_mw;
+		struct {
+			struct ib_sig_attrs    *sig_attrs;
+			struct ib_mr	       *sig_mr;
+			int			access_flags;
+			/* Registered data mr */
+			struct ib_mr	       *data_mr;
+			u32			data_size;
+			u64			data_va;
+			/* Registered protection mr */
+			struct ib_mr	       *prot_mr;
+			u32			prot_size;
+			u64			prot_va;
+		} sig_handover;
	} wr
[PATCH RFC 4/9] IB/mlx5: Initialize mlx5_ib_qp signature related
If the user requested signature enable we initialize the relevant mlx5_ib_qp members: we mark the QP as signature-enabled, initialize an empty sig_err_list, and increase the QP size.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |    5 +
 drivers/infiniband/hw/mlx5/qp.c      |    7 +++
 include/linux/mlx5/qp.h              |    1 +
 3 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 45d7424..1d5793e 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -189,6 +189,11 @@ struct mlx5_ib_qp {
 	int			create_type;
 	u32			pa_lkey;
+
+	/* Store signature errors */
+	bool			signature_en;
+	struct list_head	sig_err_list;
+	spinlock_t		sig_err_lock;
 };

 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 045f8cd..9a8c622 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -734,6 +734,13 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 	spin_lock_init(&qp->sq.lock);
 	spin_lock_init(&qp->rq.lock);

+	if (init_attr->create_flags == IB_QP_CREATE_SIGNATURE_EN) {
+		init_attr->cap.max_send_wr *= MLX5_SIGNATURE_SQ_MULT;
+		spin_lock_init(&qp->sig_err_lock);
+		INIT_LIST_HEAD(&qp->sig_err_list);
+		qp->signature_en = true;
+	}
+
 	if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR)
 		qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE;

diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index d9e3eac..174805c 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -37,6 +37,7 @@
 #include <linux/mlx5/driver.h>

 #define MLX5_INVALID_LKEY	0x100
+#define MLX5_SIGNATURE_SQ_MULT	3

 enum mlx5_qp_optpar {
 	MLX5_QP_OPTPAR_ALT_ADDR_PATH	= 1 << 0,
--
1.7.1
[PATCH RFC 3/9] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
Support the create_mr and destroy_mr verbs. An ib_mr may be created either as a regular MR that registers page lists, as with the alloc_fast_reg_mr routine, or as an indirect ib_mr that can register other (pre-registered) ib_mr's in an indirect manner. In addition, the user may request signature enable, meaning that the created ib_mr may be attached with signature attributes (BSF, PSVs). Currently we only allow direct/indirect registration modes.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c            |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |   4 +
 drivers/infiniband/hw/mlx5/mr.c              | 120 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |  64 ++++
 include/linux/mlx5/device.h                  |  25 ++
 include/linux/mlx5/driver.h                  |  22 +
 6 files changed, 237 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3f831de..2e67a37 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.get_dma_mr		= mlx5_ib_get_dma_mr;
 	dev->ib_dev.reg_user_mr		= mlx5_ib_reg_user_mr;
 	dev->ib_dev.dereg_mr		= mlx5_ib_dereg_mr;
+	dev->ib_dev.destroy_mr		= mlx5_ib_destroy_mr;
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
+	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 836be91..45d7424 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -262,6 +262,7 @@ struct mlx5_ib_mr {
 	int			npages;
 	struct completion	done;
 	enum ib_wc_status	status;
+	struct mlx5_core_sig_ctx	*sig;
 };

 struct mlx5_ib_fast_reg_page_list {
@@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
 				  struct ib_udata *udata);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
+int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+				struct ib_mr_init_attr *mr_init_attr);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index bd41df9..2f6758c 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -921,6 +921,126 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
 	return 0;
 }

+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+				struct ib_mr_init_attr *mr_init_attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_create_mkey_mbox_in *in;
+	struct mlx5_ib_mr *mr;
+	int access_mode, err;
+	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	in->seg.status = 1 << 6; /* free */
+	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
+	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
+	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
+
+	switch (mr_init_attr->reg_type) {
+	case IB_MR_REG_DIRECT:
+		access_mode = MLX5_ACCESS_MODE_MTT;
+		break;
+	case IB_MR_REG_INDIRECT:
+		access_mode = MLX5_ACCESS_MODE_KLM;
+		break;
+	default:
+		err = -EINVAL;
+		goto err_free;
+	}
+	in->seg.flags = MLX5_PERM_UMR_EN | access_mode;
+
+	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+		u32 psv_index[2];
+
+		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
+					       MLX5_MKEY_BSF_EN);
+		in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
+		mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL);
+		if (!mr->sig) {
+			err = -ENOMEM;
+			goto err_free
[PATCH RFC 8/9] IB/mlx5: Collect signature error completion
This commit takes care of the signature error CQE generated by the HW (if one happened) and stores it on the QP signature error list. Once the user gets the completion for the transaction, he must check for signature errors on the signature memory region using the new lightweight verb ib_check_sig_status and, if any exist, retrieve the signature error information. If the user does not check for signature errors, i.e. does not call ib_check_sig_status, he will not be allowed to use the memory region for another signature operation (a REG_SIG_MR work request will fail).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/cq.c      |  49 ++++++
 drivers/infiniband/hw/mlx5/main.c    |   1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   2 +
 drivers/infiniband/hw/mlx5/mr.c      |  34 +++
 drivers/infiniband/hw/mlx5/qp.c      |  14 +-
 include/linux/mlx5/cq.h              |   1 +
 include/linux/mlx5/device.h          |  17 ++++
 include/linux/mlx5/driver.h          |   2 +
 8 files changed, 119 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 344ab03..c1d4029 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -351,6 +351,34 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
 	qp->sq.last_poll = tail;
 }

+static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe,
+			     struct ib_sig_err *item)
+{
+	u16 syndrome = be16_to_cpu(cqe->syndrome);
+
+	switch (syndrome) {
+	case 13:
+		item->err_type = IB_SIG_BAD_CRC;
+		break;
+	case 12:
+		item->err_type = IB_SIG_BAD_APPTAG;
+		break;
+	case 11:
+		item->err_type = IB_SIG_BAD_REFTAG;
+		break;
+	default:
+		break;
+	}
+
+	item->expected_guard = be32_to_cpu(cqe->expected_trans_sig) >> 16;
+	item->actual_guard = be32_to_cpu(cqe->actual_trans_sig) >> 16;
+	item->expected_logical_block = be32_to_cpu(cqe->expected_reftag);
+	item->actual_logical_block = be32_to_cpu(cqe->actual_reftag);
+	item->sig_err_offset = be64_to_cpu(cqe->err_offset);
+	item->qpn = be32_to_cpu(cqe->qpn);
+	item->key = be32_to_cpu(cqe->mkey);
+}
+
 static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 			 struct mlx5_ib_qp **cur_qp,
 			 struct ib_wc *wc)
@@ -360,12 +388,15 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 	struct mlx5_cqe64 *cqe64;
 	struct mlx5_core_qp *mqp;
 	struct mlx5_ib_wq *wq;
+	struct mlx5_sig_err_cqe *sig_err_cqe;
+	struct ib_sig_err *err_item;
 	uint8_t opcode;
 	uint32_t qpn;
 	u16 wqe_ctr;
 	void *cqe;
 	int idx;

+repoll:
 	cqe = next_cqe_sw(cq);
 	if (!cqe)
 		return -EAGAIN;
@@ -449,6 +480,24 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 			}
 		}
 		break;
+	case MLX5_CQE_SIG_ERR:
+		sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64;
+		err_item = kzalloc(sizeof(*err_item), GFP_ATOMIC);
+		if (!err_item) {
+			mlx5_ib_err(dev, "Failed to allocate sig_err item\n");
+			return -ENOMEM;
+		}
+
+		get_sig_err_item(sig_err_cqe, err_item);
+
+		mlx5_ib_dbg(dev, "Got SIGERR on key: 0x%x\n",
+			    err_item->key);
+
+		spin_lock(&(*cur_qp)->sig_err_lock);
+		list_add(&err_item->list, &(*cur_qp)->sig_err_list);
+		spin_unlock(&(*cur_qp)->sig_err_lock);
+
+		goto repoll;
 	}

 	return 0;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2e67a37..f3c7111 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1409,6 +1409,7 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
+	dev->ib_dev.check_sig_status	= mlx5_ib_check_sig_status;

 	if (mdev->caps.flags & MLX5_DEV_CAP_FLAG_XRC) {
 		dev->ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 1d5793e..73b8cf0 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -533,6 +533,8 @@ int mlx5_mr_cache_init(struct mlx5_ib_dev *dev);
 int mlx5_mr_cache_cleanup(struct mlx5_ib_dev *dev);
 int mlx5_mr_ib_cont_pages(struct ib_umem *umem, u64 addr, int *count, int *shift);
 void mlx5_umr_cq_handler(struct ib_cq *cq, void *cq_context);
+int
[PATCH RFC 7/9] IB/mlx5: Support IB_WR_REG_SIG_MR
This patch implements IB_WR_REG_SIG_MR posted by the user. Basically this WR involves 3 WQEs in order to prepare and properly register the signature layout:

1. Post a UMR WR to register the sig_mr in one of two possible ways:
   * In case the user registered a single MR for the data, the UMR data segment consists of:
     - a single klm (the data MR) passed by the user
     - a BSF with the signature attributes requested by the user.
   * In case the user registered 2 MRs, one for data and one for protection, the UMR consists of:
     - a strided block format which includes the data and protection MRs and their repetitive block format.
     - a BSF with the signature attributes requested by the user.
2. Post a SET_PSV WQE in order to set the initial signature parameters of the memory domain passed by the user.
3. Post a SET_PSV WQE in order to set the initial signature parameters of the wire domain passed by the user.

This patch also introduces some helper functions to set the BSF correctly and to determine the signature format selectors.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c | 413 +++
 include/linux/mlx5/qp.h         |  56 ++
 2 files changed, 469 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 2517fb3..971d434 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1721,6 +1721,26 @@ static __be64 frwr_mkey_mask(void)
 	return cpu_to_be64(result);
 }

+static __be64 sig_mkey_mask(void)
+{
+	u64 result;
+
+	result = MLX5_MKEY_MASK_LEN		|
+		 MLX5_MKEY_MASK_PAGE_SIZE	|
+		 MLX5_MKEY_MASK_START_ADDR	|
+		 MLX5_MKEY_MASK_EN_RINVAL	|
+		 MLX5_MKEY_MASK_KEY		|
+		 MLX5_MKEY_MASK_LR		|
+		 MLX5_MKEY_MASK_LW		|
+		 MLX5_MKEY_MASK_RR		|
+		 MLX5_MKEY_MASK_RW		|
+		 MLX5_MKEY_MASK_SMALL_FENCE	|
+		 MLX5_MKEY_MASK_FREE		|
+		 MLX5_MKEY_MASK_BSF_EN;
+
+	return cpu_to_be64(result);
+}
+
 static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
 				 struct ib_send_wr *wr, int li)
 {
@@ -1903,6 +1923,336 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, struct ib_send_wr *wr,
 	return 0;
 }

+static u16 prot_field_size(enum ib_signature_type type, u16 block_size)
+{
+	switch (type) {
+	case IB_SIG_TYPE_T10_DIF:
+		return MLX5_DIF_SIZE;
+	default:
+		return 0;
+	}
+}
+
+static u8 bs_selector(int block_size)
+{
+	switch (block_size) {
+	case 512:	    return 0x1;
+	case 520:	    return 0x2;
+	case 4096:	    return 0x3;
+	case 4160:	    return 0x4;
+	case 1073741824:    return 0x5;
+	default:	    return 0;
+	}
+}
+
+static int format_selector(struct ib_sig_attrs *attr,
+			   struct ib_sig_domain *domain,
+			   int *selector)
+{
+
+#define FORMAT_DIF_NONE		0
+#define FORMAT_DIF_CRC_INC	4
+#define FORMAT_DIF_CSUM_INC	12
+#define FORMAT_DIF_CRC_NO_INC	13
+#define FORMAT_DIF_CSUM_NO_INC	14
+
+	switch (domain->sig.dif.type) {
+	case IB_T10DIF_NONE:
+		/* No DIF */
+		*selector = FORMAT_DIF_NONE;
+		break;
+	case IB_T10DIF_TYPE1: /* Fall through */
+	case IB_T10DIF_TYPE2:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = FORMAT_DIF_CRC_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = FORMAT_DIF_CSUM_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	case IB_T10DIF_TYPE3:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					   FORMAT_DIF_CRC_INC :
+					   FORMAT_DIF_CRC_NO_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					   FORMAT_DIF_CSUM_INC :
+					   FORMAT_DIF_CSUM_NO_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	default:
+		return 1;
+	}
+
+	return 0;
+}
+
+static int mlx5_set_bsf(struct ib_mr *sig_mr,
+			struct ib_sig_attrs *sig_attrs,
+			struct mlx5_bsf
[PATCH RFC 1/9] IB/core: Introduce indirect and protected memory regions
This commit introduces verbs for creating memory regions which will allow new types of memory key operations, such as indirect memory registration and protected memory registration.

Indirect memory registration is registering several (one or more) pre-registered memory regions in a specific layout. The indirect region may potentially describe several regions and some repetition format between them.

Protected memory registration is registering a memory region with various data integrity attributes which describe protection schemes that will be enforced by the HCA in an offloaded manner.

In the future these routines may replace the current memory region creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c | 39 +++++
 include/rdma/ib_verbs.h         | 46 +++
 2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..1d94a5c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1052,6 +1052,45 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);

+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
 	struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 645c3ce..65b7e79 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -925,6 +925,30 @@ enum ib_mr_rereg_flags {
 	IB_MR_REREG_ACCESS	= (1<<2)
 };

+enum ib_mr_create_flags {
+	IB_MR_SIGNATURE_EN = 1,
+};
+
+enum ib_mr_reg_type {
+	IB_MR_REG_DIRECT,
+	IB_MR_REG_INDIRECT,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ *     ib_create_mr.
+ * @reg_type: requested mapping type, this can be direct/indirect
+ *     registration or repetitive structure registration.
+ * @max_reg_descriptors: max number of registration units that
+ *     may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+	enum ib_mr_reg_type	reg_type;
+	int			max_reg_descriptors;
+	enum ib_mr_create_flags	flags;
+};
+
 /**
  * struct ib_mw_bind - Parameters for a type 1 memory window bind operation.
  * @wr_id:      Work request id.
@@ -1257,6 +1281,9 @@ struct ib_device {
 	int		   (*query_mr)(struct ib_mr *mr,
 				       struct ib_mr_attr *mr_attr);
 	int		   (*dereg_mr)(struct ib_mr *mr);
+	int		   (*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr *	   (*create_mr)(struct ib_pd *pd,
+					struct ib_mr_init_attr *mr_init_attr);
 	struct ib_mr *	   (*alloc_fast_reg_mr)(struct ib_pd *pd,
 						int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
@@ -2092,6 +2119,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);

+
+/**
+ * ib_create_mr - creates memory region that may be used for
+ *     direct or indirect registration models via UMR WR.
+ * @pd: The protection domain associated with the region.
+ * @mr_init_attr: memory region init attributes.
+ */
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr);
+
+/**
+ * ib_destroy_mr - Destroys a memory region that was created using
+ *     ib_create_mr and removes it from HW translation tables.
+ * @mr: The memory region to destroy.
+ *
+ * This function can fail, if the memory region has memory windows bound to it.
+ */
+int ib_destroy_mr(struct ib_mr *mr);
+
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
  *     IB_WR_FAST_REG_MR send work request.
--
1.7.1
[PATCH RFC 9/9] IB/mlx5: Publish support in signature feature
Currently support only T10-DIF types of signature handover operations (types 1|2|3).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c |    9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f3c7111..3dd8219 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (flags & MLX5_DEV_CAP_FLAG_XRC)
 		props->device_cap_flags |= IB_DEVICE_XRC;
 	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+				      IB_PROT_T10DIF_TYPE_2 |
+				      IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+				       IB_GUARD_T10DIF_CSUM;
+	}

 	props->vendor_id	   = be32_to_cpup((__be32 *)(out_mad->data + 36)) &
 		0xffffff;
--
1.7.1
[PATCH RFC 0/9] Introduce Signature feature
This patchset introduces verbs-level support for the signature handover feature. Signature is intended to implement end-to-end data integrity on a transactional basis in a completely offloaded manner.

A signature handover operation is basically a translation of the data layout between the so-called memory domain and wire domain in the context of data integrity support. There are several end-to-end data integrity methods used today in various applications and/or upper layer protocols, such as T10-DIF defined by the SCSI specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs support only for T10-DIF; the proposed framework allows adding more signature methods.

Data integrity is performed by registering a protected region with signature handover attributes and a memory domain layout, and in addition defining the wire domain layout. Defining both domains is equivalent to determining the signature handover operation, which can strip/add/pass and validate data integrity when performing data transfer from the input space to the output space. When the data transfer is completed, the user may check the signature status of the handover operation and, in case a data integrity error has occurred, receive a signature error item providing the relevant information on the error.

This feature shall be used by storage upper layer protocols iSER/SRP implementing end-to-end data integrity with T10-DIF. Following this patchset, we will soon submit krping activation code which will demonstrate the usage and activation of protected RDMA transactions using signature verbs.

Patchset summary:
- Introduce verbs for create/destroy memory regions supporting signature.
- Introduce IB core signature verbs API.
- Implement mr create/destroy verbs in the mlx5 driver.
- Preparation patches for signature support in the mlx5 driver.
- Implement the signature handover work request in the mlx5 driver.
- Implement signature error collection and handling in the mlx5 driver.
Sagi Grimberg (9):
  IB/core: Introduce indirect and protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Publish support in signature feature

 drivers/infiniband/core/verbs.c              |  47 +++
 drivers/infiniband/hw/mlx5/cq.c              |  49 +++
 drivers/infiniband/hw/mlx5/main.c            |  12 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |  11 +
 drivers/infiniband/hw/mlx5/mr.c              | 154 ++++
 drivers/infiniband/hw/mlx5/qp.c              | 532 ++++++--
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |  64 +++
 include/linux/mlx5/cq.h                      |   1 +
 include/linux/mlx5/device.h                  |  42 ++
 include/linux/mlx5/driver.h                  |  24 ++
 include/linux/mlx5/qp.h                      |  57 +++
 include/rdma/ib_verbs.h                      | 186 +-
 12 files changed, 1140 insertions(+), 39 deletions(-)
[PATCH RFC 5/9] IB/mlx5: Break wqe handling to begin finish routines
As a preliminary step for the signature feature, which will require posting multiple (3) WQEs for a single WR, we break the post_send routine's WQE indexing into begin and finish routines. This patch does not change any functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c | 95 ++++++++++++++++++++---------------
 1 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 9a8c622..57733a5 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1985,6 +1985,57 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr)
 	}
 }

+static int begin_wqe(struct mlx5_ib_qp *qp, void **seg,
+		     struct mlx5_wqe_ctrl_seg **ctrl,
+		     struct ib_send_wr *wr, int *idx,
+		     int *size, int nreq)
+{
+	int err = 0;
+
+	if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		err = -ENOMEM;
+		return err;
+	}
+
+	*idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
+	*seg = mlx5_get_send_wqe(qp, *idx);
+	*ctrl = *seg;
+	*(uint32_t *)(*seg + 8) = 0;
+	(*ctrl)->imm = send_ieth(wr);
+	(*ctrl)->fm_ce_se = qp->sq_signal_bits |
+		(wr->send_flags & IB_SEND_SIGNALED ?
+		 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
+		(wr->send_flags & IB_SEND_SOLICITED ?
+		 MLX5_WQE_CTRL_SOLICITED : 0);
+
+	*seg += sizeof(**ctrl);
+	*size = sizeof(**ctrl) / 16;
+
+	return err;
+}
+
+static void finish_wqe(struct mlx5_ib_qp *qp,
+		       struct mlx5_wqe_ctrl_seg *ctrl,
+		       u8 size, unsigned idx, u64 wr_id,
+		       int *nreq, u8 fence, u8 next_fence,
+		       u32 mlx5_opcode)
+{
+	u8 opmod = 0;
+
+	ctrl->opmod_idx_opcode = cpu_to_be32(((u32)(qp->sq.cur_post) << 8) |
+					     mlx5_opcode | ((u32)opmod << 24));
+	ctrl->qpn_ds = cpu_to_be32(size | (qp->mqp.qpn << 8));
+	ctrl->fm_ce_se |= fence;
+	qp->fm_cache = next_fence;
+	if (unlikely(qp->wq_sig))
+		ctrl->signature = wq_sig(ctrl);
+
+	qp->sq.wrid[idx] = wr_id;
+	qp->sq.w_list[idx].opcode = mlx5_opcode;
+	qp->sq.wqe_head[idx] = qp->sq.head + (*nreq)++;
+	qp->sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB);
+	qp->sq.w_list[idx].next = qp->sq.cur_post;
+}
+
+
 int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		      struct ib_send_wr **bad_wr)
 {
@@ -1998,7 +2049,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	int uninitialized_var(size);
 	void *qend = qp->sq.qend;
 	unsigned long flags;
-	u32 mlx5_opcode;
 	unsigned idx;
 	int err = 0;
 	int inl = 0;
@@ -2007,7 +2057,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	int nreq;
 	int i;
 	u8 next_fence = 0;
-	u8 opmod = 0;
 	u8 fence;

 	spin_lock_irqsave(&qp->sq.lock, flags);
@@ -2020,36 +2069,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			goto out;
 		}

-		if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		fence = qp->fm_cache;
+		num_sge = wr->num_sge;
+		if (unlikely(num_sge > qp->sq.max_gs)) {
 			mlx5_ib_warn(dev, "\n");
 			err = -ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}

-		fence = qp->fm_cache;
-		num_sge = wr->num_sge;
-		if (unlikely(num_sge > qp->sq.max_gs)) {
+		err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq);
+		if (err) {
 			mlx5_ib_warn(dev, "\n");
 			err = -ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}

-		idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
-		seg = mlx5_get_send_wqe(qp, idx);
-		ctrl = seg;
-		*(uint32_t *)(seg + 8) = 0;
-		ctrl->imm = send_ieth(wr);
-		ctrl->fm_ce_se = qp->sq_signal_bits |
-			(wr->send_flags & IB_SEND_SIGNALED ?
-			 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
-			(wr->send_flags & IB_SEND_SOLICITED ?
-			 MLX5_WQE_CTRL_SOLICITED : 0);
-
-		seg += sizeof(*ctrl);
-		size = sizeof(*ctrl) / 16;
-
 		switch (ibqp->qp_type) {
 		case IB_QPT_XRC_INI:
 			xrc = seg;
@@ -2199,22 +2235,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			}
 		}

-		mlx5_opcode = mlx5_ib_opcode[wr->opcode];
-		ctrl->opmod_idx_opcode
[PATCH RFC 6/9] IB/mlx5: remove MTT access mode from umr flags helper function
The get_umr_flags helper function might be used for access modes other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM, so remove the access mode from the helper; the caller will add its own access mode flag. This commit does not add/change functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 57733a5..2517fb3 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1775,7 +1775,7 @@ static u8 get_umr_flags(int acc)
 	       (acc & IB_ACCESS_REMOTE_WRITE  ? MLX5_PERM_REMOTE_WRITE : 0) |
 	       (acc & IB_ACCESS_REMOTE_READ   ? MLX5_PERM_REMOTE_READ  : 0) |
 	       (acc & IB_ACCESS_LOCAL_WRITE   ? MLX5_PERM_LOCAL_WRITE  : 0) |
-	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }

 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1787,7 +1787,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
 		return;
 	}

-	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags);
+	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) |
+		     MLX5_ACCESS_MODE_MTT;
 	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
 	seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00);
 	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
--
1.7.1
Re: [PATCH RFC 2/9] IB/core: Introduce Signature Verbs API
On 10/18/2013 1:51 AM, Hefty, Sean wrote:
> @@ -885,6 +901,19 @@ struct ib_send_wr {
>  			u32 rkey;
>  			struct ib_mw_bind_info bind_info;
>  		} bind_mw;
> +		struct {
> +			struct ib_sig_attrs *sig_attrs;
> +			struct ib_mr	    *sig_mr;
> +			int		     access_flags;
> +			/* Registered data mr */
> +			struct ib_mr	    *data_mr;
> +			u32		     data_size;
> +			u64		     data_va;
> +			/* Registered protection mr */
> +			struct ib_mr	    *prot_mr;
> +			u32		     prot_size;
> +			u64		     prot_va;
> +		} sig_handover;
>
> At what point do we admit that this is a ridiculous structure?

If you are referring to ib_send_wr, I agree. Shall we modify it to a union of typedefs so it becomes more readable?

> Help me understand what this WR is doing. Is this telling the HCA to copy data between local MRs? What is a 'data MR' versus a 'protected MR'? (I'm not hip on T10-DIF.)

No data copy, god forbid... :)

Let me start by giving a short intro on signature (and T10-DIF). In the signature world, data may exist together with protection information which guards the data. In T10-DIF (Data Integrity Fields), for example, these are 8-byte guards, each including a CRC over a 512-byte block of data. An HCA which supports signature offload is expected to validate that the data is intact (each block matches its guard) and send it correctly over the wire (in the T10-DIF case, the data and protection should be interleaved, i.e. 512B of data followed by an 8B protection guard), or alternatively to validate data (+ protection) coming from the wire and write it to the associated memory areas. In the general case, the data and the protection guards may lie in different memory areas. The SCSI mid-layer, for instance, passes the transport driver 2 buffers using 2 scatterlists. The transport driver (or application in the general case) is expected to register each buffer (as it normally would in order to use RDMA) using 2 MRs.
The signature handover operation binds all the necessary information for the HCA together: where the data is (data_mr), where the protection information is (prot_mr), and what the signature properties are (sig_attrs). Once this step is taken (the WR is posted), a single MR (sig_mr) describes the signature handover operation and can be used to perform RDMA under signature presence. Once the HCA performs RDMA over this MR, it takes into account the signature context of the transaction and follows the configured signature attributes.

Hope this helps,
Sagi.
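The memory-domain layout described here (a 512xN data buffer registered in one MR plus a separate 8xN guard buffer in another) versus the wire-domain interleaved layout can be sketched in plain C. This is only an illustration of the T10-DIF framing the thread is discussing; the real interleaving is done by the HCA in hardware, not by software:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512 /* T10-DIF data block */
#define GUARD_SIZE 8   /* 8-byte DIF tuple per block */

/*
 * Interleave N data blocks with their protection guards into the
 * wire-domain layout: 512B data, 8B guard, 512B data, 8B guard, ...
 * The data buffer is 512*N bytes, the protection buffer 8*N bytes,
 * and the wire buffer (512+8)*N bytes.
 */
static void dif_interleave(const uint8_t *data, const uint8_t *prot,
			   uint8_t *wire, size_t nblocks)
{
	for (size_t i = 0; i < nblocks; i++) {
		memcpy(wire + i * (BLOCK_SIZE + GUARD_SIZE),
		       data + i * BLOCK_SIZE, BLOCK_SIZE);
		memcpy(wire + i * (BLOCK_SIZE + GUARD_SIZE) + BLOCK_SIZE,
		       prot + i * GUARD_SIZE, GUARD_SIZE);
	}
}
```

The reverse direction (validate and strip) is the same walk with the copies inverted, dropping the guard bytes.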
Re: [PATCH RFC 1/9] IB/core: Introduce indirect and protected memory regions
On 10/18/2013 1:43 AM, Hefty, Sean wrote:
> > This commit introduces verbs for creating memory regions which will allow new types of memory key operations such as indirect memory registration and protected memory registration. Indirect memory registration is registering several (one or more) pre-registered memory regions in a specific layout. The indirect region may potentially describe several regions and some repetition format between them.
>
> I didn't follow this direct versus indirect difference. See below.

Hey Sean, thanks for looking into this!

The indirect memory registration feature will be submitted in the future; the signature feature uses it under the hood. I'll remove it from v2 as it creates a source of confusion, and I want to concentrate on signature. Now since you opened this door, briefly: unlike direct (known) MRs, which are associated with a page-list, indirect MRs can be associated with other MRs in the form of a list of tuples {lkey, addr, len}, providing more flexible memory registration.
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
 	struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 645c3ce..65b7e79 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -925,6 +925,30 @@ enum ib_mr_rereg_flags {
 	IB_MR_REREG_ACCESS	= (1<<2)
 };
 
+enum ib_mr_create_flags {
+	IB_MR_SIGNATURE_EN = 1,
+};
+
+enum ib_mr_reg_type {
+	IB_MR_REG_DIRECT,
+	IB_MR_REG_INDIRECT,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ *     ib_create_mr.
+ * @reg_type: requested mapping type, this can be direct/indirect
+ *     registration or repetitive structure registration.
+ * @max_reg_descriptors: max number of registration units that
+ *     may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+	enum ib_mr_reg_type	reg_type;
+	int			max_reg_descriptors;
+	enum ib_mr_create_flags	flags;
+};
+
 /**
  * struct ib_mw_bind - Parameters for a type 1 memory window bind operation.
  * @wr_id: Work request id.
@@ -1257,6 +1281,9 @@ struct ib_device {
 	int		(*query_mr)(struct ib_mr *mr,
 				    struct ib_mr_attr *mr_attr);
 	int		(*dereg_mr)(struct ib_mr *mr);
+	int		(*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr *	(*create_mr)(struct ib_pd *pd,
+				     struct ib_mr_init_attr *mr_init_attr);

> These create and destroy something called an 'MR', but are not actually associated with any memory buffers. Is this some sort of conceptual sub-protection domain? Why is this needed, versus defining new ib_mr_attr fields?

This MR can be perceived as a generalization of a fast_reg MR. When using fast memory registration, the verbs user calls ib_alloc_fast_reg_mr() in order to allocate an MR that may be used for the fast registration method by posting a fast registration work request on the send queue (FRWR). The user does not pass any memory buffers to ib_alloc_fast_reg_mr(), as the actual registration is done by posting a WR. This follows the same notation, but allows new functionality (such as signature enable). As things are today, no MR creation method (fast_reg, dma, phys, user...) allows passing initialization parameters. The signature feature requires some internal resource management, and we need some kind of indication that signature is requested for this MR. I'm suggesting that this verb cover the general case; later on it is possible to extend this method to cover all existing flavors of MR creation (implement the existing ones with it). Do you agree? Or do you prefer to extend the other MR allocation methods to receive initialization parameters?
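The usecnt handling in the ib_create_mr()/ib_destroy_mr() implementation quoted earlier in this thread follows the usual verbs pattern: creation takes a reference on the PD, destruction drops it and refuses to run while the MR is still in use. A self-contained sketch of that pattern, with illustrative stand-in types rather than the real ib_pd/ib_mr:

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for ib_pd / ib_mr reference counting. */
struct pd { int usecnt; };
struct mr { struct pd *pd; int usecnt; };

static int create_mr(struct pd *pd, struct mr *mr)
{
	mr->pd = pd;
	mr->usecnt = 0;
	pd->usecnt++;	/* the MR holds a reference on its PD */
	return 0;
}

static int destroy_mr(struct mr *mr)
{
	if (mr->usecnt)	/* still bound / in flight: refuse */
		return -EBUSY;
	mr->pd->usecnt--;	/* drop the PD reference */
	return 0;
}
```

The same invariant is why ib_dealloc_pd() can fail while MRs are still alive: the PD's usecnt is nonzero until every dependent object is destroyed.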
Re: [PATCH RFC 2/9] IB/core: Introduce Signature Verbs API
On 10/21/2013 5:34 PM, Hefty, Sean wrote:
> > The signature handover operation binds all the necessary information for the HCA together: where the data is (data_mr), where the protection information is (prot_mr), and what the signature properties are (sig_attrs). Once this step is taken (the WR is posted), a single MR (sig_mr) describes the signature handover operation and can be used to perform RDMA under signature presence. Once the HCA performs RDMA over this MR, it takes into account the signature context of the transaction and follows the configured signature attributes.
>
> It seems like this change loses the ability to use an SGL.

I don't think so. A signature MR simply describes a signature-associated memory region, i.e. it is a memory region that also defines some signature operation offloaded aside from normal RDMA (for example validate and strip). SGLs are used to publish several rkeys for the server/target/peer to perform RDMA on each. In this case the user previously registered each MR over which he wishes his peer to RDMA. Same story here: if the user has several signature-associated MRs over which he wishes his peer to RDMA (in a protected manner), he can use these rkeys to construct an SGL.

> Why are the signature properties separate from the protection information?

Well, protection information is the actual protection block guards of the data (i.e. CRCs, XORs, DIFs etc.), while the signature properties structure is the descriptor telling the HCA how to treat/validate/generate the protection information. Note that signature support requires the HCA to be able to support INSERT operations. This means that there is no protection information and the HCA is asked to generate it and add it to the data stream (which may be incoming or outgoing...).

Hope this helps,
Sagi.
Re: [PATCH RFC 2/9] IB/core: Introduce Signature Verbs API
On 10/22/2013 9:20 PM, Hefty, Sean wrote:
> Would we lose anything making this a new operation for the QP, versus trying to hook it into the existing ib_post_send call?

If I understand correctly, you are suggesting making it a verb? Well, this operation is a fast-path operation, so I guess we would lose in that case. Take SCSI for example: for each IO operation submitted by the SCSI mid-layer, the transport layer should perform any protection policy that SCSI asked for. From this point of view, the signature operation resembles fast registration (since the transport does not own the IO data buffers, it uses fast registration methods). That is why we are hooking into ib_post_send.

> I'm suggesting multiple calls that can post to the send queue, rather than one call that does a giant switch statement at the beginning based on the opcode.

Although I understand where you are coming from, we also lose in this case. If we go down this road, we block the user from saving a HW doorbell by concatenating signature and RDMA WRs into a single posted list. I assume this is why fast_reg is also an extension of ib_post_send.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v1 00/10] Introduce Signature feature
This patchset introduces verbs-level support for the signature handover feature. Signature is intended to implement end-to-end data integrity on a transactional basis in a completely offloaded manner. There are several end-to-end data integrity methods used today in various applications and/or upper layer protocols, such as T10-DIF defined by the SCSI specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs support only for T10-DIF. The proposed framework allows adding more signature methods in the future.

In T10-DIF, when a series of 512-byte data blocks are transferred, each block is followed by an 8-byte guard. The guard consists of a CRC that protects the integrity of the data in the block, and some other tags that protect against mis-directed IOs. Data can be protected when transferred over the wire, but can also be protected in the memory of the sender/receiver. This allows true end-to-end protection against bit flips, either over the wire, through gateways, in memory, over PCI, etc. While T10-DIF clearly defines that over the wire the protection guards are interleaved into the data stream (each 512-byte block followed by an 8-byte guard), in memory the protection guards may reside in a buffer separated from the data. Depending on the application, it is usually easier to handle the data when it is contiguous. In this case the data buffer will be of size 512xN and the protection buffer will be of size 8xN (where N is the number of blocks in the transaction).

There are 3 kinds of signature handover operation:
1. Take unprotected data (from wire or memory) and ADD protection guards.
2. Take protected data (from wire or memory), validate the data integrity against the protection guards and STRIP the protection guards.
3. Take protected data (from wire or memory), validate the data integrity against the protection guards and PASS the data with the guards as-is.
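For reference, the CRC portion of the 8-byte T10-DIF guard is a 16-bit CRC over the 512-byte block. A minimal bitwise sketch of that CRC (polynomial 0x8BB7, MSB-first, zero initial value, as used by T10-DIF) looks like this; real implementations are table-driven or hardware-offloaded, which is the entire point of this patchset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, no reflection. */
static uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)buf[i] << 8;
		for (int b = 0; b < 8; b++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}
```

The guard tuple also carries an application tag and a reference tag alongside this CRC; those are simple field comparisons rather than a computation.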
This translates to defining to the HCA how/if data protection exists in the memory domain, and how/if data protection exists in the wire domain. Data integrity is performed by using a new kind of memory region, the signature-enabled MR, and a new kind of work request, REG_SIG_MR. The REG_SIG_MR WR operates on the signature-enabled MR and defines all the needed information for the signature handover (data buffer, protection buffer if needed, and signature attributes). The result is an MR that can be used for data transfer as usual, and that will also add/validate/strip/pass protection guards.

When the data transfer completes successfully, that does not mean there were no integrity errors. The user must afterwards check the signature status of the handover operation using a new lightweight verb. This feature shall be used by storage upper layer protocols iSER/SRP implementing T10-DIF end-to-end data integrity. Following this patchset, we will soon submit krping patches which will demonstrate the usage of these signature verbs.

Patchset summary:
- Introduce verbs for create/destroy of memory regions supporting signature.
- Introduce the IB core signature verbs API.
- Implement the MR create/destroy verbs in the mlx5 driver.
- Preparation patches for signature support in the mlx5 driver.
- Implement the signature handover work request in the mlx5 driver.
- Implement signature error collection and handling in the mlx5 driver.

Changes from v0:
- Commit messages: added a more detailed explanation of the signature work request.
- IB/core: Remove indirect memory registration enablement from create_mr; keep only signature enablement.
- IB/mlx5: Changed signature error processing to MR lookup via radix tree.
Sagi Grimberg (10):
  IB/core: Introduce protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs in a radix tree under device
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Publish support in signature feature

 drivers/infiniband/core/verbs.c                |   47 +++
 drivers/infiniband/hw/mlx5/cq.c                |   53 +++
 drivers/infiniband/hw/mlx5/main.c              |   12 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h           |   14 +
 drivers/infiniband/hw/mlx5/mr.c                |  138 +++
 drivers/infiniband/hw/mlx5/qp.c                |  525 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   84 ++++
 include/linux/mlx5/cq.h                        |    1 +
 include/linux/mlx5/device.h                    |   43 ++
 include/linux/mlx5/driver.h                    |   35 ++
 include/linux/mlx5/qp.h                        |   62 +++
 include/rdma/ib_verbs.h                        |  172 +-
 13 files changed, 1148 insertions(+), 39 deletions(-)
[PATCH RFC v1 09/10] IB/mlx5: Collect signature error completion
This commit handles the signature error CQE generated by the HW (if any) and stores it on the QP signature error list. Once the user gets the completion for the transaction, he must check for signature errors on the signature memory region using a new lightweight verb ib_check_sig_status, and if such an error exists, he will get the signature error information. If the user does not check for signature errors, i.e. does not call ib_check_sig_status, he will not be allowed to use the memory region for another signature operation (a REG_SIG_MR work request will fail). The underlying mlx5 driver handles signature error completions and marks the relevant memory region as dirty.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/cq.c      |   53 ++
 drivers/infiniband/hw/mlx5/main.c    |    1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |    7 ++++
 drivers/infiniband/hw/mlx5/mr.c      |   29 ++++++
 drivers/infiniband/hw/mlx5/qp.c      |    8 +++-
 include/linux/mlx5/cq.h              |    1 +
 include/linux/mlx5/device.h          |   18 +++
 include/linux/mlx5/driver.h          |    4 ++
 include/linux/mlx5/qp.h              |    5 +++
 9 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 344ab03..da7605b 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -351,6 +351,33 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
 	qp->sq.last_poll = tail;
 }
 
+static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe,
+			     struct ib_sig_err *item)
+{
+	u16 syndrome = be16_to_cpu(cqe->syndrome);
+
+	switch (syndrome) {
+	case 13:
+		item->err_type = IB_SIG_BAD_CRC;
+		break;
+	case 12:
+		item->err_type = IB_SIG_BAD_APPTAG;
+		break;
+	case 11:
+		item->err_type = IB_SIG_BAD_REFTAG;
+		break;
+	default:
+		break;
+	}
+
+	item->expected_guard = be32_to_cpu(cqe->expected_trans_sig) >> 16;
+	item->actual_guard = be32_to_cpu(cqe->actual_trans_sig) >> 16;
+	item->expected_logical_block = be32_to_cpu(cqe->expected_reftag);
+	item->actual_logical_block
= be32_to_cpu(cqe->actual_reftag);
+	item->sig_err_offset = be64_to_cpu(cqe->err_offset);
+	item->key = be32_to_cpu(cqe->mkey);
+}
+
 static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 			 struct mlx5_ib_qp **cur_qp,
 			 struct ib_wc *wc)
@@ -360,12 +387,16 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 	struct mlx5_cqe64 *cqe64;
 	struct mlx5_core_qp *mqp;
 	struct mlx5_ib_wq *wq;
+	struct mlx5_sig_err_cqe *sig_err_cqe;
+	struct mlx5_core_mr *mmr;
+	struct mlx5_ib_mr *mr;
 	uint8_t opcode;
 	uint32_t qpn;
 	u16 wqe_ctr;
 	void *cqe;
 	int idx;
 
+repoll:
 	cqe = next_cqe_sw(cq);
 	if (!cqe)
 		return -EAGAIN;
@@ -449,6 +480,28 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 			}
 		}
 		break;
+	case MLX5_CQE_SIG_ERR:
+		sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64;
+
+		read_lock(&dev->mdev.priv.mr_table.lock);
+		mmr = __mlx5_mr_lookup(&dev->mdev,
+				       be32_to_cpu(sig_err_cqe->mkey) & 0xffffff00);
+		if (unlikely(!mmr)) {
+			mlx5_ib_warn(dev, "CQE@CQ %06x for unknown MR %6x\n",
+				     cq->mcq.cqn, be32_to_cpu(sig_err_cqe->mkey));
+			return -EINVAL;
+		}
+		read_unlock(&dev->mdev.priv.mr_table.lock);
+
+		mr = to_mibmr(mmr);
+
+		get_sig_err_item(sig_err_cqe, &mr->sig->err_item);
+		mr->sig->sig_err_exists = true;
+
+		mlx5_ib_dbg(dev, "Got SIGERR on key: 0x%x\n",
+			    mr->sig->err_item.key);
+
+		goto repoll;
 	}
 
 	return 0;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2e67a37..f3c7111 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1409,6 +1409,7 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
+	dev->ib_dev.check_sig_status	= mlx5_ib_check_sig_status;
 
 	if (mdev->caps.flags & MLX5_DEV_CAP_FLAG_XRC) {
 		dev->ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 758f0e1..f175fa4 100644
--- a/drivers
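The syndrome-to-error-type translation in get_sig_err_item() above is a plain table lookup. Modeled standalone below: the numeric syndromes 13/12/11 and their meanings are taken from the patch, while the enum names are illustrative stand-ins for the proposed ib_sig_err types:

```c
#include <assert.h>

enum sig_err_type {
	SIG_ERR_NONE = 0,
	SIG_ERR_BAD_CRC,	/* guard (CRC) mismatch */
	SIG_ERR_BAD_APPTAG,	/* application tag mismatch */
	SIG_ERR_BAD_REFTAG,	/* reference tag mismatch */
};

/* Mirrors the switch in get_sig_err_item(): HW syndrome -> error type. */
static enum sig_err_type sig_err_from_syndrome(unsigned int syndrome)
{
	switch (syndrome) {
	case 13: return SIG_ERR_BAD_CRC;
	case 12: return SIG_ERR_BAD_APPTAG;
	case 11: return SIG_ERR_BAD_REFTAG;
	default: return SIG_ERR_NONE;
	}
}
```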
[PATCH RFC v1 06/10] IB/mlx5: remove MTT access mode from umr flags helper function
The get_umr_flags helper function might be used for types of access modes other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM. So remove the access mode from the helper; each caller adds its own access mode flag. This patch does not add/change functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dc8d9fc..ca78078 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1773,7 +1773,7 @@ static u8 get_umr_flags(int acc)
 	       (acc & IB_ACCESS_REMOTE_WRITE  ? MLX5_PERM_REMOTE_WRITE : 0) |
 	       (acc & IB_ACCESS_REMOTE_READ   ? MLX5_PERM_REMOTE_READ  : 0) |
 	       (acc & IB_ACCESS_LOCAL_WRITE   ? MLX5_PERM_LOCAL_WRITE  : 0) |
-	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }
 
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1785,7 +1785,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
 		return;
 	}
 
-	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags);
+	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) |
+		     MLX5_ACCESS_MODE_MTT;
 	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
 	seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00);
 	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
-- 
1.7.8.2
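The flag translation in get_umr_flags() is a bit-by-bit mapping from IB access flags to MLX5 permission bits, with the caller now OR-ing in its own access mode (MTT or KLM). A standalone model of the post-patch shape — the flag values here are illustrative, not the real MLX5 encodings:

```c
#include <assert.h>

/* Illustrative IB access flags. */
enum {
	ACCESS_REMOTE_WRITE = 1 << 0,
	ACCESS_REMOTE_READ  = 1 << 1,
	ACCESS_LOCAL_WRITE  = 1 << 2,
};

/* Illustrative device-side permission / access-mode bits. */
enum {
	PERM_REMOTE_WRITE = 1 << 0,
	PERM_REMOTE_READ  = 1 << 1,
	PERM_LOCAL_WRITE  = 1 << 2,
	PERM_LOCAL_READ   = 1 << 3,
	PERM_UMR_EN       = 1 << 4,
	ACCESS_MODE_MTT   = 1 << 5,
	ACCESS_MODE_KLM   = 1 << 6,
};

/* Mirrors get_umr_flags() after the patch: no access mode baked in. */
static unsigned int get_umr_flags(int acc)
{
	return (acc & ACCESS_REMOTE_WRITE ? PERM_REMOTE_WRITE : 0) |
	       (acc & ACCESS_REMOTE_READ  ? PERM_REMOTE_READ  : 0) |
	       (acc & ACCESS_LOCAL_WRITE  ? PERM_LOCAL_WRITE  : 0) |
	       PERM_LOCAL_READ | PERM_UMR_EN;
}
```

An FRWR caller would then use `get_umr_flags(acc) | ACCESS_MODE_MTT`, while a signature (KLM) caller would OR in `ACCESS_MODE_KLM` instead — which is exactly the flexibility the patch buys.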
[PATCH RFC v1 05/10] IB/mlx5: Break wqe handling to begin finish routines
As a preliminary step for the signature feature, which will require posting multiple (3) WQEs for a single WR, we break the post_send routine WQE indexing into begin and finish routines. This patch does not change any functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |   95 ---
 1 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index c80122e..dc8d9fc 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1983,6 +1983,57 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr)
 	}
 }
 
+static int begin_wqe(struct mlx5_ib_qp *qp, void **seg,
+		     struct mlx5_wqe_ctrl_seg **ctrl,
+		     struct ib_send_wr *wr, int *idx,
+		     int *size, int nreq)
+{
+	int err = 0;
+
+	if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		err = -ENOMEM;
+		return err;
+	}
+
+	*idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
+	*seg = mlx5_get_send_wqe(qp, *idx);
+	*ctrl = *seg;
+	*(uint32_t *)(*seg + 8) = 0;
+	(*ctrl)->imm = send_ieth(wr);
+	(*ctrl)->fm_ce_se = qp->sq_signal_bits |
+		(wr->send_flags & IB_SEND_SIGNALED ?
+		 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
+		(wr->send_flags & IB_SEND_SOLICITED ?
+		 MLX5_WQE_CTRL_SOLICITED : 0);
+
+	*seg += sizeof(**ctrl);
+	*size = sizeof(**ctrl) / 16;
+
+	return err;
+}
+
+static void finish_wqe(struct mlx5_ib_qp *qp,
+		       struct mlx5_wqe_ctrl_seg *ctrl,
+		       u8 size, unsigned idx, u64 wr_id,
+		       int *nreq, u8 fence, u8 next_fence,
+		       u32 mlx5_opcode)
+{
+	u8 opmod = 0;
+
+	ctrl->opmod_idx_opcode = cpu_to_be32(((u32)(qp->sq.cur_post) << 8) |
+					     mlx5_opcode | ((u32)opmod << 24));
+	ctrl->qpn_ds = cpu_to_be32(size | (qp->mqp.qpn << 8));
+	ctrl->fm_ce_se |= fence;
+	qp->fm_cache = next_fence;
+	if (unlikely(qp->wq_sig))
+		ctrl->signature = wq_sig(ctrl);
+
+	qp->sq.wrid[idx] = wr_id;
+	qp->sq.w_list[idx].opcode = mlx5_opcode;
+	qp->sq.wqe_head[idx] = qp->sq.head + (*nreq)++;
+	qp->sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB);
+	qp->sq.w_list[idx].next = qp->sq.cur_post;
+}
+
 int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		      struct ib_send_wr **bad_wr)
 {
@@ -1996,7 +2047,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	int uninitialized_var(size);
 	void *qend = qp->sq.qend;
 	unsigned long flags;
-	u32 mlx5_opcode;
 	unsigned idx;
 	int err = 0;
 	int inl = 0;
@@ -2005,7 +2055,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	int nreq;
 	int i;
 	u8 next_fence = 0;
-	u8 opmod = 0;
 	u8 fence;
 
 	spin_lock_irqsave(&qp->sq.lock, flags);
@@ -2018,36 +2067,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			goto out;
 		}
 
-		if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		fence = qp->fm_cache;
+		num_sge = wr->num_sge;
+		if (unlikely(num_sge > qp->sq.max_gs)) {
 			mlx5_ib_warn(dev, "\n");
 			err = -ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}
 
-		fence = qp->fm_cache;
-		num_sge = wr->num_sge;
-		if (unlikely(num_sge > qp->sq.max_gs)) {
+		err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq);
+		if (err) {
 			mlx5_ib_warn(dev, "\n");
 			err = -ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}
 
-		idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
-		seg = mlx5_get_send_wqe(qp, idx);
-		ctrl = seg;
-		*(uint32_t *)(seg + 8) = 0;
-		ctrl->imm = send_ieth(wr);
-
-		ctrl->fm_ce_se = qp->sq_signal_bits |
-			(wr->send_flags & IB_SEND_SIGNALED ?
-			 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
-			(wr->send_flags & IB_SEND_SOLICITED ?
-			 MLX5_WQE_CTRL_SOLICITED : 0);
-
-		seg += sizeof(*ctrl);
-		size = sizeof(*ctrl) / 16;
-
 		switch (ibqp->qp_type) {
 		case IB_QPT_XRC_INI:
 			xrc = seg;
@@ -2197,22 +2233,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			}
 		}
 
-		mlx5_opcode = mlx5_ib_opcode[wr->opcode];
-		ctrl->opmod_idx_opcode
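begin_wqe() computes the WQE slot with `cur_post & (wqe_cnt - 1)`, the standard power-of-two ring-buffer trick, and finish_wqe() then advances cur_post by the number of 64-byte basic blocks the WQE consumed (the DIV_ROUND_UP above). A standalone sketch of just that indexing arithmetic:

```c
#include <assert.h>

#define WQE_CNT     8   /* queue depth; must be a power of two for the mask */
#define SEND_WQE_BB 64  /* basic block size, as in mlx5 */

/* Slot index: cheap modulo via masking, wraps at WQE_CNT. */
static unsigned int wqe_index(unsigned int cur_post)
{
	return cur_post & (WQE_CNT - 1);
}

/*
 * Advance the producer: size16 counts 16-byte units (as in the ctrl
 * segment's ds field); round up to whole basic blocks.
 */
static unsigned int advance(unsigned int cur_post, unsigned int size16)
{
	return cur_post + (size16 * 16 + SEND_WQE_BB - 1) / SEND_WQE_BB;
}
```

Splitting the routine this way lets the upcoming signature patch call begin/finish three times per WR (UMR + two SET_PSV) while keeping a single overflow check per posted WR.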
[PATCH RFC v1 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related
If the user requested signature enable, we initialize the relevant mlx5_ib_qp members: we mark the QP as signature-enabled, we initialize an empty sig_err_list, and we increase the QP size.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |    3 +++
 drivers/infiniband/hw/mlx5/qp.c      |    5 +
 include/linux/mlx5/qp.h              |    1 +
 3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 45d7424..758f0e1 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -189,6 +189,9 @@ struct mlx5_ib_qp {
 	int			create_type;
 	u32			pa_lkey;
+
+	/* Store signature errors */
+	bool			signature_en;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 045f8cdb..c80122e 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -734,6 +734,11 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd,
 	spin_lock_init(&qp->sq.lock);
 	spin_lock_init(&qp->rq.lock);
 
+	if (init_attr->create_flags == IB_QP_CREATE_SIGNATURE_EN) {
+		init_attr->cap.max_send_wr *= MLX5_SIGNATURE_SQ_MULT;
+		qp->signature_en = true;
+	}
+
 	if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR)
 		qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE;
 
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index d9e3eac..174805c 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -37,6 +37,7 @@
 #include <linux/mlx5/driver.h>
 
 #define MLX5_INVALID_LKEY	0x100
+#define MLX5_SIGNATURE_SQ_MULT	3
 
 enum mlx5_qp_optpar {
 	MLX5_QP_OPTPAR_ALT_ADDR_PATH	= 1 << 0,
-- 
1.7.8.2
[PATCH RFC v1 02/10] IB/core: Introduce Signature Verbs API
This commit introduces the verbs interface for signature-related operations. A signature handover operation configures the layouts of the data and protection attributes in both the memory and wire domains. Signature operations are:
- INSERT: Generate and insert protection information when handing over data from input space to output space.
- Validate and STRIP: Validate protection information and remove it when handing over data from input space to output space.
- Validate and PASS: Validate protection information and pass it on when handing over data from input space to output space.

Once the signature handover operation is done, the HCA will offload data integrity generation/validation while performing the actual data transfer.

Additions:

1. HCA signature capabilities in device attributes
A verbs provider supporting signature handover operations shall fill in the relevant fields of the device attributes structure returned by ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
Creating a QP that will carry signature handover operations may require some special preparations from the verbs provider, so we add the QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the created QP may carry out signature handover operations. Expose signature support to the verbs layer (no support for now).

3. New send work request IB_WR_REG_SIG_MR
Signature handover work request. This WR defines the signature handover properties of the memory/wire domains as well as the domain layouts. The purpose of this work request is to bind all the needed information for the signature operation:
- data to be transferred: data_mr, data_va, data_size.
  * The raw data, pre-registered to a single MR (normally, before signature, this MR would have been used directly for the data transfer).
- data protection guards: prot_mr, prot_va, prot_size.
  * The data protection buffer, pre-registered to a single MR, which contains the data integrity guards of the raw data blocks.
Note that it may not always exist; it is used only where the user is interested in storing protection guards in memory.
- signature operation attributes: sig_attrs.
  * Tells the HCA how to validate/generate the protection information.

Once the work request is executed, the memory region which describes the signature transaction is the sig_mr. The application can now go ahead and send sig_mr.rkey, or use sig_mr.lkey for data transfer.

4. New verb ib_check_sig_status
The check_sig_status verb checks whether any signature errors are pending for a specific signature-enabled ib_mr. This verb is a lightweight check and is allowed to be called from interrupt context. The application must call this verb after it is known that the actual data transfer has finished.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |    8 +++
 include/rdma/ib_verbs.h         |  134 ++-
 2 files changed, 141 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1d94a5c..5636d65 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1293,3 +1293,11 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+int ib_check_sig_status(struct ib_mr *sig_mr,
+			struct ib_sig_err *sig_err)
+{
+	return sig_mr->device->check_sig_status ?
+		sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_sig_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 56f7e88..233f66d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,19 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24)
+	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
+	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<25),
+};
+
+enum ib_signature_prot_cap {
+	IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+	IB_GUARD_T10DIF_CRC  = 1,
+	IB_GUARD_T10DIF_CSUM = 1 << 1,
 };
 
 enum ib_atomic_cap {
@@ -166,6 +178,8 @@ struct ib_device_attr {
 	unsigned int		max_fast_reg_page_list_len;
 	u16			max_pkeys;
 	u8			local_ca_ack_delay;
+	enum ib_signature_prot_cap sig_prot_cap;
+	enum ib_signature_guard_cap sig_guard_cap;
 };
 
 enum ib_mtu {
@@ -630,6 +644,7 @@ enum ib_qp_type {
 enum ib_qp_create_flags {
 	IB_QP_CREATE_IPOIB_UD_LSO		= 1 << 0,
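ib_check_sig_status() above uses the usual verbs dispatch idiom: if the provider did not fill in the optional device op, return -ENOSYS instead of dereferencing a NULL pointer. A standalone model of that idiom, with illustrative stand-in types (the real op lives in struct ib_device and takes the ib_mr):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sig_err { int err_type; };

/* Illustrative stand-in for the provider's ib_device op table. */
struct sig_device {
	int (*check_sig_status)(struct sig_device *dev, struct sig_err *err);
};

struct sig_mr { struct sig_device *device; };

/* Mirrors ib_check_sig_status(): -ENOSYS when the op is not provided. */
static int check_sig_status(struct sig_mr *mr, struct sig_err *err)
{
	return mr->device->check_sig_status ?
	       mr->device->check_sig_status(mr->device, err) : -ENOSYS;
}

/* Example provider implementation reporting no pending errors. */
static int example_check(struct sig_device *dev, struct sig_err *err)
{
	(void)dev;
	err->err_type = 0;	/* no signature error pending */
	return 0;
}
```

Being a plain pointer test plus an indirect call is what makes the verb light enough to run from interrupt context, as the commit message requires.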
[PATCH RFC v1 10/10] IB/mlx5: Publish support in signature feature
Currently we support only T10-DIF types of signature handover operations (types 1|2|3).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c |    9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f3c7111..3dd8219 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (flags & MLX5_DEV_CAP_FLAG_XRC)
 		props->device_cap_flags |= IB_DEVICE_XRC;
 	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+				      IB_PROT_T10DIF_TYPE_2 |
+				      IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+				       IB_GUARD_T10DIF_CSUM;
+	}
 
 	props->vendor_id = be32_to_cpup((__be32 *)(out_mad->data + 36)) & 0xff
-- 
1.7.8.2
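A consumer (ULP) would gate its use of DIF on the advertised capability mask with a bitwise AND before enabling signature on a QP. Sketched with illustrative flag values in the same 1 << n style as the patch:

```c
#include <assert.h>

/* Illustrative capability bits, mirroring ib_signature_prot_cap. */
enum {
	PROT_T10DIF_TYPE_1 = 1 << 0,
	PROT_T10DIF_TYPE_2 = 1 << 1,
	PROT_T10DIF_TYPE_3 = 1 << 2,
};

/* A ULP checks the advertised mask before requesting a DIF type. */
static int supports_dif_type(unsigned int sig_prot_cap, unsigned int type)
{
	return (sig_prot_cap & type) != 0;
}
```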
[PATCH RFC v1 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
This patch implements IB_WR_REG_SIG_MR posted by the user. Basically this WR involves 3 WQEs in order to prepare and properly register the signature layout: 1. post UMR WR to register the sig_mr in one of two possible ways: * In case the user registered a single MR for data, the UMR data segment consists of: - a single klm (data MR) passed by the user - BSF with signature attributes requested by the user. * In case the user registered 2 MRs, one for data and one for protection, the UMR consists of: - strided block format which includes data and protection MRs and their repetitive block format. - BSF with signature attributes requested by the user. 2. post SET_PSV in order to set the initial signature parameters for the memory domain, as passed by the user. 3. post SET_PSV in order to set the initial signature parameters for the wire domain, as passed by the user. This patch also introduces some helper functions to set the BSF correctly and to determine the signature format selectors. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c | 416 +++ include/linux/mlx5/qp.h | 56 ++ 2 files changed, 472 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index ca78078..d791e41 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1719,6 +1719,26 @@ static __be64 frwr_mkey_mask(void) return cpu_to_be64(result); } +static __be64 sig_mkey_mask(void) +{ + u64 result; + + result = MLX5_MKEY_MASK_LEN | + MLX5_MKEY_MASK_PAGE_SIZE| + MLX5_MKEY_MASK_START_ADDR | + MLX5_MKEY_MASK_EN_RINVAL| + MLX5_MKEY_MASK_KEY | + MLX5_MKEY_MASK_LR | + MLX5_MKEY_MASK_LW | + MLX5_MKEY_MASK_RR | + MLX5_MKEY_MASK_RW | + MLX5_MKEY_MASK_SMALL_FENCE | + MLX5_MKEY_MASK_FREE | + MLX5_MKEY_MASK_BSF_EN; + + return cpu_to_be64(result); +} + static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr, struct ib_send_wr *wr, int li) { @@ -1901,6 +1921,339 @@ static int set_data_inl_seg(struct 
mlx5_ib_qp *qp, struct ib_send_wr *wr, return 0; } +static u16 prot_field_size(enum ib_signature_type type, u16 block_size) +{ + switch (type) { + case IB_SIG_TYPE_T10_DIF: + return MLX5_DIF_SIZE; + default: + return 0; + } +} + +static u8 bs_selector(int block_size) +{ + switch (block_size) { + case 512: return 0x1; + case 520: return 0x2; + case 4096: return 0x3; + case 4160: return 0x4; + case 1073741824:return 0x5; + default:return 0; + } +} + +static int format_selector(struct ib_sig_attrs *attr, + struct ib_sig_domain *domain, + int *selector) +{ + +#define FORMAT_DIF_NONE0 +#define FORMAT_DIF_CRC_INC 4 +#define FORMAT_DIF_CSUM_INC12 +#define FORMAT_DIF_CRC_NO_INC 13 +#define FORMAT_DIF_CSUM_NO_INC 14 + + switch (domain-sig.dif.type) { + case IB_T10DIF_NONE: + /* No DIF */ + *selector = FORMAT_DIF_NONE; + break; + case IB_T10DIF_TYPE1: /* Fall through */ + case IB_T10DIF_TYPE2: + switch (domain-sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = FORMAT_DIF_CRC_INC; + break; + case IB_T10DIF_CSUM: + *selector = FORMAT_DIF_CSUM_INC; + break; + default: + return 1; + } + break; + case IB_T10DIF_TYPE3: + switch (domain-sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = domain-sig.dif.type3_inc_reftag ? + FORMAT_DIF_CRC_INC : + FORMAT_DIF_CRC_NO_INC; + break; + case IB_T10DIF_CSUM: + *selector = domain-sig.dif.type3_inc_reftag ? + FORMAT_DIF_CSUM_INC : + FORMAT_DIF_CSUM_NO_INC; + break; + default: + return 1; + } + break; + default: + return 1; + } + + return 0; +} + +static int mlx5_set_bsf(struct ib_mr *sig_mr, + struct ib_sig_attrs *sig_attrs, + struct mlx5_bsf
[PATCH RFC v1 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device
This will be useful when processing signature errors on a specific key. The mlx5 driver will lookup the matching mlx5 memory region structure and mark it as dirty (contains signature errors). Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/net/ethernet/mellanox/mlx5/core/main.c |1 + drivers/net/ethernet/mellanox/mlx5/core/mr.c | 20 include/linux/mlx5/driver.h| 12 3 files changed, 33 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index b47739b..5b7b3c7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -428,6 +428,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev) mlx5_init_cq_table(dev); mlx5_init_qp_table(dev); mlx5_init_srq_table(dev); + mlx5_init_mr_table(dev); return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c index 2ade604..f72e0b6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c @@ -36,9 +36,18 @@ #include linux/mlx5/cmd.h #include mlx5_core.h +void mlx5_init_mr_table(struct mlx5_core_dev *dev) +{ + struct mlx5_mr_table *table = dev-priv.mr_table; + + rwlock_init(table-lock); + INIT_RADIX_TREE(table-tree, GFP_ATOMIC); +} + int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr, struct mlx5_create_mkey_mbox_in *in, int inlen) { + struct mlx5_mr_table *table = dev-priv.mr_table; struct mlx5_create_mkey_mbox_out out; int err; u8 key; @@ -63,14 +72,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr, mr-key = mlx5_idx_to_mkey(be32_to_cpu(out.mkey) 0xff) | key; mlx5_core_dbg(dev, out 0x%x, key 0x%x, mkey 0x%x\n, be32_to_cpu(out.mkey), key, mr-key); + /* connect to MR tree */ + write_lock_irq(table-lock); + err = radix_tree_insert(table-tree, mr-key 0xff00, mr); + write_unlock_irq(table-lock); + return 
err; } EXPORT_SYMBOL(mlx5_core_create_mkey); int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr) { + struct mlx5_mr_table *table = dev-priv.mr_table; struct mlx5_destroy_mkey_mbox_in in; struct mlx5_destroy_mkey_mbox_out out; + unsigned long flags; int err; memset(in, 0, sizeof(in)); @@ -85,6 +101,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr) if (out.hdr.status) return mlx5_cmd_status_to_err(out.hdr); + write_lock_irqsave(table-lock, flags); + radix_tree_delete(table-tree, mr-key 0xff00); + write_unlock_irqrestore(table-lock, flags); + return err; } EXPORT_SYMBOL(mlx5_core_destroy_mkey); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 7c33487..5fe0690 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -488,6 +488,13 @@ struct mlx5_srq_table { struct radix_tree_root tree; }; +struct mlx5_mr_table { + /* protect radix tree +*/ + rwlock_tlock; + struct radix_tree_root tree; +}; + struct mlx5_priv { charname[MLX5_MAX_NAME_LEN]; struct mlx5_eq_tableeq_table; @@ -516,6 +523,10 @@ struct mlx5_priv { struct mlx5_cq_tablecq_table; /* end: cq staff */ + /* start: mr staff */ + struct mlx5_mr_tablemr_table; + /* end: mr staff */ + /* start: alloc staff */ struct mutexpgdir_mutex; struct list_headpgdir_list; @@ -691,6 +702,7 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq, struct mlx5_query_srq_mbox_out *out); int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq, u16 lwm, int is_srq); +void mlx5_init_mr_table(struct mlx5_core_dev *dev); int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr, struct mlx5_create_mkey_mbox_in *in, int inlen); int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr); -- 1.7.8.2 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
[PATCH RFC v1 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
Support create_mr and destroy_mr verbs. The user may request the signature-enable memory region attribute, in which case the memory region shall be an indirect MR and shall be attached with signature attributes (BSF, PSVs). Otherwise, the create_mr routine is equivalent to alloc_fast_reg_mr. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/main.c|2 + drivers/infiniband/hw/mlx5/mlx5_ib.h |4 + drivers/infiniband/hw/mlx5/mr.c | 109 ++ drivers/net/ethernet/mellanox/mlx5/core/mr.c | 64 +++ include/linux/mlx5/device.h | 25 ++ include/linux/mlx5/driver.h | 19 + 6 files changed, 223 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 3f831de..2e67a37 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev, dev->ib_dev.get_dma_mr = mlx5_ib_get_dma_mr; dev->ib_dev.reg_user_mr = mlx5_ib_reg_user_mr; dev->ib_dev.dereg_mr= mlx5_ib_dereg_mr; + dev->ib_dev.destroy_mr = mlx5_ib_destroy_mr; dev->ib_dev.attach_mcast= mlx5_ib_mcg_attach; dev->ib_dev.detach_mcast= mlx5_ib_mcg_detach; dev->ib_dev.process_mad = mlx5_ib_process_mad; + dev->ib_dev.create_mr = mlx5_ib_create_mr; dev->ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr; dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 836be91..45d7424 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -262,6 +262,7 @@ struct mlx5_ib_mr { int npages; struct completion done; enum ib_wc_status status; + struct mlx5_core_sig_ctx *sig; }; struct mlx5_ib_fast_reg_page_list { @@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt_addr, int access_flags, struct ib_udata *udata); int 
mlx5_ib_dereg_mr(struct ib_mr *ibmr); +int mlx5_ib_destroy_mr(struct ib_mr *ibmr); +struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, + struct ib_mr_init_attr *mr_init_attr); struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len); struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index bd41df9..44f7e46 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -921,6 +921,115 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr) return 0; } +struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, + struct ib_mr_init_attr *mr_init_attr) +{ + struct mlx5_ib_dev *dev = to_mdev(pd-device); + struct mlx5_create_mkey_mbox_in *in; + struct mlx5_ib_mr *mr; + int access_mode, err; + int ndescs = roundup(mr_init_attr-max_reg_descriptors, 4); + + mr = kzalloc(sizeof(*mr), GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + in = kzalloc(sizeof(*in), GFP_KERNEL); + if (!in) { + err = -ENOMEM; + goto err_free; + } + + in-seg.status = 1 6; /* free */ + in-seg.xlt_oct_size = cpu_to_be32(ndescs); + in-seg.qpn_mkey7_0 = cpu_to_be32(0xff 8); + in-seg.flags_pd = cpu_to_be32(to_mpd(pd)-pdn); + access_mode = MLX5_ACCESS_MODE_MTT; + + if (mr_init_attr-flags IB_MR_SIGNATURE_EN) { + u32 psv_index[2]; + + in-seg.flags_pd = cpu_to_be32(be32_to_cpu(in-seg.flags_pd) | + MLX5_MKEY_BSF_EN); + in-seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE); + mr-sig = kzalloc(sizeof(*mr-sig), GFP_KERNEL); + if (!mr-sig) { + err = -ENOMEM; + goto err_free; + } + + /* create mem wire PSVs */ + err = mlx5_core_create_psv(dev-mdev, to_mpd(pd)-pdn, + 2, psv_index); + if (err) + goto err_free_sig; + + access_mode = MLX5_ACCESS_MODE_KLM; + mr-sig-psv_memory.psv_idx = psv_index[0]; + mr-sig-psv_wire.psv_idx = psv_index[1]; + } + + in-seg.flags = MLX5_PERM_UMR_EN | access_mode; + err = mlx5_core_create_mkey(dev-mdev
[PATCH RFC v1 01/10] IB/core: Introduce protected memory regions
This commit introduces verbs for creating/destroying memory regions which will allow new types of memory key operations such as protected memory registration. Indirect memory registration is registering several (one or more) pre-registered memory regions in a specific layout. The indirect region may potentially describe several regions and some repetition format between them. Protected memory registration is registering a memory region with various data integrity attributes which will describe protection schemes that will be handled by the HCA in an offloaded manner. These memory regions will be applicable for a new REG_SIG_MR work request introduced later in this patchset. In the future these routines may replace or implement current memory region creation routines existing today: - ib_reg_user_mr - ib_alloc_fast_reg_mr - ib_get_dma_mr - ib_dereg_mr Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/core/verbs.c | 39 +++ include/rdma/ib_verbs.h | 38 ++ 2 files changed, 77 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 22192de..1d94a5c 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1052,6 +1052,45 @@ int ib_dereg_mr(struct ib_mr *mr) } EXPORT_SYMBOL(ib_dereg_mr); +struct ib_mr *ib_create_mr(struct ib_pd *pd, + struct ib_mr_init_attr *mr_init_attr) +{ + struct ib_mr *mr; + + if (!pd->device->create_mr) + return ERR_PTR(-ENOSYS); + + mr = pd->device->create_mr(pd, mr_init_attr); + + if (!IS_ERR(mr)) { + mr->device = pd->device; + mr->pd = pd; + mr->uobject = NULL; + atomic_inc(&pd->usecnt); + atomic_set(&mr->usecnt, 0); + } + + return mr; +} +EXPORT_SYMBOL(ib_create_mr); + +int ib_destroy_mr(struct ib_mr *mr) +{ + struct ib_pd *pd; + int ret; + + if (atomic_read(&mr->usecnt)) + return -EBUSY; + + pd = mr->pd; + ret = mr->device->destroy_mr(mr); + if (!ret) + atomic_dec(&pd->usecnt); + + return ret; +} +EXPORT_SYMBOL(ib_destroy_mr); + struct ib_mr 
*ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len) { struct ib_mr *mr; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 645c3ce..56f7e88 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -925,6 +925,22 @@ enum ib_mr_rereg_flags { IB_MR_REREG_ACCESS = (12) }; +enum ib_mr_create_flags { + IB_MR_SIGNATURE_EN = 1, +}; + +/** + * ib_mr_init_attr - Memory region init attributes passed to routine + * ib_create_mr. + * @max_reg_descriptors: max number of registration units that + * may be used with UMR work requests. + * @flags: MR creation flags bit mask. + */ +struct ib_mr_init_attr { + int max_reg_descriptors; + enum ib_mr_create_flags flags; +}; + /** * struct ib_mw_bind - Parameters for a type 1 memory window bind operation. * @wr_id: Work request id. @@ -1257,6 +1273,9 @@ struct ib_device { int(*query_mr)(struct ib_mr *mr, struct ib_mr_attr *mr_attr); int(*dereg_mr)(struct ib_mr *mr); + int(*destroy_mr)(struct ib_mr *mr); + struct ib_mr * (*create_mr)(struct ib_pd *pd, + struct ib_mr_init_attr *mr_init_attr); struct ib_mr * (*alloc_fast_reg_mr)(struct ib_pd *pd, int max_page_list_len); struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device, @@ -2092,6 +2111,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr); */ int ib_dereg_mr(struct ib_mr *mr); + +/** + * ib_create_mr - creates memory region that may be used for + * direct or indirect registration models via UMR WR. + * @pd: The protection domain associated with the region. + * @mr_init_attr: memory region init attributes. + */ +struct ib_mr *ib_create_mr(struct ib_pd *pd, + struct ib_mr_init_attr *mr_init_attr); + +/** + * ib_destroy_mr - Destroys a memory region that was created using + * ib_create_mr and removes it from HW translation tables. + * @mr: The memory region to destroy. + * + * This function can fail, if the memory region has memory windows bound to it. 
+ */ +int ib_destroy_mr(struct ib_mr *mr); + /** * ib_alloc_fast_reg_mr - Allocates memory region usable with the * IB_WR_FAST_REG_MR send work request. -- 1.7.8.2 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http
Re: [PATCH RFC v1 01/10] IB/core: Introduce protected memory regions
On 10/28/2013 11:22 PM, Hefty, Sean wrote: +enum ib_mr_create_flags { + IB_MR_SIGNATURE_EN = 1, +}; + +/** + * ib_mr_init_attr - Memory region init attributes passed to routine + * ib_create_mr. + * @max_reg_descriptors: max number of registration units that + * may be used with UMR work requests. + * @flags: MR creation flags bit mask. + */ +struct ib_mr_init_attr { + int max_reg_descriptors; + enum ib_mr_create_flags flags; Assuming that flags will be a bitwise OR of values, they should be an int, not an enum. Right, will fix. The same applies to signature caps in ib_device. Sagi. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
This patch implements IB_WR_REG_SIG_MR posted by the user. Basically this WR involves 3 WQEs in order to prepare and properly register the signature layout: 1. post UMR WR to register the sig_mr in one of two possible ways: * In case the user registered a single MR for data, the UMR data segment consists of: - a single klm (data MR) passed by the user - BSF with signature attributes requested by the user. * In case the user registered 2 MRs, one for data and one for protection, the UMR consists of: - strided block format which includes data and protection MRs and their repetitive block format. - BSF with signature attributes requested by the user. 2. post SET_PSV in order to set the initial signature parameters for the memory domain, as passed by the user. 3. post SET_PSV in order to set the initial signature parameters for the wire domain, as passed by the user. This patch also introduces some helper functions to set the BSF correctly and to determine the signature format selectors. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c | 416 +++ include/linux/mlx5/qp.h | 56 ++ 2 files changed, 472 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index ca78078..37e3715 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1719,6 +1719,26 @@ static __be64 frwr_mkey_mask(void) return cpu_to_be64(result); } +static __be64 sig_mkey_mask(void) +{ + u64 result; + + result = MLX5_MKEY_MASK_LEN | + MLX5_MKEY_MASK_PAGE_SIZE| + MLX5_MKEY_MASK_START_ADDR | + MLX5_MKEY_MASK_EN_RINVAL| + MLX5_MKEY_MASK_KEY | + MLX5_MKEY_MASK_LR | + MLX5_MKEY_MASK_LW | + MLX5_MKEY_MASK_RR | + MLX5_MKEY_MASK_RW | + MLX5_MKEY_MASK_SMALL_FENCE | + MLX5_MKEY_MASK_FREE | + MLX5_MKEY_MASK_BSF_EN; + + return cpu_to_be64(result); +} + static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr, struct ib_send_wr *wr, int li) { @@ -1901,6 +1921,339 @@ static int set_data_inl_seg(struct 
mlx5_ib_qp *qp, struct ib_send_wr *wr, return 0; } +static u16 prot_field_size(enum ib_signature_type type, u16 block_size) +{ + switch (type) { + case IB_SIG_TYPE_T10_DIF: + return MLX5_DIF_SIZE; + default: + return 0; + } +} + +static u8 bs_selector(int block_size) +{ + switch (block_size) { + case 512: return 0x1; + case 520: return 0x2; + case 4096: return 0x3; + case 4160: return 0x4; + case 1073741824:return 0x5; + default:return 0; + } +} + +static int format_selector(struct ib_sig_attrs *attr, + struct ib_sig_domain *domain, + int *selector) +{ + +#define FORMAT_DIF_NONE0 +#define FORMAT_DIF_CRC_INC 4 +#define FORMAT_DIF_CSUM_INC12 +#define FORMAT_DIF_CRC_NO_INC 13 +#define FORMAT_DIF_CSUM_NO_INC 14 + + switch (domain-sig.dif.type) { + case IB_T10DIF_NONE: + /* No DIF */ + *selector = FORMAT_DIF_NONE; + break; + case IB_T10DIF_TYPE1: /* Fall through */ + case IB_T10DIF_TYPE2: + switch (domain-sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = FORMAT_DIF_CRC_INC; + break; + case IB_T10DIF_CSUM: + *selector = FORMAT_DIF_CSUM_INC; + break; + default: + return 1; + } + break; + case IB_T10DIF_TYPE3: + switch (domain-sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = domain-sig.dif.type3_inc_reftag ? + FORMAT_DIF_CRC_INC : + FORMAT_DIF_CRC_NO_INC; + break; + case IB_T10DIF_CSUM: + *selector = domain-sig.dif.type3_inc_reftag ? + FORMAT_DIF_CSUM_INC : + FORMAT_DIF_CSUM_NO_INC; + break; + default: + return 1; + } + break; + default: + return 1; + } + + return 0; +} + +static int mlx5_set_bsf(struct ib_mr *sig_mr, + struct ib_sig_attrs *sig_attrs, + struct mlx5_bsf
[PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
This commit introduces the verbs interface for signature related operations. A signature handover operation shall configure the layouts of data and protection attributes both in memory and wire domains. Signature operations are: - INSERT: Generate and insert protection information when handing over data from input space to output space. - Validate and STRIP: Validate protection information and remove it when handing over data from input space to output space. - Validate and PASS: Validate protection information and pass it when handing over data from input space to output space. Once the signature handover operation is done, the HCA will offload data integrity generation/validation while performing the actual data transfer. Additions: 1. HCA signature capabilities in device attributes A verbs provider supporting signature handover operations shall fill the relevant fields in the device attributes structure returned by ib_query_device. 2. QP creation flag IB_QP_CREATE_SIGNATURE_EN Creating a QP that will carry signature handover operations may require some special preparations from the verbs provider. So we add the QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the created QP may carry out signature handover operations. Expose signature support to the verbs layer (no support for now). 3. New send work request IB_WR_REG_SIG_MR Signature handover work request. This WR will define the signature handover properties of the memory/wire domains as well as the domains layout. The purpose of this work request is to bind all the needed information for the signature operation: - data to be transferred: wr->sg_list. * The raw data, pre-registered to a single MR (normally, before signature, this MR would have been used directly for the data transfer). The user will pass the data sge via the existing sg_list member. - data protection guards: sig_handover.prot. * The data protection buffer, pre-registered to a single MR, which contains the data integrity guards of the raw data blocks. 
Note that it may not always exist; it is needed only in cases where the user is interested in storing protection guards in memory. - signature operation attributes: sig_handover.sig_attrs. * Tells the HCA how to validate/generate the protection information. Once the work request is executed, the memory region which will describe the signature transaction will be the sig_mr. The application can now go ahead and send the sig_mr.rkey or use the sig_mr.lkey for data transfer. 4. New verb ib_check_sig_status The check_sig_status verb shall check if any signature errors are pending for a specific signature-enabled ib_mr. This verb is a lightweight check and is allowed to be taken from interrupt context. The application must call this verb after it is known that the actual data transfer has finished. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/core/verbs.c |8 +++ include/rdma/ib_verbs.h | 127 ++- 2 files changed, 134 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 1d94a5c..5636d65 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1293,3 +1293,11 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd) return xrcd->device->dealloc_xrcd(xrcd); } EXPORT_SYMBOL(ib_dealloc_xrcd); + +int ib_check_sig_status(struct ib_mr *sig_mr, + struct ib_sig_err *sig_err) +{ + return sig_mr->device->check_sig_status ? 
+ sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS; +} +EXPORT_SYMBOL(ib_check_sig_status); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 53f065d..19b37eb 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -116,7 +116,19 @@ enum ib_device_cap_flags { IB_DEVICE_MEM_MGT_EXTENSIONS= (1<<21), IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22), IB_DEVICE_MEM_WINDOW_TYPE_2A= (1<<23), - IB_DEVICE_MEM_WINDOW_TYPE_2B= (1<<24) + IB_DEVICE_MEM_WINDOW_TYPE_2B= (1<<24), + IB_DEVICE_SIGNATURE_HANDOVER= (1<<25), +}; + +enum ib_signature_prot_cap { + IB_PROT_T10DIF_TYPE_1 = 1, + IB_PROT_T10DIF_TYPE_2 = 1 << 1, + IB_PROT_T10DIF_TYPE_3 = 1 << 2, +}; + +enum ib_signature_guard_cap { + IB_GUARD_T10DIF_CRC = 1, + IB_GUARD_T10DIF_CSUM= 1 << 1, }; enum ib_atomic_cap { @@ -166,6 +178,8 @@ struct ib_device_attr { unsigned int max_fast_reg_page_list_len; u16 max_pkeys; u8 local_ca_ack_delay; + int sig_prot_cap; + int sig_guard_cap; }; enum ib_mtu { @@ -630,6 +644,7 @@ enum ib_qp_type { enum ib_qp_create_flags { IB_QP_CREATE_IPOIB_UD_LSO
[PATCH RFC v2 06/10] IB/mlx5: remove MTT access mode from umr flags helper function
The get_umr_flags helper function might be used for access modes other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM, so remove the access mode from the helper; each caller will add its own access mode flag. This commit does not add/change functionality. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index dc8d9fc..ca78078 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1773,7 +1773,7 @@ static u8 get_umr_flags(int acc) (acc & IB_ACCESS_REMOTE_WRITE ? MLX5_PERM_REMOTE_WRITE : 0) | (acc & IB_ACCESS_REMOTE_READ ? MLX5_PERM_REMOTE_READ : 0) | (acc & IB_ACCESS_LOCAL_WRITE ? MLX5_PERM_LOCAL_WRITE : 0) | - MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT; + MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN; } static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr, @@ -1785,7 +1785,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr, return; } - seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags); + seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) | +MLX5_ACCESS_MODE_MTT; *writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE); seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00); seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v2 10/10] IB/mlx5: Publish support in signature feature
Currently supports only T10-DIF types of signature handover operations (types 1|2|3). Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/main.c |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index f3c7111..3dd8219 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev, if (flags & MLX5_DEV_CAP_FLAG_XRC) props->device_cap_flags |= IB_DEVICE_XRC; props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS; + if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) { + props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER; + /* At this stage no support for signature handover */ + props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 | + IB_PROT_T10DIF_TYPE_2 | + IB_PROT_T10DIF_TYPE_3; + props->sig_guard_cap = IB_GUARD_T10DIF_CRC | + IB_GUARD_T10DIF_CSUM; + } props->vendor_id = be32_to_cpup((__be32 *)(out_mad->data + 36)) & 0xffffff; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v2 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related
If the user requested signature enable, we initialize the relevant mlx5_ib_qp members: we mark the qp as signature-enabled, we initialize an empty sig_err_list, and we increase the qp size. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/mlx5_ib.h |3 +++ drivers/infiniband/hw/mlx5/qp.c |5 + include/linux/mlx5/qp.h |1 + 3 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 45d7424..758f0e1 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -189,6 +189,9 @@ struct mlx5_ib_qp { int create_type; u32 pa_lkey; + + /* Store signature errors */ + bool signature_en; }; struct mlx5_ib_cq_buf { diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 045f8cd..c80122e 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -734,6 +734,11 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd, spin_lock_init(&qp->sq.lock); spin_lock_init(&qp->rq.lock); + if (init_attr->create_flags == IB_QP_CREATE_SIGNATURE_EN) { + init_attr->cap.max_send_wr *= MLX5_SIGNATURE_SQ_MULT; + qp->signature_en = true; + } + if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR) qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE; diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index d9e3eac..174805c 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -37,6 +37,7 @@ #include <linux/mlx5/driver.h> #define MLX5_INVALID_LKEY 0x100 +#define MLX5_SIGNATURE_SQ_MULT 3 enum mlx5_qp_optpar { MLX5_QP_OPTPAR_ALT_ADDR_PATH= 1 << 0, -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v2 05/10] IB/mlx5: Break wqe handling to begin finish routines
As a preliminary step for the signature feature, which will require posting multiple (3) WQEs for a single WR, we break the post_send routine WQE indexing into begin and finish routines. This patch does not change any functionality. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c | 95 --- 1 files changed, 59 insertions(+), 36 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index c80122e..dc8d9fc 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1983,6 +1983,57 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr) } } +static int begin_wqe(struct mlx5_ib_qp *qp, void **seg, +struct mlx5_wqe_ctrl_seg **ctrl, +struct ib_send_wr *wr, int *idx, +int *size, int nreq) +{ + int err = 0; + if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) { + err = -ENOMEM; + return err; + } + + *idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1); + *seg = mlx5_get_send_wqe(qp, *idx); + *ctrl = *seg; + *(uint32_t *)(*seg + 8) = 0; + (*ctrl)->imm = send_ieth(wr); + (*ctrl)->fm_ce_se = qp->sq_signal_bits | + (wr->send_flags & IB_SEND_SIGNALED ? +MLX5_WQE_CTRL_CQ_UPDATE : 0) | + (wr->send_flags & IB_SEND_SOLICITED ? 
+MLX5_WQE_CTRL_SOLICITED : 0); + + *seg += sizeof(**ctrl); + *size = sizeof(**ctrl) / 16; + + return err; +} + +static void finish_wqe(struct mlx5_ib_qp *qp, + struct mlx5_wqe_ctrl_seg *ctrl, + u8 size, unsigned idx, u64 wr_id, + int *nreq, u8 fence, u8 next_fence, + u32 mlx5_opcode) +{ + u8 opmod = 0; + ctrl-opmod_idx_opcode = cpu_to_be32(((u32)(qp-sq.cur_post) 8) | +mlx5_opcode | ((u32)opmod 24)); + ctrl-qpn_ds = cpu_to_be32(size | (qp-mqp.qpn 8)); + ctrl-fm_ce_se |= fence; + qp-fm_cache = next_fence; + if (unlikely(qp-wq_sig)) + ctrl-signature = wq_sig(ctrl); + + qp-sq.wrid[idx] = wr_id; + qp-sq.w_list[idx].opcode = mlx5_opcode; + qp-sq.wqe_head[idx] = qp-sq.head + (*nreq)++; + qp-sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB); + qp-sq.w_list[idx].next = qp-sq.cur_post; +} + + int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr) { @@ -1996,7 +2047,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, int uninitialized_var(size); void *qend = qp-sq.qend; unsigned long flags; - u32 mlx5_opcode; unsigned idx; int err = 0; int inl = 0; @@ -2005,7 +2055,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, int nreq; int i; u8 next_fence = 0; - u8 opmod = 0; u8 fence; spin_lock_irqsave(qp-sq.lock, flags); @@ -2018,36 +2067,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, goto out; } - if (unlikely(mlx5_wq_overflow(qp-sq, nreq, qp-ibqp.send_cq))) { + fence = qp-fm_cache; + num_sge = wr-num_sge; + if (unlikely(num_sge qp-sq.max_gs)) { mlx5_ib_warn(dev, \n); err = -ENOMEM; *bad_wr = wr; goto out; } - fence = qp-fm_cache; - num_sge = wr-num_sge; - if (unlikely(num_sge qp-sq.max_gs)) { + err = begin_wqe(qp, seg, ctrl, wr, idx, size, nreq); + if (err) { mlx5_ib_warn(dev, \n); err = -ENOMEM; *bad_wr = wr; goto out; } - idx = qp-sq.cur_post (qp-sq.wqe_cnt - 1); - seg = mlx5_get_send_wqe(qp, idx); - ctrl = seg; - *(uint32_t *)(seg + 8) = 0; - ctrl-imm = send_ieth(wr); - 
ctrl-fm_ce_se = qp-sq_signal_bits | - (wr-send_flags IB_SEND_SIGNALED ? -MLX5_WQE_CTRL_CQ_UPDATE : 0) | - (wr-send_flags IB_SEND_SOLICITED ? -MLX5_WQE_CTRL_SOLICITED : 0); - - seg += sizeof(*ctrl); - size = sizeof(*ctrl) / 16; - switch (ibqp-qp_type) { case IB_QPT_XRC_INI: xrc = seg; @@ -2197,22 +2233,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, } } - mlx5_opcode = mlx5_ib_opcode[wr-opcode]; - ctrl-opmod_idx_opcode
[PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device
This will be useful when processing signature errors on a specific key. The mlx5 driver will look up the matching mlx5 memory region structure and mark it as dirty (contains signature errors).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   20 ++++++++++++++++++++
 include/linux/mlx5/driver.h                    |   12 ++++++++++++
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index b47739b..5b7b3c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -428,6 +428,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev)
 	mlx5_init_cq_table(dev);
 	mlx5_init_qp_table(dev);
 	mlx5_init_srq_table(dev);
+	mlx5_init_mr_table(dev);
 
 	return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index 2ade604..f72e0b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -36,9 +36,18 @@
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"
 
+void mlx5_init_mr_table(struct mlx5_core_dev *dev)
+{
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
+
+	rwlock_init(&table->lock);
+	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
+}
+
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			  struct mlx5_create_mkey_mbox_in *in, int inlen)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_create_mkey_mbox_out out;
 	int err;
 	u8 key;
@@ -63,14 +72,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 	mr->key = mlx5_idx_to_mkey(be32_to_cpu(out.mkey) & 0xffffff) | key;
 	mlx5_core_dbg(dev, "out 0x%x, key 0x%x, mkey 0x%x\n",
 		      be32_to_cpu(out.mkey), key, mr->key);
 
+	/* connect to MR tree */
+	write_lock_irq(&table->lock);
+	err = radix_tree_insert(&table->tree, mr->key & 0xffffff00, mr);
+	write_unlock_irq(&table->lock);
+
 	return err;
 }
 EXPORT_SYMBOL(mlx5_core_create_mkey);
 
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_destroy_mkey_mbox_in in;
 	struct mlx5_destroy_mkey_mbox_out out;
+	unsigned long flags;
 	int err;
 
 	memset(&in, 0, sizeof(in));
@@ -85,6 +101,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 	if (out.hdr.status)
 		return mlx5_cmd_status_to_err(&out.hdr);
 
+	write_lock_irqsave(&table->lock, flags);
+	radix_tree_delete(&table->tree, mr->key & 0xffffff00);
+	write_unlock_irqrestore(&table->lock, flags);
+
 	return err;
 }
 EXPORT_SYMBOL(mlx5_core_destroy_mkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 7c33487..5fe0690 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -488,6 +488,13 @@ struct mlx5_srq_table {
 	struct radix_tree_root	tree;
 };
 
+struct mlx5_mr_table {
+	/* protect radix tree
+	 */
+	rwlock_t		lock;
+	struct radix_tree_root	tree;
+};
+
 struct mlx5_priv {
 	char			name[MLX5_MAX_NAME_LEN];
 	struct mlx5_eq_table	eq_table;
@@ -516,6 +523,10 @@ struct mlx5_priv {
 	struct mlx5_cq_table	cq_table;
 	/* end: cq staff */
 
+	/* start: mr staff */
+	struct mlx5_mr_table	mr_table;
+	/* end: mr staff */
+
 	/* start: alloc staff */
 	struct mutex		pgdir_mutex;
 	struct list_head	pgdir_list;
@@ -691,6 +702,7 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 			struct mlx5_query_srq_mbox_out *out);
 int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 		      u16 lwm, int is_srq);
+void mlx5_init_mr_table(struct mlx5_core_dev *dev);
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			  struct mlx5_create_mkey_mbox_in *in, int inlen);
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr);
-- 
1.7.1
-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v2 00/10] Introduce Signature feature
This patchset introduces verbs-level support for the signature handover feature. Signature is intended to implement end-to-end data integrity on a transactional basis in a completely offloaded manner. There are several end-to-end data integrity methods used today in various applications and/or upper layer protocols, such as T10-DIF defined by the SCSI specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs support only for T10-DIF. The proposed framework allows adding more signature methods in the future. In T10-DIF, when a series of 512-byte data blocks are transferred, each block is followed by an 8-byte guard. The guard consists of a CRC that protects the integrity of the data in the block, and some other tags that protect against mis-directed IOs. Data can be protected when transferred over the wire, but can also be protected in the memory of the sender/receiver. This allows true end-to-end protection against bits flipping either over the wire, through gateways, in memory, over PCI, etc. While T10-DIF clearly defines that over the wire protection guards are interleaved into the data stream (each 512-byte block followed by an 8-byte guard), when in memory the protection guards may reside in a buffer separated from the data. Depending on the application, it is usually easier to handle the data when it is contiguous. In this case the data buffer will be of size 512xN and the protection buffer will be of size 8xN (where N is the number of blocks in the transaction). There are 3 kinds of signature handover operation:
1. Take unprotected data (from wire or memory) and ADD protection guards.
2. Take protected data (from wire or memory), validate the data integrity against the protection guards and STRIP the protection guards.
3. Take protected data (from wire or memory), validate the data integrity against the protection guards and PASS the data with the guards as-is.
This translates to defining to the HCA how/if data protection exists in the memory domain, and how/if data protection exists in the wire domain. Data integrity is enforced by using a new kind of memory region: a signature-enabled MR, and a new kind of work request: REG_SIG_MR. The REG_SIG_MR WR operates on the signature-enabled MR, and defines all the needed information for the signature handover (data buffer, protection buffer if needed, and signature attributes). The result is an MR that can be used for data transfer as usual, and that will also add/validate/strip/pass protection guards. When the data transfer is successfully completed, it does not mean that there are no integrity errors. The user must afterwards check the signature status of the handover operation using a new lightweight verb. This feature shall be used in storage upper layer protocols iSER/SRP implementing end-to-end data integrity T10-DIF. Following this patchset, we will soon submit krping patches which will demonstrate the usage of these signature verbs.

Patchset summary:
- Introduce verbs for create/destroy memory regions supporting signature.
- Introduce IB core signature verbs API.
- Implement mr create/destroy verbs in mlx5 driver.
- Preparation patches for signature support in mlx5 driver.
- Implement signature handover work request in mlx5 driver.
- Implement signature error collection and handling in mlx5 driver.

Changes from v1:
- IB/core: Reduced sizeof ib_send_wr by using wr->sg_list for data and a dedicated ib_sge for the protection guards buffer. Currently the sig_handover extension does not increase sizeof ib_send_wr.
- IB/core: Change enum to int for container variables.
- IB/mlx5: Validate wr->num_sge == 1 for REG_SIG_MR work request.

Changes from v0:
- Commit messages: Added more detailed explanation for signature work request.
- IB/core: Remove indirect memory registration enablement from create_mr. Keep only signature enablement.
- IB/mlx5: Changed signature error processing via MR radix lookup.

Sagi Grimberg (10):
  IB/core: Introduce protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin & finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs in a radix tree under device
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Publish support in signature feature

 drivers/infiniband/core/verbs.c                |   47 +++
 drivers/infiniband/hw/mlx5/cq.c                |   53 +++
 drivers/infiniband/hw/mlx5/main.c              |   12 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h           |   14 +
 drivers/infiniband/hw/mlx5/mr.c                |  138 +++
 drivers/infiniband/hw/mlx5/qp.c                |  525 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   84 ++++
 include
[PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
Support create_mr and destroy_mr verbs. Creating an ib_mr may be done either for an ib_mr that will register regular page lists like the alloc_fast_reg_mr routine, or for indirect ib_mr's that can register other (pre-registered) ib_mr's in an indirect manner. In addition, the user may request signature enable, which means that the created ib_mr may be attached with signature attributes (BSF, PSVs). Currently we only allow direct/indirect registration modes.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c            |    2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |    4 +
 drivers/infiniband/hw/mlx5/mr.c              |  109 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |   64 +++++++++++
 include/linux/mlx5/device.h                  |   25 ++++++
 include/linux/mlx5/driver.h                  |   19 +++++
 6 files changed, 223 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3f831de..2e67a37 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.get_dma_mr		= mlx5_ib_get_dma_mr;
 	dev->ib_dev.reg_user_mr		= mlx5_ib_reg_user_mr;
 	dev->ib_dev.dereg_mr		= mlx5_ib_dereg_mr;
+	dev->ib_dev.destroy_mr		= mlx5_ib_destroy_mr;
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
+	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 836be91..45d7424 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -262,6 +262,7 @@ struct mlx5_ib_mr {
 	int			npages;
 	struct completion	done;
 	enum ib_wc_status	status;
+	struct mlx5_core_sig_ctx    *sig;
 };
 
 struct mlx5_ib_fast_reg_page_list {
@@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
 				  struct ib_udata *udata);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
+int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+				struct ib_mr_init_attr *mr_init_attr);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index bd41df9..44f7e46 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -921,6 +921,115 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
 	return 0;
 }
 
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+				struct ib_mr_init_attr *mr_init_attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_create_mkey_mbox_in *in;
+	struct mlx5_ib_mr *mr;
+	int access_mode, err;
+	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	in->seg.status = 1 << 6; /* free */
+	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
+	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
+	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
+	access_mode = MLX5_ACCESS_MODE_MTT;
+
+	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+		u32 psv_index[2];
+
+		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
+					       MLX5_MKEY_BSF_EN);
+		in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
+		mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL);
+		if (!mr->sig) {
+			err = -ENOMEM;
+			goto err_free;
+		}
+
+		/* create mem & wire PSVs */
+		err = mlx5_core_create_psv(&dev->mdev, to_mpd(pd)->pdn,
+					   2, psv_index);
+		if (err)
+			goto err_free_sig;
+
+		access_mode = MLX5_ACCESS_MODE_KLM;
+		mr->sig->psv_memory.psv_idx = psv_index[0
Re: [PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
On 10/31/2013 2:52 PM, Jack Wang wrote: On 10/31/2013 01:24 PM, Sagi Grimberg wrote: Support create_mr and destroy_mr verbs. [...]
Re: [PATCH RFC v2 00/10] Introduce Signature feature
On 10/31/2013 2:55 PM, Jack Wang wrote: Hi Sagi, I wonder what's the performance overhead with this DIF support? And is there a roadmap for supporting SRP/iSER and the target side for DIF? Regards, Jack

Well, all DIF operations are fully offloaded by the HCA, so we don't expect any performance degradation other than the obvious 8-byte integrity overhead. We have yet to run benchmarks on this and we definitely plan to do so. Regarding our roadmap, we plan to support iSER target (LIO) and initiator first. Some prior support for DIF needs to be added at the target core level; then the transport implementation is pretty straightforward (iSER/SRP). So I aim for iSER DIF support (target+initiator) to make it into v3.14. Hope this helps, Sagi.
Re: [PATCH RFC v2 00/10] Introduce Signature feature
On 11/2/2013 12:06 AM, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote: While T10-DIF clearly defines that over the wire protection guards are interleaved into the data stream (each 512-byte block followed by an 8-byte guard), when in memory, the protection guards may reside in a buffer separated from the data. Depending on the application, it is usually easier to handle the data when it is contiguous. In this case the data buffer will be of size 512xN and the protection buffer will be of size 8xN (where N is the number of blocks in the transaction).

It might be worth mentioning here that in the Linux block layer the approach has been chosen where actual data and protection information are in separate buffers. See also the bi_integrity field in struct bio. Bart.

Hey Bart, I was expecting your input on this. Thanks for the insightful comments! The explanation here is an attempt to introduce T10-DIF to the mailing list as simply as possible, so I tried not to dive into SBC-3/SPC-4. You are correct, the 8-byte protection guards will follow the protection interval, which won't necessarily be 512 bytes (only for DIF types 2,3). Sagi.
Re: [PATCH RFC v2 00/10] Introduce Signature feature
On 11/2/2013 12:06 AM, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote: While T10-DIF clearly defines that over the wire protection guards are interleaved into the data stream (each 512-byte block followed by an 8-byte guard), when in memory, the protection guards may reside in a buffer separated from the data. Depending on the application, it is usually easier to handle the data when it is contiguous. In this case the data buffer will be of size 512xN and the protection buffer will be of size 8xN (where N is the number of blocks in the transaction). It might be worth mentioning here that in the Linux block layer the approach has been chosen where actual data and protection information are in separate buffers. See also the bi_integrity field in struct bio. Bart.

This is true, but the signature verbs interface also supports data and protection interleaved in memory space. A user who wishes to do so will pass the same ib_sge both for data and protection. In fact, this was a requirement we got from customers. Sagi.
Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
On 11/1/2013 5:13 PM, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote:

+/**
+ * struct ib_sig_domain - Parameters specific for T10-DIF
+ *     domain.
+ * @sig_type: specific signature type
+ * @sig: union of all signature domain attributes that may
+ *     be used to set domain layout.
+ * @dif:
+ *     @type: T10-DIF type (0|1|2|3)
+ *     @bg_type: T10-DIF block guard type (CRC|CSUM)
+ *     @block_size: block size in signature domain.
+ *     @app_tag: if app_tag is owned by the user,
+ *         HCA will take this value to be app_tag.
+ *     @ref_tag: initial ref_tag of signature handover.
+ *     @type3_inc_reftag: T10-DIF type 3 does not state
+ *         about the reference tag, it is the user
+ *         choice to increment it or not.
+ */
+struct ib_sig_domain {
+	enum ib_signature_type sig_type;
+	union {
+		struct {
+			enum ib_t10_dif_type	type;
+			enum ib_t10_dif_bg_type	bg_type;
+			u16			block_size;
+			u16			bg;
+			u16			app_tag;
+			u32			ref_tag;
+			bool			type3_inc_reftag;
+		} dif;
+	} sig;
+};

My understanding from SPC-4 is that when using protection information, such information is inserted after every protection interval. A protection interval can be smaller than a logical block. Shouldn't the name block_size be changed into something like pi_interval to avoid confusion with the logical block size? Bart.

True, for DIF types 2,3 the protection interval is not restricted to the logical block length and may be smaller. I agree with the pi_interval naming. Note that pi_intervals smaller than 512 bytes are not supported at the moment. Sagi.
Re: [PATCH RFC v2 01/10] IB/core: Introduce protected memory regions
On 11/1/2013 7:09 PM, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote:

+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ *     ib_create_mr.
+ * @max_reg_descriptors: max number of registration units that
+ *     may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+	int	max_reg_descriptors;
+	int	flags;
+};

Is this the first patch that adds the abbreviation UMR to a header file in include/rdma? If so, I think it's a good idea not only to mention the abbreviation but also what UMR stands for. Bart.

You are correct. I prefer to remove this abbreviation UMR as it is not tightly related to signature. The max_reg_descriptors parameter is the equivalent of max_page_list_len of ib_alloc_fast_reg_mr(). The difference is that this memory region can also register indirect memory descriptors {key, addr, len} rather than u64 physical addresses. For example, a signature-enabled memory region may register 2 descriptors: data and protection. I'll modify the explanation here in v3. Sagi.
Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
On 11/2/2013 12:23 AM, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote:

+ * @type3_inc_reftag: T10-DIF type 3 does not state
+ *     about the reference tag, it is the user
+ *     choice to increment it or not.

Can you explain this further? Does this mean that the HCA can check whether the reference tags are increasing when receiving data for TYPE 3 protection mode? My understanding of SPC-4 is that the application is free to use the reference tag in any way when using TYPE 3 protection, and hence that the HCA must not check whether the reference tag is increasing for TYPE 3 protection. See e.g. sd_dif_type3_get_tag() in drivers/scsi/sd_dif.c. Bart.

As I understand TYPE 3, the reference tag is free for the application to use - it may choose to increment it each protection interval or not. This option allows the application to increment the ref_tag in type 3. The DIF check is determined via check_mask. As I see it, the correct use in case of DIF TYPE 3 is not to validate the reference tag, i.e. set the REF_TAG bits in check_mask to zero. Sagi.
Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
On 11/1/2013 8:46 PM, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote:

+/**
+ * Signature T10-DIF block-guard types
+ */
+enum ib_t10_dif_bg_type {
+	IB_T10DIF_CRC,
+	IB_T10DIF_CSUM
+};

In SPC-4 paragraph 4.22.4 I found that the T10-PI guard is the CRC computed from the generator polynomial x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1. Could you tell me where I can find which guard computation method IB_T10DIF_CSUM corresponds to? Bart.

The IB_T10DIF_CSUM computation method corresponds to IP checksum rules. This is aligned with the SHOST_DIX_GUARD_IP guard type. Sagi.
Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
On 11/1/2013 5:05 PM, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote:

+static u8 bs_selector(int block_size)
+{
+	switch (block_size) {
+	case 512:		return 0x1;
+	case 520:		return 0x2;
+	case 4096:		return 0x3;
+	case 4160:		return 0x4;
+	case 1073741824:	return 0x5;
+	default:		return 0;
+	}
+}

Would it be possible to provide some more information about how the five supported block sizes have been chosen? Thanks, Bart.

These block sizes were chosen by our customers who were interested in signature. This is the current HCA support for the time being. Sagi.
Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
On 11/2/2013 11:59 PM, Bart Van Assche wrote: On 2/11/2013 12:21, Or Gerlitz wrote: On Fri, Nov 1, 2013 at 10:37 PM, Bart Van Assche bvanass...@acm.org wrote: On 31/10/2013 5:24, Sagi Grimberg wrote: This patch implements IB_WR_REG_SIG_MR posted by the user. Basically this WR involves 3 WQEs in order to prepare and properly register the signature layout: 1. post a UMR WR to register the sig_mr in one of two possible ways: * In case the user registered a single MR for data, the UMR data segment consists of: - a single klm (data MR) passed by the user - a BSF with the signature attributes requested by the user. * In case the user registered 2 MRs, one for data and one for protection, the UMR consists of: - a strided block format which includes the data and protection MRs and their repetitive block format. - a BSF with the signature attributes requested by the user. 2. post SET_PSV in order to set, for the memory domain, the initial signature parameters passed by the user. 3. post SET_PSV in order to set, for the wire domain, the initial signature parameters passed by the user. This patch also introduces some helper functions to set the BSF correctly and to determine the signature format selectors. Has it already been explained somewhere what the abbreviations KLM, BSF and PSV stand for? Bart, these are all HW T10 related objects/concepts; we made an effort to keep them contained within the mlx5 driver such that they don't show up in the IB core layer. If this helps the review, Sagi can spare a few words on each, sure. Hello Or, I would certainly appreciate it if these abbreviations could be clarified further. That would allow me to understand what has been explained in the above patch description :-) Bart. Hey Bart, As Or said, these concepts are vendor specific and not exposed to the IB core layer, and their naming is also purely Mellanox's. This might also change in future generation HCAs.
In general, the sig_mr (signature enabled) is a memory region that can register other memory regions (hint: data MR and protection MR) and is attached to (mlx5) signature objects.
KLM: a tuple {key, addr, len} that is used for indirect registration.
BSF: the object that describes the wire and memory layouts. We call it a byte-stream format.
PSV: the signature variable that computes the guards - used for generation and/or validation. One exists for each domain.
So we constructed the REG_SIG_MR operation as a 3-way operation:
- Online registration for the sig_mr: register in an indirect manner the data and protection (if it exists). If no protection exists in the memory domain, the sig_mr registers the data buffer (KLM). If protection exists in the memory domain (DIX), the sig_mr registers the data and protection buffers (KLMs). In the DIX case, in order to transfer DIF every pi_interval, the registration also defines the strided format of the execution (a pi_interval of data followed by 8 bytes of protection in a repetitive manner).
- Define the signature format of the wire/memory domains (BSF object): tell the HW how to treat the signature layout in each domain (signature type, pi_interval etc...).
- Set the signature variables for each domain (memory, wire). Here we place the seeds from which the HW starts the signature computation (in the DIF case: initial CRC, initial ref_tag, initial app_tag).
Sagi.
Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
On 11/3/2013 4:41 PM, Bart Van Assche wrote: On 3/11/2013 4:15, Sagi Grimberg wrote: On 11/1/2013 8:46 PM, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote:

+/**
+ * Signature T10-DIF block-guard types
+ */
+enum ib_t10_dif_bg_type {
+	IB_T10DIF_CRC,
+	IB_T10DIF_CSUM
+};

In SPC-4 paragraph 4.22.4 I found that the T10-PI guard is the CRC computed from the generator polynomial x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1. Could you tell me where I can find which guard computation method IB_T10DIF_CSUM corresponds to? Bart.

The IB_T10DIF_CSUM computation method corresponds to IP checksum rules. This is aligned with the SHOST_DIX_GUARD_IP guard type.

Since the declarations added in rdma/ib_verbs.h constitute an interface definition, I think it would help if it were made clearer what these two symbols stand for. How about mentioning the names of the standards these two guard computation methods come from? An alternative is to add a comment like the one above scsi_host_guard_type in scsi/scsi_host.h, which explains the two guard computation methods well:

/*
 * All DIX-capable initiators must support the T10-mandated CRC
 * checksum. Controllers can optionally implement the IP checksum
 * scheme which has much lower impact on system performance. Note
 * that the main rationale for the checksum is to match integrity
 * metadata with data. Detecting bit errors are a job for ECC memory
 * and buses.
 */

Bart.

Agreed, I'll comment on each type's correspondence (T10-DIF CRC checksum and IP checksum). Sagi.
Re: [PATCH RFC v2 00/10] Introduce Signature feature
On 11/4/2013 8:41 PM, Nicholas A. Bellinger wrote: On Sat, 2013-11-02 at 14:57 -0700, Bart Van Assche wrote: On 1/11/2013 18:36, Nicholas A. Bellinger wrote: On Fri, 2013-11-01 at 08:03 -0700, Bart Van Assche wrote: On 31/10/2013 5:24, Sagi Grimberg wrote: In T10-DIF, when a series of 512-byte data blocks are transferred, each block is followed by an 8-byte guard. The guard consists of CRC that protects the integrity of the data in the block, and some other tags that protect against mis-directed IOs. Shouldn't that read logical block length divided by 2**(protection interval exponent) instead of 512 ? From the SPC-4 FORMAT UNIT section: Why should the protection interval in FORMAT_UNIT be mentioned when it's not supported by the hardware, nor by drivers/scsi/sd_dif.c itself..? Hello Nick, My understanding is that this patch series is not only intended for initiator drivers but also for target drivers like ib_srpt and ib_isert. As you know target drivers do not restrict the initiator operating system to Linux. Although I do not know whether there are already operating systems that support the protection interval exponent, It's my understanding that Linux is still the only stack that supports DIF, so AFAICT no one is actually supporting this. I think it is a good idea to stay as close as possible to the terminology of the SPC-4 standard. No, in this context it only adds pointless misdirection because 1) The hardware in question doesn't support it, and 2) Linux itself doesn't support it. I think that Bart is suggesting renaming block_size as pi_interval in ib_sig_domain. I tend to agree since even if support for that does not exist yet, it might be in the future. I think it is not a misdirection because it does represent the protection information interval. --nab
[PATCH v3 02/10] IB/core: Introduce Signature Verbs API
This commit introduces the verbs interface for signature related operations. A signature handover operation shall configure the layouts of data and protection attributes both in memory and wire domains. Signature operations are:
- INSERT: generate and insert protection information when handing over data from input space to output space.
- VALIDATE and STRIP: validate protection information and remove it when handing over data from input space to output space.
- VALIDATE and PASS: validate protection information and pass it on when handing over data from input space to output space.
Once the signature handover operation is done, the HCA will offload data integrity generation/validation while performing the actual data transfer.
Additions:
1. HCA signature capabilities in device attributes. A verbs provider supporting signature handover operations shall fill the relevant fields in the device attributes structure returned by ib_query_device.
2. QP creation flag IB_QP_CREATE_SIGNATURE_EN. Creating a QP that will carry signature handover operations may require some special preparations from the verbs provider, so we add the QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the created QP may carry out signature handover operations. Expose signature support to the verbs layer (no support for now).
3. New send work request IB_WR_REG_SIG_MR. Signature handover work request. This WR will define the signature handover properties of the memory/wire domains as well as the domains' layout. The purpose of this work request is to bind all the needed information for the signature operation:
- data to be transferred: wr->sg_list (ib_sge).
  * The raw data, pre-registered to a single MR (normally, before signature, this MR would have been used directly for the data transfer)
- data protection guards: sig_handover.prot (ib_sge).
  * The data protection buffer, pre-registered to a single MR, which contains the data integrity guards of the raw data blocks.
Note that it may not always exist; it is present only in cases where the user is interested in storing protection guards in memory.
- signature operation attributes: sig_handover.sig_attrs.
  * Tells the HCA how to validate/generate the protection information.
Once the work request is executed, the memory region which will describe the signature transaction will be the sig_mr. The application can now go ahead and send the sig_mr.rkey or use the sig_mr.lkey for data transfer.
4. New verb ib_check_sig_status. The check_sig_status verb shall check if any signature errors are pending for a specific signature-enabled ib_mr. This verb is a lightweight check and is allowed to be taken from interrupt context. The application must call this verb after it is known that the actual data transfer has finished.
issue: 333508 Change-Id: I0cce750a6b77cd1eae102c5982c8c31e46237af8 Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/core/verbs.c |8 +++ include/rdma/ib_verbs.h | 132 ++- 2 files changed, 139 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index ef47667..d3d2ce5 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1323,3 +1323,11 @@ int ib_destroy_flow(struct ib_flow *flow_id) return err; } EXPORT_SYMBOL(ib_destroy_flow); + +int ib_check_sig_status(struct ib_mr *sig_mr, + struct ib_sig_err *sig_err) +{ + return sig_mr->device->check_sig_status ?
+ sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS; +} +EXPORT_SYMBOL(ib_check_sig_status); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index af1bd1a..e71dae6 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -117,7 +117,19 @@ enum ib_device_cap_flags { IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22), IB_DEVICE_MEM_WINDOW_TYPE_2A= (1<<23), IB_DEVICE_MEM_WINDOW_TYPE_2B= (1<<24), - IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29) + IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29), + IB_DEVICE_SIGNATURE_HANDOVER= (1<<30) +}; + +enum ib_signature_prot_cap { + IB_PROT_T10DIF_TYPE_1 = 1, + IB_PROT_T10DIF_TYPE_2 = 1 << 1, + IB_PROT_T10DIF_TYPE_3 = 1 << 2, +}; + +enum ib_signature_guard_cap { + IB_GUARD_T10DIF_CRC = 1, + IB_GUARD_T10DIF_CSUM= 1 << 1, }; enum ib_atomic_cap { @@ -167,6 +179,8 @@ struct ib_device_attr { unsigned intmax_fast_reg_page_list_len; u16 max_pkeys; u8 local_ca_ack_delay; + int sig_prot_cap; + int sig_guard_cap; }; enum ib_mtu { @@ -471,6 +485,98 @@ struct ib_mr_init_attr { u32 flags; }; +enum ib_signature_type
[PATCH v3 05/10] IB/mlx5: Break wqe handling to begin finish routines
As a preliminary step for the signature feature, which will require posting multiple (3) WQEs for a single WR, we break the post_send routine's WQE indexing into begin and finish routines. This patch does not change any functionality. issue: 333508 Change-Id: If373dff9a21ead58117137409e81143f94aa3fec Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c | 97 -- 1 files changed, 61 insertions(+), 36 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index f61e93c..15df91b 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1992,6 +1992,59 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr) } } +static int begin_wqe(struct mlx5_ib_qp *qp, void **seg, +struct mlx5_wqe_ctrl_seg **ctrl, +struct ib_send_wr *wr, int *idx, +int *size, int nreq) +{ + int err = 0; + + if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) { + err = -ENOMEM; + return err; + } + + *idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1); + *seg = mlx5_get_send_wqe(qp, *idx); + *ctrl = *seg; + *(uint32_t *)(*seg + 8) = 0; + (*ctrl)->imm = send_ieth(wr); + (*ctrl)->fm_ce_se = qp->sq_signal_bits | + (wr->send_flags & IB_SEND_SIGNALED ? +MLX5_WQE_CTRL_CQ_UPDATE : 0) | + (wr->send_flags & IB_SEND_SOLICITED ?
+MLX5_WQE_CTRL_SOLICITED : 0); + + *seg += sizeof(**ctrl); + *size = sizeof(**ctrl) / 16; + + return err; +} + +static void finish_wqe(struct mlx5_ib_qp *qp, + struct mlx5_wqe_ctrl_seg *ctrl, + u8 size, unsigned idx, u64 wr_id, + int *nreq, u8 fence, u8 next_fence, + u32 mlx5_opcode) +{ + u8 opmod = 0; + + ctrl-opmod_idx_opcode = cpu_to_be32(((u32)(qp-sq.cur_post) 8) | +mlx5_opcode | ((u32)opmod 24)); + ctrl-qpn_ds = cpu_to_be32(size | (qp-mqp.qpn 8)); + ctrl-fm_ce_se |= fence; + qp-fm_cache = next_fence; + if (unlikely(qp-wq_sig)) + ctrl-signature = wq_sig(ctrl); + + qp-sq.wrid[idx] = wr_id; + qp-sq.w_list[idx].opcode = mlx5_opcode; + qp-sq.wqe_head[idx] = qp-sq.head + (*nreq)++; + qp-sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB); + qp-sq.w_list[idx].next = qp-sq.cur_post; +} + + int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr) { @@ -2005,7 +2058,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, int uninitialized_var(size); void *qend = qp-sq.qend; unsigned long flags; - u32 mlx5_opcode; unsigned idx; int err = 0; int inl = 0; @@ -2014,7 +2066,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, int nreq; int i; u8 next_fence = 0; - u8 opmod = 0; u8 fence; spin_lock_irqsave(qp-sq.lock, flags); @@ -2027,36 +2078,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, goto out; } - if (unlikely(mlx5_wq_overflow(qp-sq, nreq, qp-ibqp.send_cq))) { + fence = qp-fm_cache; + num_sge = wr-num_sge; + if (unlikely(num_sge qp-sq.max_gs)) { mlx5_ib_warn(dev, \n); err = -ENOMEM; *bad_wr = wr; goto out; } - fence = qp-fm_cache; - num_sge = wr-num_sge; - if (unlikely(num_sge qp-sq.max_gs)) { + err = begin_wqe(qp, seg, ctrl, wr, idx, size, nreq); + if (err) { mlx5_ib_warn(dev, \n); err = -ENOMEM; *bad_wr = wr; goto out; } - idx = qp-sq.cur_post (qp-sq.wqe_cnt - 1); - seg = mlx5_get_send_wqe(qp, idx); - ctrl = seg; - *(uint32_t *)(seg + 8) = 0; - ctrl-imm = send_ieth(wr); - 
ctrl-fm_ce_se = qp-sq_signal_bits | - (wr-send_flags IB_SEND_SIGNALED ? -MLX5_WQE_CTRL_CQ_UPDATE : 0) | - (wr-send_flags IB_SEND_SOLICITED ? -MLX5_WQE_CTRL_SOLICITED : 0); - - seg += sizeof(*ctrl); - size = sizeof(*ctrl) / 16; - switch (ibqp-qp_type) { case IB_QPT_XRC_INI: xrc = seg; @@ -2189,22 +2227,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr
[PATCH v3 10/10] IB/mlx5: Publish support in signature feature
Currently we support only T10-DIF types of signature handover operations (types 1|2|3). issue: 333508 Change-Id: I3ae2cce03a97074d56a52098b15c8bf74962aeed Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/main.c |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 9dec71d..54736f5 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev, if (flags & MLX5_DEV_CAP_FLAG_XRC) props->device_cap_flags |= IB_DEVICE_XRC; props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS; + if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) { + props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER; + /* At this stage no support for signature handover */ + props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 | + IB_PROT_T10DIF_TYPE_2 | + IB_PROT_T10DIF_TYPE_3; + props->sig_guard_cap = IB_GUARD_T10DIF_CRC | + IB_GUARD_T10DIF_CSUM; + } props->vendor_id = be32_to_cpup((__be32 *)(out_mad->data + 36)) & 0xff; -- 1.7.1
[PATCH v3 09/10] IB/mlx5: Collect signature error completion
This commit takes care of the signature error CQE generated by the HW (if it happened). The underlying mlx5 driver will handle signature error completions and will mark the relevant memory region as dirty. Once the user gets the completion for the transaction, he must check for signature errors on the signature memory region using the new lightweight verb ib_check_sig_status, and if such errors exist, he will get the signature error information. In case the user does not check for signature errors, i.e. does not call ib_check_sig_status, he will not be allowed to use the memory region for another signature operation (the REG_SIG_MR work request will fail). issue: 333508 Change-Id: I002b12c6b685615b97c6fa29902ef06c70b11103 Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/cq.c | 54 ++ drivers/infiniband/hw/mlx5/main.c|1 + drivers/infiniband/hw/mlx5/mlx5_ib.h |7 drivers/infiniband/hw/mlx5/mr.c | 31 +++ drivers/infiniband/hw/mlx5/qp.c |8 - include/linux/mlx5/cq.h |1 + include/linux/mlx5/device.h | 18 +++ include/linux/mlx5/driver.h |4 ++ include/linux/mlx5/qp.h |5 +++ 9 files changed, 127 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c index 2834477..ac12dfe 100644 --- a/drivers/infiniband/hw/mlx5/cq.c +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -351,6 +351,33 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64, qp->sq.last_poll = tail; } +static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe, +struct ib_sig_err *item) +{ + u16 syndrome = be16_to_cpu(cqe->syndrome); + + switch (syndrome) { + case 13: + item->err_type = IB_SIG_BAD_CRC; + break; + case 12: + item->err_type = IB_SIG_BAD_APPTAG; + break; + case 11: + item->err_type = IB_SIG_BAD_REFTAG; + break; + default: + break; + } + + item->expected_guard = be32_to_cpu(cqe->expected_trans_sig) >> 16; + item->actual_guard = be32_to_cpu(cqe->actual_trans_sig) >> 16; + item->expected_logical_block =
be32_to_cpu(cqe-expected_reftag); + item-actual_logical_block = be32_to_cpu(cqe-actual_reftag); + item-sig_err_offset = be64_to_cpu(cqe-err_offset); + item-key = be32_to_cpu(cqe-mkey); +} + static int mlx5_poll_one(struct mlx5_ib_cq *cq, struct mlx5_ib_qp **cur_qp, struct ib_wc *wc) @@ -360,12 +387,16 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq, struct mlx5_cqe64 *cqe64; struct mlx5_core_qp *mqp; struct mlx5_ib_wq *wq; + struct mlx5_sig_err_cqe *sig_err_cqe; + struct mlx5_core_mr *mmr; + struct mlx5_ib_mr *mr; uint8_t opcode; uint32_t qpn; u16 wqe_ctr; void *cqe; int idx; +repoll: cqe = next_cqe_sw(cq); if (!cqe) return -EAGAIN; @@ -449,6 +480,29 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq, } } break; + case MLX5_CQE_SIG_ERR: + sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64; + + read_lock(dev-mdev.priv.mr_table.lock); + mmr = __mlx5_mr_lookup(dev-mdev, + mlx5_base_mkey(be32_to_cpu(sig_err_cqe-mkey))); + if (unlikely(!mmr)) { + read_unlock(dev-mdev.priv.mr_table.lock); + mlx5_ib_warn(dev, CQE@CQ %06x for unknown MR %6x\n, +cq-mcq.cqn, be32_to_cpu(sig_err_cqe-mkey)); + return -EINVAL; + } + + mr = to_mibmr(mmr); + get_sig_err_item(sig_err_cqe, mr-sig-err_item); + mr-sig-sig_err_exists = true; + mr-sig-sigerr_count++; + + mlx5_ib_dbg(dev, Got SIGERR on key: 0x%x\n, + mr-sig-err_item.key); + + read_unlock(dev-mdev.priv.mr_table.lock); + goto repoll; } return 0; diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 10263fa..9dec71d 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1414,6 +1414,7 @@ static int init_one(struct pci_dev *pdev, dev-ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr; dev-ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; dev-ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; + dev-ib_dev.check_sig_status= mlx5_ib_check_sig_status; if (mdev-caps.flags MLX5_DEV_CAP_FLAG_XRC) { dev-ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd
[PATCH v3 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related
If the user requested signature enable, we initialize the relevant mlx5_ib_qp members. We mark the QP as sig_enable and increase the effective SQ size, but still limit the user's max_send_wr to the originally computed size. issue: 333508 Change-Id: I72c303f407fc8181139371d4c0a7e7e7550043e0 Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/mlx5_ib.h |3 +++ drivers/infiniband/hw/mlx5/qp.c | 16 include/linux/mlx5/qp.h |1 + 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 43e0497..62b9e93 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -189,6 +189,9 @@ struct mlx5_ib_qp { int create_type; u32 pa_lkey; + + /* Store signature errors */ + boolsignature_en; }; struct mlx5_ib_cq_buf { diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 7c6b4ba..f61e93c 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -263,6 +263,7 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct ib_qp_init_attr *attr, { int wqe_size; int wq_size; + int eff_wq_size; if (!attr->cap.max_send_wr) return 0; @@ -283,7 +284,14 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct ib_qp_init_attr *attr, attr->cap.max_inline_data = qp->max_inline_data; wq_size = roundup_pow_of_two(attr->cap.max_send_wr * wqe_size); - qp->sq.wqe_cnt = wq_size / MLX5_SEND_WQE_BB; + if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN) { + eff_wq_size = roundup_pow_of_two(attr->cap.max_send_wr * wqe_size * +MLX5_SIGNATURE_SQ_MULT); + qp->signature_en = true; + } else { + eff_wq_size = wq_size; + } + qp->sq.wqe_cnt = eff_wq_size / MLX5_SEND_WQE_BB; if (qp->sq.wqe_cnt > dev->mdev.caps.max_wqes) { mlx5_ib_dbg(dev, "wqe count(%d) exceeds limits(%d)\n", qp->sq.wqe_cnt, dev->mdev.caps.max_wqes); @@ -291,10 +299,10 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct ib_qp_init_attr *attr, } qp->sq.wqe_shift =
ilog2(MLX5_SEND_WQE_BB); qp->sq.max_gs = attr->cap.max_send_sge; - qp->sq.max_post = wq_size / wqe_size; - attr->cap.max_send_wr = qp->sq.max_post; + qp->sq.max_post = eff_wq_size / wqe_size; + attr->cap.max_send_wr = wq_size / wqe_size; - return wq_size; + return eff_wq_size; } static int set_user_buf_size(struct mlx5_ib_dev *dev, diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index d9e3eac..174805c 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -37,6 +37,7 @@ #include <linux/mlx5/driver.h> #define MLX5_INVALID_LKEY 0x100 +#define MLX5_SIGNATURE_SQ_MULT 3 enum mlx5_qp_optpar { MLX5_QP_OPTPAR_ALT_ADDR_PATH= 1 << 0, -- 1.7.1
[PATCH v3 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
This patch implements IB_WR_REG_SIG_MR posted by the user. Basically this compound WR involves 3 WQEs in order to prepare and properly register the signature layout:
1. Post a UMR WR to register the sig_mr in one of two possible ways:
 * In case the user registered a single MR for data, the UMR data segment consists of:
 - a single klm (data MR) passed by the user
 - a BSF with the signature attributes requested by the user.
 * In case the user registered 2 MRs, one for data and one for protection, the UMR consists of:
 - a strided block format which includes the data and protection MRs and their repetitive block format.
 - a BSF with the signature attributes requested by the user.
2. Post SET_PSV in order to set the memory domain initial signature parameters passed by the user. SET_PSV is not signaled and its CQE is solicited.
3. Post SET_PSV in order to set the wire domain initial signature parameters passed by the user. SET_PSV is not signaled and its CQE is solicited.
* After this compound WR we place a small fence for the next WR to come.
This patch also introduces some helper functions to set the BSF correctly and to determine the signature format selectors.
issue: 333508 Change-Id: I66843ed14cb41275071b57fbba92018fe19bf4f5 Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c | 417 +++ include/linux/mlx5/device.h |4 + include/linux/mlx5/qp.h | 61 ++ 3 files changed, 482 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 43f120a..688c68a 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1722,6 +1722,26 @@ static __be64 frwr_mkey_mask(void) return cpu_to_be64(result); } +static __be64 sig_mkey_mask(void) +{ + u64 result; + + result = MLX5_MKEY_MASK_LEN | + MLX5_MKEY_MASK_PAGE_SIZE| + MLX5_MKEY_MASK_START_ADDR | + MLX5_MKEY_MASK_EN_RINVAL| + MLX5_MKEY_MASK_KEY | + MLX5_MKEY_MASK_LR | + MLX5_MKEY_MASK_LW | + MLX5_MKEY_MASK_RR | + MLX5_MKEY_MASK_RW | + MLX5_MKEY_MASK_SMALL_FENCE | + MLX5_MKEY_MASK_FREE | + MLX5_MKEY_MASK_BSF_EN; + + return cpu_to_be64(result); +} + static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr, struct ib_send_wr *wr, int li) { @@ -1906,6 +1926,334 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, struct ib_send_wr *wr, return 0; } +static u16 prot_field_size(enum ib_signature_type type) +{ + switch (type) { + case IB_SIG_TYPE_T10_DIF: + return MLX5_DIF_SIZE; + default: + return 0; + } +} + +static u8 bs_selector(int block_size) +{ + switch (block_size) { + case 512: return 0x1; + case 520: return 0x2; + case 4096: return 0x3; + case 4160: return 0x4; + case 1073741824:return 0x5; + default:return 0; + } +} + +static int format_selector(struct ib_sig_attrs *attr, + struct ib_sig_domain *domain, + int *selector) +{ + +#define FORMAT_DIF_NONE0 +#define FORMAT_DIF_CRC_INC 4 +#define FORMAT_DIF_CSUM_INC12 +#define FORMAT_DIF_CRC_NO_INC 13 +#define FORMAT_DIF_CSUM_NO_INC 14 + + switch (domain-sig.dif.type) { + case IB_T10DIF_NONE: + /* No DIF */ + *selector = FORMAT_DIF_NONE; + break; + case IB_T10DIF_TYPE1: /* Fall through */ + case IB_T10DIF_TYPE2: + switch 
(domain-sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = FORMAT_DIF_CRC_INC; + break; + case IB_T10DIF_CSUM: + *selector = FORMAT_DIF_CSUM_INC; + break; + default: + return 1; + } + break; + case IB_T10DIF_TYPE3: + switch (domain-sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = domain-sig.dif.type3_inc_reftag ? + FORMAT_DIF_CRC_INC : + FORMAT_DIF_CRC_NO_INC; + break; + case IB_T10DIF_CSUM: + *selector = domain-sig.dif.type3_inc_reftag ? + FORMAT_DIF_CSUM_INC : + FORMAT_DIF_CSUM_NO_INC; + break; + default: + return 1
[PATCH v3 06/10] IB/mlx5: remove MTT access mode from umr flags helper function
The get_umr_flags helper function might be used for access modes other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM, so remove the mode from the helper and let the caller add its own access mode flag. This commit does not add/change functionality. issue: 333508 Change-Id: If4aca628d1ca88be93a2161e4a158363dcaa134b Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 15df91b..43f120a 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1777,7 +1777,7 @@ static u8 get_umr_flags(int acc) (acc & IB_ACCESS_REMOTE_WRITE ? MLX5_PERM_REMOTE_WRITE : 0) | (acc & IB_ACCESS_REMOTE_READ ? MLX5_PERM_REMOTE_READ : 0) | (acc & IB_ACCESS_LOCAL_WRITE ? MLX5_PERM_LOCAL_WRITE : 0) | - MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT; + MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN; } static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr, @@ -1789,7 +1789,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr, return; } - seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags); + seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) | +MLX5_ACCESS_MODE_MTT; *writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE); seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xff00); seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL); -- 1.7.1
[PATCH v3 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device
This will be useful when processing signature errors on a specific key. The mlx5 driver will lookup the matching mlx5 memory region structure and mark it as dirty (contains signature errors). issue: 333508 Change-Id: I04dbb746012b050d13161d134d2d05c8c333189a Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/net/ethernet/mellanox/mlx5/core/main.c |1 + drivers/net/ethernet/mellanox/mlx5/core/mr.c | 24 include/linux/mlx5/driver.h| 18 ++ 3 files changed, 43 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 40a9f5e..6e77c8e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -446,6 +446,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev) mlx5_init_cq_table(dev); mlx5_init_qp_table(dev); mlx5_init_srq_table(dev); + mlx5_init_mr_table(dev); return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c index bb746bb..4cc9276 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c @@ -36,11 +36,24 @@ #include linux/mlx5/cmd.h #include mlx5_core.h +void mlx5_init_mr_table(struct mlx5_core_dev *dev) +{ + struct mlx5_mr_table *table = dev-priv.mr_table; + + rwlock_init(table-lock); + INIT_RADIX_TREE(table-tree, GFP_ATOMIC); +} + +void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev) +{ +} + int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr, struct mlx5_create_mkey_mbox_in *in, int inlen, mlx5_cmd_cbk_t callback, void *context, struct mlx5_create_mkey_mbox_out *out) { + struct mlx5_mr_table *table = dev-priv.mr_table; struct mlx5_create_mkey_mbox_out lout; int err; u8 key; @@ -73,14 +86,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr, mlx5_core_dbg(dev, out 0x%x, key 0x%x, mkey 0x%x\n, be32_to_cpu(lout.mkey), key, mr-key); + /* 
connect to MR tree */ + write_lock_irq(table-lock); + err = radix_tree_insert(table-tree, mlx5_base_mkey(mr-key), mr); + write_unlock_irq(table-lock); + return err; } EXPORT_SYMBOL(mlx5_core_create_mkey); int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr) { + struct mlx5_mr_table *table = dev-priv.mr_table; struct mlx5_destroy_mkey_mbox_in in; struct mlx5_destroy_mkey_mbox_out out; + unsigned long flags; int err; memset(in, 0, sizeof(in)); @@ -95,6 +115,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr) if (out.hdr.status) return mlx5_cmd_status_to_err(out.hdr); + write_lock_irqsave(table-lock, flags); + radix_tree_delete(table-tree, mlx5_base_mkey(mr-key)); + write_unlock_irqrestore(table-lock, flags); + return err; } EXPORT_SYMBOL(mlx5_core_destroy_mkey); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 58f5b95..1d97762 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -488,6 +488,13 @@ struct mlx5_srq_table { struct radix_tree_root tree; }; +struct mlx5_mr_table { + /* protect radix tree +*/ + rwlock_tlock; + struct radix_tree_root tree; +}; + struct mlx5_priv { charname[MLX5_MAX_NAME_LEN]; struct mlx5_eq_tableeq_table; @@ -517,6 +524,10 @@ struct mlx5_priv { struct mlx5_cq_tablecq_table; /* end: cq staff */ + /* start: mr staff */ + struct mlx5_mr_tablemr_table; + /* end: mr staff */ + /* start: alloc staff */ struct mutexpgdir_mutex; struct list_headpgdir_list; @@ -664,6 +675,11 @@ static inline void mlx5_vfree(const void *addr) kfree(addr); } +static inline u32 mlx5_base_mkey(const u32 key) +{ + return key 0xff00u; +} + int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev); void mlx5_dev_cleanup(struct mlx5_core_dev *dev); int mlx5_cmd_init(struct mlx5_core_dev *dev); @@ -698,6 +714,8 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq, struct mlx5_query_srq_mbox_out *out); int mlx5_core_arm_srq(struct 
mlx5_core_dev *dev, struct mlx5_core_srq *srq, u16 lwm, int is_srq); +void mlx5_init_mr_table(struct mlx5_core_dev *dev); +void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev); int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr, struct
[PATCH v3 00/10] Introduce Signature feature
This patchset introduces verbs-level support for the signature handover feature. Signature is intended to implement end-to-end data integrity on a transactional basis in a completely offloaded manner. There are several end-to-end data integrity methods used today in various applications and/or upper layer protocols such as T10-DIF defined by the SCSI specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs support only for T10-DIF. The proposed framework allows adding more signature methods in the future. In T10-DIF, when a series of 512-byte data blocks are transferred, each block is followed by an 8-byte guard (note that protection intervals other than 512 bytes may also be used). The guard consists of a CRC that protects the integrity of the data in the block, a tag that protects against mis-directed IOs and a free tag for application use. Data can be protected when transferred over the wire, but can also be protected in the memory of the sender/receiver. This allows true end-to-end protection against bit flips either over the wire, through gateways, in memory, over PCI, etc. While T10-DIF clearly defines that over the wire the protection guards are interleaved into the data stream (each 512-byte block followed by an 8-byte guard), when in memory the protection guards may reside in a buffer separated from the data. Depending on the application, it is usually easier to handle the data when it is contiguous. In this case the data buffer will be of size 512xN and the protection buffer will be of size 8xN (where N is the number of blocks in the transaction). There are 3 kinds of signature handover operation:
1. Take unprotected data (from wire or memory) and ADD protection guards.
2. Take protected data (from wire or memory), validate the data integrity against the protection guards and STRIP the protection guards.
3. Take protected data (from wire or memory), validate the data integrity against the protection guards and PASS the data with the guards as-is.
This translates to defining to the HCA how/if data protection exists in the memory domain, and how/if data protection exists in the wire domain. The way that data integrity is performed is by using a new kind of memory region: a signature-enabled MR, and a new kind of work request: REG_SIG_MR. The REG_SIG_MR WR operates on the signature-enabled MR, and defines all the needed information for the signature handover (data buffer, protection buffer if needed, and signature attributes). The result is an MR that can be used for data transfer as usual, that will also add/validate/strip/pass protection guards. When the data transfer is successfully completed, it does not mean that there are no integrity errors. The user must afterwards check the signature status of the handover operation using a new light-weight verb. This feature shall be used in storage upper layer protocols iSER/SRP implementing end-to-end data integrity T10-DIF. Following this patchset, we will soon submit krping patches which will demonstrate the usage of these signature verbs.
Patchset summary:
- Introduce verbs for create/destroy memory regions supporting signature.
- Introduce IB core signature verbs API.
- Implement mr create/destroy verbs in mlx5 driver.
- Preparation patches for signature support in mlx5 driver.
- Implement signature handover work request in mlx5 driver.
- Implement signature error collection and handling in mlx5 driver.
Changes from v2 (mostly CR comments):
- IB/core: Added comment on IB_T10DIF_CRC/CSUM declarations.
- IB/core: Renamed block_size as pi_interval in ib_sig_attrs.
- IB/core: Took t10_dif domain out of sig union (ib_sig_domain).
- IB/mlx5: Fixed memory leak in create_mr.
- IB/mlx5: Removed redundant assignment in WQE initialization.
- IB/mlx5: Fixed possible NULL dereference in check_sig_status and set_sig_wr.
- IB/mlx5: Added helper function to convert mkey to base key.
- IB/mlx5: Reduced fencing in compound REG_SIG_MR WR.
- Resolved checkpatch warnings.
Changes from v1:
- IB/core: Reduced sizeof(ib_send_wr) by using wr->sg_list for the data and a dedicated ib_sge for the protection guards buffer. Currently the sig_handover extension does not increase sizeof(ib_send_wr).
- IB/core: Changed enum to int for container variables.
- IB/mlx5: Validate wr->num_sge == 1 for the REG_SIG_MR work request.

Changes from v0:
- Commit messages: Added a more detailed explanation of the signature work request.
- IB/core: Removed indirect memory registration enablement from create_mr. Kept only signature enablement.
- IB/mlx5: Changed signature error processing to use an MR radix lookup.

Sagi Grimberg (10):
  IB/core: Introduce protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs
[PATCH v3 01/10] IB/core: Introduce protected memory regions
This commit introduces verbs for creating/destroying memory regions which will allow new types of memory key operations such as protected memory registration.

Indirect memory registration is registering several (one or more) pre-registered memory regions in a specific layout. The indirect region may potentially describe several regions and some repetition format between them.

Protected memory registration is registering a memory region with various data integrity attributes which describe protection schemes that will be handled by the HCA in an offloaded manner. These memory regions will be applicable for a new REG_SIG_MR work request introduced later in this patchset.

In the future these routines may replace or implement the memory region creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

issue: 333508
Change-Id: Id3d221a002af9a95716a44d0163ca0de1c6dbbb8
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c | 39 +++
 include/rdma/ib_verbs.h         | 38 ++
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index a321df2..ef47667 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1055,6 +1055,45 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);
 
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
 	struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e393171..af1bd1a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -455,6 +455,22 @@ int ib_rate_to_mult(enum ib_rate rate) __attribute_const__;
  */
 int ib_rate_to_mbps(enum ib_rate rate) __attribute_const__;
 
+enum ib_mr_create_flags {
+	IB_MR_SIGNATURE_EN = 1,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ *	ib_create_mr.
+ * @max_reg_descriptors: max number of registration descriptors that
+ *	may be used with registration work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+	int		max_reg_descriptors;
+	u32		flags;
+};
+
 /**
  * mult_to_ib_rate - Convert a multiple of 2.5 Gbit/sec to an IB rate
  * enum.
@@ -1372,6 +1388,9 @@ struct ib_device {
 	int			(*query_mr)(struct ib_mr *mr,
 					    struct ib_mr_attr *mr_attr);
 	int			(*dereg_mr)(struct ib_mr *mr);
+	int			(*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr *		(*create_mr)(struct ib_pd *pd,
+					     struct ib_mr_init_attr *mr_init_attr);
 	struct ib_mr *		(*alloc_fast_reg_mr)(struct ib_pd *pd,
 						     int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
@@ -2212,6 +2231,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);
 
+/**
+ * ib_create_mr - Allocates a memory region that may be used for
+ *	signature handover operations.
+ * @pd: The protection domain associated with the region.
+ * @mr_init_attr: memory region init attributes.
+ */
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr);
+
+/**
+ * ib_destroy_mr - Destroys a memory region that was created using
+ *	ib_create_mr and removes it from HW translation tables.
+ * @mr: The memory region to destroy.
+ *
+ * This function can fail, if the memory region has memory windows bound to it.
+ */
+int ib_destroy_mr(struct ib_mr *mr);
+
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
  * IB_WR_FAST_REG_MR send work request.
-- 
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message
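The reference-counting discipline in ib_create_mr()/ib_destroy_mr() above can be illustrated with a toy user-space model (the `toy_*` names and simplified structs are invented for illustration; this is not kernel code): the PD's usecnt pins the PD while MRs exist, and an MR whose own usecnt is non-zero (memory windows still bound) refuses destruction with -EBUSY.

```c
#include <assert.h>

/* Toy model of the create/destroy refcounting in the patch above. */
struct toy_pd { int usecnt; };
struct toy_mr { struct toy_pd *pd; int usecnt; };

static void toy_create_mr(struct toy_pd *pd, struct toy_mr *mr)
{
	mr->pd = pd;
	mr->usecnt = 0;
	pd->usecnt++;		/* the MR holds a reference on its PD */
}

static int toy_destroy_mr(struct toy_mr *mr)
{
	if (mr->usecnt)		/* memory windows still bound to the MR */
		return -16;	/* -EBUSY */
	mr->pd->usecnt--;
	return 0;
}
```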
Re: [PATCH v3 00/10] Introduce Signature feature
On 11/14/2013 9:30 AM, Or Gerlitz wrote:
On 14/11/2013 02:19, Hefty, Sean wrote:

The patch series has been around for a couple of weeks already and went through review by Sean and Bart, with all their feedback applied. Also, Sagi and co. enhanced krping to fully cover (and test...) the proposed API and driver implementation.

Somewhat separate from this specific patch, this is my concern. There are continual requests to modify the kernel verbs interfaces. These requests boil down to exposing proprietary capabilities of the latest version of some vendor's hardware. In turn, these hardware-specific knobs bleed into the kernel clients. At the very least, it seems that there should be some sort of discussion on whether this is a desirable property of the kernel verbs interface, and whether this is the architecture that the kernel should continue to pursue. Or, is there an alternative way of providing the same ability of coding ULPs to specific HW features, versus plugging every new feature into 'post send'?

Sean,

Being concrete, and re-iterating and expanding what I wrote you earlier on the V1 thread @ http://marc.info/?l=linux-rdma&m=138314853203389&w=2 when you said:

Sean> Maybe we should rethink the approach of exposing low-level hardware constructs to every
Sean> distinct feature of every vendor's latest hardware directly to the kernel ULPs.

To begin with, T10-DIF **is** an industry standard, which is used in production storage systems. The feature here is T10-DIF acceleration for upstream kernel storage drivers such as the iSER/SRP/FCoE initiators/targets that use RDMA and are included in commercial distributions used by customers. Note that this (or a similar) feature is supported by some FC cards too, so we want RDMA to be competitive. This work is part of larger efforts made nowadays in other parts of the kernel, such as the block layer, the upstream kernel target and more, to support T10; this is just the RDMA part.

Sagi and team made a great effort to expose an API which isn't tied to a specific HW/firmware API. In that respect, the verbs API is coupled with industry standards and by no means with specific HW features. As a quick example, the specific driver/card (mlx5 / Connect-IB) for which the new verbs are implemented uses three objects for its T10 support, named BSF, KLM and PSV. You can be sure (and please check us) that there is no sign of them in the verbs API; they only live within the mlx5 driver. If you see a vendor-specific feature/construct that appears in the proposed verbs API changes, let us know.

[...] versus plugging every new feature into 'post send'?

It's a new feature indeed, but it's a feature which comes into play when submitting RDMA work requests to the HCA, and which for performance reasons must be subject to pipelining in the form of batched posting. Hence it fits very well as a sub-operation of post_send.

Sean> There are continual requests to modify the kernel verbs interfaces. These requests boil down to exposing proprietary capabilities
Sean> of the latest version of some vendor's hardware. In turn, these hardware specific knobs bleed into the kernel clients.

non-T10 examples (please)?!

Or.

Hey Sean,

Just to add to Or's input, I really don't agree that this is some specific HW capability exposed to ULPs. This feature allows offloading data-integrity handling over RDMA, which is a wider concept than just T10-DIF (although we currently expose T10-DIF alone). The signature verbs API does not introduce something specific to Mellanox; we think the API is generic enough to allow each vendor to support signature with some degree of freedom. A vendor just needs to implement the 3 steps: create a signature-enabled MR, bind the MR to signature attributes (work request), and check the signature status at the end of the transaction.

Regarding plugging into post_send: the signature operation is a fast-path operation, and I agree with Or regarding the value of batching work requests.

Moreover, I think this is a separate discussion. If we agree on another API for posting on the send queue, it will require work also for migrating the fastreg and bind_mw extensions. So how about going with the current framework, and starting a discussion on your concern about taking non-SEND WR extensions out of post_send?

Sagi.
[PATCH v4 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related
If the user requested signature enablement, we initialize the relevant mlx5_ib_qp members: we mark the QP as sig_enable and check whether wqe_size can fit the compound REG_SIG_MR work request (UMR + 2 x SET_PSV WQEs); if the computed wqe_size is smaller, we raise wqe_size to MLX5_SIG_WQE_SIZE.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  3 +++
 drivers/infiniband/hw/mlx5/qp.c      | 10 --
 include/linux/mlx5/qp.h              |  1 +
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 43e0497..62b9e93 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -189,6 +189,9 @@ struct mlx5_ib_qp {
 	int			create_type;
 	u32			pa_lkey;
+
+	/* Store signature errors */
+	bool			signature_en;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 7c6b4ba..07aa3ca 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -254,8 +254,11 @@ static int calc_send_wqe(struct ib_qp_init_attr *attr)
 	}
 
 	size += attr->cap.max_send_sge * sizeof(struct mlx5_wqe_data_seg);
-
-	return ALIGN(max_t(int, inl_size, size), MLX5_SEND_WQE_BB);
+	if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN &&
+	    ALIGN(max_t(int, inl_size, size), MLX5_SEND_WQE_BB) < MLX5_SIG_WQE_SIZE)
+		return MLX5_SIG_WQE_SIZE;
+	else
+		return ALIGN(max_t(int, inl_size, size), MLX5_SEND_WQE_BB);
 }
 
 static int calc_sq_size(struct mlx5_ib_dev *dev, struct ib_qp_init_attr *attr,
@@ -282,6 +285,9 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct ib_qp_init_attr *attr,
 		sizeof(struct mlx5_wqe_inline_seg);
 	attr->cap.max_inline_data = qp->max_inline_data;
 
+	if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN)
+		qp->signature_en = true;
+
 	wq_size = roundup_pow_of_two(attr->cap.max_send_wr * wqe_size);
 	qp->sq.wqe_cnt = wq_size / MLX5_SEND_WQE_BB;
 	if (qp->sq.wqe_cnt > dev->mdev.caps.max_wqes) {
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index d9e3eac..711094c 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -37,6 +37,7 @@
 #include <linux/mlx5/driver.h>
 
 #define MLX5_INVALID_LKEY	0x100
+#define MLX5_SIG_WQE_SIZE	(MLX5_SEND_WQE_BB * 5)
 
 enum mlx5_qp_optpar {
 	MLX5_QP_OPTPAR_ALT_ADDR_PATH	= 1 << 0,
-- 
1.7.8.2
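The sizing decision in calc_send_wqe() can be modeled in a few lines. The constants below assume the mlx5 send WQE basic block (MLX5_SEND_WQE_BB) is 64 bytes, which is not stated in the patch itself; the helper name `calc_send_wqe_size` is invented for this sketch:

```c
#include <assert.h>

/* Model of the calc_send_wqe() decision: a signature-enabled QP must be
 * able to hold the compound REG_SIG_MR request (UMR + 2 x SET_PSV), i.e.
 * 5 basic blocks, so smaller computed sizes are bumped up. */
#define MLX5_SEND_WQE_BB	64			/* assumed */
#define MLX5_SIG_WQE_SIZE	(MLX5_SEND_WQE_BB * 5)	/* 320 bytes */
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((a) - 1))

static int calc_send_wqe_size(int computed, int sig_en)
{
	int aligned = ALIGN_UP(computed, MLX5_SEND_WQE_BB);

	if (sig_en && aligned < MLX5_SIG_WQE_SIZE)
		return MLX5_SIG_WQE_SIZE;	/* bump to fit REG_SIG_MR */
	return aligned;
}
```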
[PATCH v4 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
Support the create_mr and destroy_mr verbs. For now, the create/destroy routines only support user requests for signature-enabled memory regions. The created memory region will be an indirect memory key able to register a pre-registered data buffer and a protection guards buffer (pre-registered as well). The corresponding mlx5_ib_mr will be attached to mlx5-specific signature entities (BSF, PSVs). For non-signature-enabled regions, the resulting ib_mr is a free region applicable for fast registration work requests.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c            |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h         |   4 +
 drivers/infiniband/hw/mlx5/mr.c              | 111 ++
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |  61 ++
 include/linux/mlx5/device.h                  |  25 ++
 include/linux/mlx5/driver.h                  |  19 +
 6 files changed, 222 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3065341..10263fa 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1406,9 +1406,11 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.get_dma_mr		= mlx5_ib_get_dma_mr;
 	dev->ib_dev.reg_user_mr		= mlx5_ib_reg_user_mr;
 	dev->ib_dev.dereg_mr		= mlx5_ib_dereg_mr;
+	dev->ib_dev.destroy_mr		= mlx5_ib_destroy_mr;
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
+	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 4c134d9..43e0497 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -265,6 +265,7 @@ struct mlx5_ib_mr {
 	struct mlx5_ib_dev     *dev;
 	struct mlx5_create_mkey_mbox_out out;
 	unsigned long		start;
+	struct mlx5_core_sig_ctx    *sig;
 };
 
 struct mlx5_ib_fast_reg_page_list {
@@ -495,6 +496,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
 				  struct ib_udata *udata);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
+int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+				struct ib_mr_init_attr *mr_init_attr);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 039c3e4..e65cd0c 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -993,6 +993,117 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
 	return 0;
 }
 
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+				struct ib_mr_init_attr *mr_init_attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_create_mkey_mbox_in *in;
+	struct mlx5_ib_mr *mr;
+	int access_mode, err;
+	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	in->seg.status = 1 << 6; /* free */
+	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
+	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
+	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
+	access_mode = MLX5_ACCESS_MODE_MTT;
+
+	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+		u32 psv_index[2];
+
+		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
+					       MLX5_MKEY_BSF_EN);
+		in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
+		mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL);
+		if (!mr->sig) {
+			err = -ENOMEM;
+			goto err_free_in;
+		}
+
+		/* create mem & wire PSVs */
+		err = mlx5_core_create_psv(&dev->mdev, to_mpd(pd)->pdn,
+					   2, psv_index);
+		if (err)
+			goto err_free_sig;
+
+		access_mode
[PATCH v4 09/10] IB/mlx5: Collect signature error completion
This commit takes care of the signature error CQE generated by the HW (if any). The underlying mlx5 driver will handle signature error completions, look up the relevant memory region (under a read_lock), and mark it as dirty (i.e. it contains a signature error).

Once the user gets the completion for the transaction, he must check the signature memory region for signature errors using the new lightweight verb ib_check_mr_status; if such an error exists, he will get the signature error information: error type, error offset, and expected/actual values. If the user does not check for signature errors, i.e. does not call ib_check_mr_status with the IB_MR_CHECK_SIG_STATUS check, he will not be allowed to use the memory region for another signature operation (the REG_SIG_MR work request will fail).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/cq.c      | 64 ++
 drivers/infiniband/hw/mlx5/main.c    |  1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  7 
 drivers/infiniband/hw/mlx5/mr.c      | 47 +
 drivers/infiniband/hw/mlx5/qp.c      |  8 +++-
 include/linux/mlx5/cq.h              |  1 +
 include/linux/mlx5/device.h          | 18 +
 include/linux/mlx5/driver.h          |  4 ++
 include/linux/mlx5/qp.h              |  5 +++
 9 files changed, 153 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index b726274..0990a54 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -351,6 +351,38 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
 	qp->sq.last_poll = tail;
 }
 
+static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe,
+			     struct ib_sig_err *item)
+{
+	u16 syndrome = be16_to_cpu(cqe->syndrome);
+
+#define GUARD_ERR   (1 << 13)
+#define APPTAG_ERR  (1 << 12)
+#define REFTAG_ERR  (1 << 11)
+
+	if (syndrome & GUARD_ERR) {
+		item->err_type = IB_SIG_BAD_GUARD;
+		item->expected = be32_to_cpu(cqe->expected_trans_sig) >> 16;
+		item->actual = be32_to_cpu(cqe->actual_trans_sig) >> 16;
+	} else
+	if (syndrome & REFTAG_ERR) {
+		item->err_type = IB_SIG_BAD_REFTAG;
+		item->expected = be32_to_cpu(cqe->expected_reftag);
+		item->actual = be32_to_cpu(cqe->actual_reftag);
+	} else
+	if (syndrome & APPTAG_ERR) {
+		item->err_type = IB_SIG_BAD_APPTAG;
+		item->expected = be32_to_cpu(cqe->expected_trans_sig) & 0xffff;
+		item->actual = be32_to_cpu(cqe->actual_trans_sig) & 0xffff;
+	} else {
+		pr_err("Got signature completion error with bad syndrome %04x\n",
+		       syndrome);
+	}
+
+	item->sig_err_offset = be64_to_cpu(cqe->err_offset);
+	item->key = be32_to_cpu(cqe->mkey);
+}
+
 static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 			 struct mlx5_ib_qp **cur_qp,
 			 struct ib_wc *wc)
@@ -360,12 +392,16 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 	struct mlx5_cqe64 *cqe64;
 	struct mlx5_core_qp *mqp;
 	struct mlx5_ib_wq *wq;
+	struct mlx5_sig_err_cqe *sig_err_cqe;
+	struct mlx5_core_mr *mmr;
+	struct mlx5_ib_mr *mr;
 	uint8_t opcode;
 	uint32_t qpn;
 	u16 wqe_ctr;
 	void *cqe;
 	int idx;
 
+repoll:
 	cqe = next_cqe_sw(cq);
 	if (!cqe)
 		return -EAGAIN;
@@ -449,6 +485,34 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 			}
 		}
 		break;
+	case MLX5_CQE_SIG_ERR:
+		sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64;
+
+		read_lock(&dev->mdev.priv.mr_table.lock);
+		mmr = __mlx5_mr_lookup(&dev->mdev,
+				       mlx5_base_mkey(be32_to_cpu(sig_err_cqe->mkey)));
+		if (unlikely(!mmr)) {
+			read_unlock(&dev->mdev.priv.mr_table.lock);
+			mlx5_ib_warn(dev, "CQE@CQ %06x for unknown MR %6x\n",
+				     cq->mcq.cqn, be32_to_cpu(sig_err_cqe->mkey));
+			return -EINVAL;
+		}
+
+		mr = to_mibmr(mmr);
+		get_sig_err_item(sig_err_cqe, &mr->sig->err_item);
+		mr->sig->sig_err_exists = true;
+		mr->sig->sigerr_count++;
+
+		mlx5_ib_warn(dev, "CQN: 0x%x Got SIGERR on key: 0x%x err_type %x err_offset %llx expected %x actual %x\n",
+			     cq->mcq.cqn, mr->sig->err_item.key,
+			     mr->sig->err_item.err_type,
+			     mr->sig->err_item.sig_err_offset,
+			     mr->sig->err_item.expected,
+			     mr->sig->err_item.actual);
+
+		read_unlock(&dev
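The syndrome decoding in get_sig_err_item() above reduces to a priority check over three bits in the CQE syndrome field (bit positions taken from the patch). A standalone sketch, with invented enum names standing in for the IB_SIG_BAD_* constants:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the patch's get_sig_err_item(). */
#define GUARD_ERR	(1 << 13)
#define APPTAG_ERR	(1 << 12)
#define REFTAG_ERR	(1 << 11)

enum sig_err { SIG_OK, SIG_BAD_GUARD, SIG_BAD_REFTAG, SIG_BAD_APPTAG };

/* Same check order as the driver: guard first, then reftag, then apptag. */
static enum sig_err decode_sig_syndrome(uint16_t syndrome)
{
	if (syndrome & GUARD_ERR)	/* CRC over the data block mismatched */
		return SIG_BAD_GUARD;
	if (syndrome & REFTAG_ERR)	/* misdirected I/O: wrong reference tag */
		return SIG_BAD_REFTAG;
	if (syndrome & APPTAG_ERR)	/* application tag mismatched */
		return SIG_BAD_APPTAG;
	return SIG_OK;
}
```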
[PATCH v4 01/10] IB/core: Introduce protected memory regions
This commit introduces verbs for creating/destroying memory regions which will allow new types of memory key operations such as protected memory registration.

Indirect memory registration is registering several (one or more) pre-registered memory regions in a specific layout. The indirect region may potentially describe several regions and some repetition format between them.

Protected memory registration is registering a memory region with various data integrity attributes which describe protection schemes that will be handled by the HCA in an offloaded manner. A protected region describes pre-registered regions for the data, the protection block guards, and the repetitive stride between them. These memory regions will be applicable for a new REG_SIG_MR work request introduced later in this patchset.

In the future these routines may replace or implement the memory region creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c | 39 +++
 include/rdma/ib_verbs.h         | 38 ++
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d4f6ddf..f4c3bfb 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1072,6 +1072,45 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);
 
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
 	struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 979874c..81d1406 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -457,6 +457,22 @@ int ib_rate_to_mult(enum ib_rate rate) __attribute_const__;
  */
 int ib_rate_to_mbps(enum ib_rate rate) __attribute_const__;
 
+enum ib_mr_create_flags {
+	IB_MR_SIGNATURE_EN = 1,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ *	ib_create_mr.
+ * @max_reg_descriptors: max number of registration descriptors that
+ *	may be used with registration work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+	int		max_reg_descriptors;
+	u32		flags;
+};
+
 /**
  * mult_to_ib_rate - Convert a multiple of 2.5 Gbit/sec to an IB rate
  * enum.
@@ -1374,6 +1390,9 @@ struct ib_device {
 	int			(*query_mr)(struct ib_mr *mr,
 					    struct ib_mr_attr *mr_attr);
 	int			(*dereg_mr)(struct ib_mr *mr);
+	int			(*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr *		(*create_mr)(struct ib_pd *pd,
+					     struct ib_mr_init_attr *mr_init_attr);
 	struct ib_mr *		(*alloc_fast_reg_mr)(struct ib_pd *pd,
 						     int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
@@ -2215,6 +2234,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);
 
+/**
+ * ib_create_mr - Allocates a memory region that may be used for
+ *	signature handover operations.
+ * @pd: The protection domain associated with the region.
+ * @mr_init_attr: memory region init attributes.
+ */
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr);
+
+/**
+ * ib_destroy_mr - Destroys a memory region that was created using
+ *	ib_create_mr and removes it from HW translation tables.
+ * @mr: The memory region to destroy.
+ *
+ * This function can fail, if the memory region has memory windows bound to it.
+ */
+int ib_destroy_mr(struct ib_mr *mr);
+
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
  * IB_WR_FAST_REG_MR send work request.
-- 
1.7.8.2
[PATCH v4 02/10] IB/core: Introduce Signature Verbs API
This commit introduces the verbs interface for signature-related operations. A signature handover operation configures the layouts of the data and protection attributes in both the memory and wire domains.

Signature operations are:
- INSERT: Generate and insert protection information when handing over data from the input space to the output space.
- VALIDATE and STRIP: Validate protection information and remove it when handing over data from the input space to the output space.
- VALIDATE and PASS: Validate protection information and pass it on when handing over data from the input space to the output space.

Once the signature handover operation is done, the HCA will offload data integrity generation/validation while performing the actual data transfer.

Additions:

1. HCA signature capabilities in device attributes.
   A verbs provider supporting signature handover operations shall fill the relevant fields in the device attributes structure returned by ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN.
   Creating a QP that will carry signature handover operations may require some special preparations from the verbs provider, so we add the QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the created QP may carry out signature handover operations. This exposes signature support to the verbs layer (no provider support yet at this point).

3. New send work request IB_WR_REG_SIG_MR.
   The signature handover work request. This WR defines the signature handover properties of the memory/wire domains as well as the domain layouts. Its purpose is to bind all the information needed for the signature operation:
   - Data to be transferred: wr->sg_list (ib_sge).
     The raw data, pre-registered to a single MR (normally, before signature, this MR would have been used directly for the data transfer).
   - Data protection guards: sig_handover.prot (ib_sge).
     The data protection buffer, pre-registered to a single MR, which contains the data integrity guards of the raw data blocks. Note that it may not always exist; it is present only in cases where the user is interested in storing protection guards in memory.
   - Signature operation attributes: sig_handover.sig_attrs.
     Tells the HCA how to validate/generate the protection information.

   Once the work request is executed, the memory region that describes the signature transaction will be the sig_mr. The application can now go ahead and send sig_mr.rkey or use sig_mr.lkey for the data transfer.

4. New verb ib_check_mr_status.
   The check_mr_status verb checks the status of the memory region post transaction. The first check that may be used is IB_MR_CHECK_SIG_STATUS, which indicates whether any signature errors are pending for a specific signature-enabled ib_mr. This verb is a lightweight check and is allowed to be called from interrupt context. The application must call this verb after it is known that the actual data transfer has finished.

issue: 333508
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   8 ++
 include/rdma/ib_verbs.h         | 149 ++-
 2 files changed, 156 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f4c3bfb..f617cb9 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1340,3 +1340,11 @@ int ib_destroy_flow(struct ib_flow *flow_id)
 	return err;
 }
 EXPORT_SYMBOL(ib_destroy_flow);
+
+int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
+		       struct ib_mr_status *mr_status)
+{
+	return mr->device->check_mr_status ?
+		mr->device->check_mr_status(mr, check_mask, mr_status) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_mr_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 81d1406..2c75c29 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -119,7 +119,19 @@ enum ib_device_cap_flags {
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
 	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
-	IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29)
+	IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29),
+	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<30)
 };
+
+enum ib_signature_prot_cap {
+	IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+	IB_GUARD_T10DIF_CRC	= 1,
+	IB_GUARD_T10DIF_CSUM	= 1 << 1,
+};
 
 enum ib_atomic_cap {
@@ -169,6 +181,8 @@ struct ib_device_attr {
 	unsigned int		max_fast_reg_page_list_len;
 	u16			max_pkeys;
 	u8			local_ca_ack_delay;
+	int			sig_prot_cap;
+	int			sig_guard_cap;
 };
 
 enum ib_mtu {
@@ -473,6 +487,114 @@ struct
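The capability bits the patch adds (sig_prot_cap/sig_guard_cap) are plain bitmasks, so a ULP's feature test is a mask-and-compare. A hedged sketch with the enum values copied from the patch (the helper `sig_caps_ok` is invented for illustration):

```c
#include <assert.h>

/* Enum values copied from the ib_verbs.h hunk above. */
enum { IB_PROT_T10DIF_TYPE_1 = 1 << 0,
       IB_PROT_T10DIF_TYPE_2 = 1 << 1,
       IB_PROT_T10DIF_TYPE_3 = 1 << 2 };
enum { IB_GUARD_T10DIF_CRC  = 1 << 0,
       IB_GUARD_T10DIF_CSUM = 1 << 1 };

/* Would the device's advertised caps satisfy the requested combination? */
static int sig_caps_ok(int sig_prot_cap, int sig_guard_cap,
		       int want_prot, int want_guard)
{
	return (sig_prot_cap & want_prot) == want_prot &&
	       (sig_guard_cap & want_guard) == want_guard;
}
```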
[PATCH v4 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device
This radix tree will be useful when processing signature errors on a specific key. The mlx5 driver will look up the matching mlx5 memory region structure and mark it as dirty (i.e. it contains signature errors). The radix tree is protected by a rwlock; since signature error processing is guaranteed not to compete with other contexts for a specific key, a read_lock is sufficient.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   | 24 
 include/linux/mlx5/driver.h                    | 18 ++
 3 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 40a9f5e..6e77c8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -446,6 +446,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev)
 	mlx5_init_cq_table(dev);
 	mlx5_init_qp_table(dev);
 	mlx5_init_srq_table(dev);
+	mlx5_init_mr_table(dev);
 
 	return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index bb746bb..4cc9276 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -36,11 +36,24 @@
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"
 
+void mlx5_init_mr_table(struct mlx5_core_dev *dev)
+{
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
+
+	rwlock_init(&table->lock);
+	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
+}
+
+void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev)
+{
+}
+
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			  struct mlx5_create_mkey_mbox_in *in, int inlen,
 			  mlx5_cmd_cbk_t callback, void *context,
 			  struct mlx5_create_mkey_mbox_out *out)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_create_mkey_mbox_out lout;
 	int err;
 	u8 key;
@@ -73,14 +86,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 	mlx5_core_dbg(dev, "out 0x%x, key 0x%x, mkey 0x%x\n",
 		      be32_to_cpu(lout.mkey), key, mr->key);
 
+	/* connect to MR tree */
+	write_lock_irq(&table->lock);
+	err = radix_tree_insert(&table->tree, mlx5_base_mkey(mr->key), mr);
+	write_unlock_irq(&table->lock);
+
 	return err;
 }
 EXPORT_SYMBOL(mlx5_core_create_mkey);
 
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_destroy_mkey_mbox_in in;
 	struct mlx5_destroy_mkey_mbox_out out;
+	unsigned long flags;
 	int err;
 
 	memset(&in, 0, sizeof(in));
@@ -95,6 +115,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 	if (out.hdr.status)
 		return mlx5_cmd_status_to_err(&out.hdr);
 
+	write_lock_irqsave(&table->lock, flags);
+	radix_tree_delete(&table->tree, mlx5_base_mkey(mr->key));
+	write_unlock_irqrestore(&table->lock, flags);
+
 	return err;
 }
 EXPORT_SYMBOL(mlx5_core_destroy_mkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 58f5b95..1d97762 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -488,6 +488,13 @@ struct mlx5_srq_table {
 	struct radix_tree_root	tree;
 };
 
+struct mlx5_mr_table {
+	/* protect radix tree
+	 */
+	rwlock_t		lock;
+	struct radix_tree_root	tree;
+};
+
 struct mlx5_priv {
 	char			name[MLX5_MAX_NAME_LEN];
 	struct mlx5_eq_table	eq_table;
@@ -517,6 +524,10 @@ struct mlx5_priv {
 	struct mlx5_cq_table	cq_table;
 	/* end: cq staff */
 
+	/* start: mr staff */
+	struct mlx5_mr_table	mr_table;
+	/* end: mr staff */
+
 	/* start: alloc staff */
 	struct mutex		pgdir_mutex;
 	struct list_head	pgdir_list;
@@ -664,6 +675,11 @@ static inline void mlx5_vfree(const void *addr)
 		kfree(addr);
 }
 
+static inline u32 mlx5_base_mkey(const u32 key)
+{
+	return key & 0xffffff00u;
+}
+
 int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev);
 void mlx5_dev_cleanup(struct mlx5_core_dev *dev);
 int mlx5_cmd_init(struct mlx5_core_dev *dev);
@@ -698,6 +714,8 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 			struct mlx5_query_srq_mbox_out *out);
 int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 		      u16 lwm, int is_srq);
+void mlx5_init_mr_table(struct mlx5_core_dev *dev);
+void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev);
 int
[PATCH v4 00/10] Introduce Signature feature
detailed explanation for signature work request. - IB/core: Remove indirect memory registration enablement from create_mr. Keep only signature enablement. - IB/mlx5: Changed signature error processing via MR radix lookup. Sagi Grimberg (10): IB/core: Introduce protected memory regions IB/core: Introduce Signature Verbs API IB/mlx5, mlx5_core: Support for create_mr and destroy_mr IB/mlx5: Initialize mlx5_ib_qp signature related IB/mlx5: Break wqe handling to begin finish routines IB/mlx5: remove MTT access mode from umr flags helper function IB/mlx5: Keep mlx5 MRs in a radix tree under device IB/mlx5: Support IB_WR_REG_SIG_MR IB/mlx5: Collect signature error completion IB/mlx5: Publish support in signature feature drivers/infiniband/core/verbs.c| 47 ++ drivers/infiniband/hw/mlx5/cq.c| 64 +++ drivers/infiniband/hw/mlx5/main.c | 12 + drivers/infiniband/hw/mlx5/mlx5_ib.h | 14 + drivers/infiniband/hw/mlx5/mr.c| 158 +++ drivers/infiniband/hw/mlx5/qp.c| 559 ++-- drivers/net/ethernet/mellanox/mlx5/core/main.c |1 + drivers/net/ethernet/mellanox/mlx5/core/mr.c | 85 include/linux/mlx5/cq.h|1 + include/linux/mlx5/device.h| 47 ++ include/linux/mlx5/driver.h| 41 ++ include/linux/mlx5/qp.h| 67 +++ include/rdma/ib_verbs.h| 187 - 13 files changed, 1242 insertions(+), 41 deletions(-) -- 1.7.8.2 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 10/10] IB/mlx5: Publish support in signature feature
Currently we support only the T10-DIF types of signature handover operations (types 1|2|3). Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/main.c | 9 + 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 89ae2e5..63d9044 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (flags & MLX5_DEV_CAP_FLAG_XRC)
 		props->device_cap_flags |= IB_DEVICE_XRC;
 	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+			IB_PROT_T10DIF_TYPE_2 |
+			IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+			IB_GUARD_T10DIF_CSUM;
+	}
 	props->vendor_id	= be32_to_cpup((__be32 *)(out_mad->data + 36)) & 0xffffff;
-- 
1.7.8.2
[PATCH v4 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
This patch implements IB_WR_REG_SIG_MR posted by the user. Basically this WR involves 3 WQEs in order to prepare and properly register the signature layout:

1. Post a UMR WR to register the sig_mr in one of two possible ways:
   * If the user registered a single MR for data, the UMR data segment consists of:
     - a single klm (the data MR) passed by the user
     - a BSF with the signature attributes requested by the user.
   * If the user registered 2 MRs, one for data and one for protection, the UMR consists of:
     - a strided block format which includes the data and protection MRs and their repetitive block format
     - a BSF with the signature attributes requested by the user.

2. Post a SET_PSV WR in order to set the memory domain's initial signature parameters passed by the user. SET_PSV is not signaled and produces a solicited CQE.

3. Post a SET_PSV WR in order to set the wire domain's initial signature parameters passed by the user. SET_PSV is not signaled and produces a solicited CQE.

After this compound WR we place a small fence for the next WR to come.

This patch also introduces helper functions to set the BSF correctly and to determine the signature format selectors.
Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c | 443 +++ include/linux/mlx5/device.h |4 + include/linux/mlx5/qp.h | 61 ++ 3 files changed, 508 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 1329c10..b0f066b 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1720,6 +1720,26 @@ static __be64 frwr_mkey_mask(void) return cpu_to_be64(result); } +static __be64 sig_mkey_mask(void) +{ + u64 result; + + result = MLX5_MKEY_MASK_LEN | + MLX5_MKEY_MASK_PAGE_SIZE| + MLX5_MKEY_MASK_START_ADDR | + MLX5_MKEY_MASK_EN_RINVAL| + MLX5_MKEY_MASK_KEY | + MLX5_MKEY_MASK_LR | + MLX5_MKEY_MASK_LW | + MLX5_MKEY_MASK_RR | + MLX5_MKEY_MASK_RW | + MLX5_MKEY_MASK_SMALL_FENCE | + MLX5_MKEY_MASK_FREE | + MLX5_MKEY_MASK_BSF_EN; + + return cpu_to_be64(result); +} + static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr, struct ib_send_wr *wr, int li) { @@ -1904,6 +1924,360 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, struct ib_send_wr *wr, return 0; } +static u16 prot_field_size(enum ib_signature_type type) +{ + switch (type) { + case IB_SIG_TYPE_T10_DIF: + return MLX5_DIF_SIZE; + default: + return 0; + } +} + +static int bs_selector(u32 block_size, u8 *selector) +{ + switch (block_size) { + case 512: + *selector = 0x1; + break; + case 520: + *selector = 0x2; + break; + case 4096: + *selector = 0x3; + break; + case 4160: + *selector = 0x4; + break; + case 1073741824: + *selector = 0x5; + break; + default: + return -EINVAL; + } + return 0; +} + +static int format_selector(struct ib_sig_attrs *attr, + struct ib_sig_domain *domain, + int *selector) +{ + +#define FORMAT_DIF_NONE0 +#define FORMAT_DIF_CRC_INC 4 +#define FORMAT_DIF_CSUM_INC12 +#define FORMAT_DIF_CRC_NO_INC 13 +#define FORMAT_DIF_CSUM_NO_INC 14 + + switch (domain-sig.dif.type) { + case IB_T10DIF_NONE: + /* No DIF */ + *selector = FORMAT_DIF_NONE; + break; + case IB_T10DIF_TYPE1: 
/* Fall through */ + case IB_T10DIF_TYPE2: + switch (domain-sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = FORMAT_DIF_CRC_INC; + break; + case IB_T10DIF_CSUM: + *selector = FORMAT_DIF_CSUM_INC; + break; + default: + return 1; + } + break; + case IB_T10DIF_TYPE3: + switch (domain-sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = domain-sig.dif.type3_inc_reftag ? + FORMAT_DIF_CRC_INC : + FORMAT_DIF_CRC_NO_INC; + break; + case IB_T10DIF_CSUM: + *selector = domain-sig.dif.type3_inc_reftag ? + FORMAT_DIF_CSUM_INC
[PATCH v4 06/10] IB/mlx5: remove MTT access mode from umr flags helper function
The get_umr_flags() helper might be used for access modes other than MLX5_ACCESS_MODE_MTT, such as MLX5_ACCESS_MODE_KLM, so remove the MTT mode from the helper and let each caller add its own access mode flag. This commit does not add or change functionality. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index e135d71..1329c10 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1775,7 +1775,7 @@ static u8 get_umr_flags(int acc)
 	       (acc & IB_ACCESS_REMOTE_WRITE ? MLX5_PERM_REMOTE_WRITE : 0) |
 	       (acc & IB_ACCESS_REMOTE_READ  ? MLX5_PERM_REMOTE_READ  : 0) |
 	       (acc & IB_ACCESS_LOCAL_WRITE  ? MLX5_PERM_LOCAL_WRITE  : 0) |
-		MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+		MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }
 
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1787,7 +1787,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
 		return;
 	}
 
-	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags);
+	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) |
+		     MLX5_ACCESS_MODE_MTT;
 	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
 	seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00);
 	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
-- 
1.7.8.2
Re: Native IB connection setup.
On 1/2/2014 10:11 AM, Ilya Kalistru wrote: Happy New Year, ladies and gentlemen! I'm developing a hardware InfiniBand server running on an FPGA that delivers data to a PC using RDMA_WRITE operations. I already have Physical Link Up and Logical Link Up between my device and a PC with a Mellanox HCA. I see the GUID and LID of my device when I run the ibstatus or ibnetdiscover commands on the PC, so I think the subnet configuration is OK. Now I have a problem with connection setup. Because I'm the only one developing this device and it's a problem to add extra protocols to the FPGA firmware, I don't want to use anything like getaddrinfo() (it uses IPoIB). I'm going to use the native IB CM REQ/REP/RTU MADs for connection setup, but I don't know how. I think I should request GUID-to-LID resolution first, like rdma_resolve_addr()/rdma_resolve_route() but starting from a GUID instead of an IP. Second, I think I should use ib_send_cm_req() and ib_send_cm_rtu() with a well-known ServiceID (which I select) to establish the connection. I'm not a programmer and have no experience with network programming, so I would be very thankful for an example of program code using native IB connection setup techniques, or any other help. You can have a look at SRP (SCSI RDMA Protocol, under drivers/infiniband/ulp/srp) as a reference for native IB connection establishment. P.S. It's the first time I'm using a mailing list. I'm sorry if I'm doing something wrong.
Re: linux rdma 3.14 merge plans
On 1/8/2014 2:51 AM, Roland Dreier wrote: On Tue, Jan 7, 2014 at 1:02 PM, Or Gerlitz or.gerl...@gmail.com wrote: Currently there is a single patch for 3.14 on your for-next branch, the usnic driver. With 3.13 at rc7 and likely to be released next week, are you planning any other merges for 3.14? We have patches waiting for weeks and months without any comment from you. I am definitely planning on merging the new IBoE IP addressing stuff, since we seem to have solved the ABI issues. The UD flow steering patches seem good and I will take a closer look soon. And there are quite a few usnic patches still to pick up. I'm confident that will all make it. The data integrity stuff I'm not so sure about. Sean raised some, I think, legitimate questions about whether all this should be added to the verbs API, and I want to see more discussion, or at least have a deep think about this myself, before committing. Hey Roland, I don't think Sean questioned whether data-integrity support should or shouldn't be added to the verbs API (Sean, correct me if I'm wrong), but rather how it should be added. From our discussion on this, the only conflict Sean and I had was whether the protection setup should ride on ib_post_send. Sean suggested a separate routine that would post on the SQ. I think that in the current framework, where posting a fast-path operation is done via ib_post_send, we should keep the current implementation, and open a discussion on whether it is a good idea to migrate non-send work requests out of ib_post_send (also fast registration and memory windows). Sagi.
[PATCH 00/11] iSER target initial support for T10-DIF offload
Hey Nic, MKP, SCSI and RDMA folks, This patchset adds basic support for T10-DIF protection information offload in the iSER target, on top of Nic's recent work and the RDMA signature verbs API. This code was tested with my own implementation of target core T10-PI support, which was designed mainly to activate the transport DIF offload. In order to actually get the Linux SCSI target to work with iSER T10-DIF offload, a couple of patches need to be added to Nic's ongoing work. Apart from doing the actual iSER implementation of T10-DIF offload, this series helps to show the full picture by: * Showing how the T10-DIF offload verbs are used * Showing how fabric transport offload plugs into the target core The T10-DIF signature offload verbs and mlx5 driver implementation patches are available from the for-next branch of git://beany.openfabrics.org/~ogerlitz/linux-2.6.git as the below commits: 2b4316b IB/mlx5: Publish support in signature feature ef3130d IB/mlx5: Collect signature error completion c1b37b1 IB/mlx5: Support IB_WR_REG_SIG_MR f5d8496 IB/mlx5: Keep mlx5 MRs in a radix tree under device 72a72ee IB/mlx5: remove MTT access mode from umr flags helper function ccb0a907 IB/mlx5: Break wqe handling to begin finish routines cda0569 IB/mlx5: Initialize mlx5_ib_qp signature related 33b4079 IB/mlx5, mlx5_core: Support for create_mr and destroy_mr 8b343e6 IB/core: Introduce Signature Verbs API c1b0358 IB/core: Introduce protected memory regions Sagi Grimberg (11): Target/core: Fixes for isert compilation IB/isert: separate connection protection domains and dma MRs IB/isert: Avoid frwr notation, use fastreg IB/isert: Move fastreg descriptor creation to a function Target/iscsi: Add T10-PI indication for iscsi_portal_group IB/isert: Initialize T10-PI resources IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine IB/isert: pass mr and frpl to isert_fast_reg_mr routine IB/isert: Accept RDMA_WRITE completions IB/isert: Support T10-PI protected transactions 
Target/configfs: Expose iSCSI network portal group T10-PI support drivers/infiniband/ulp/isert/ib_isert.c | 708 +++-- drivers/infiniband/ulp/isert/ib_isert.h | 29 +- drivers/target/iscsi/iscsi_target_configfs.c |6 + drivers/target/iscsi/iscsi_target_core.h |5 +- drivers/target/iscsi/iscsi_target_tpg.c | 21 + drivers/target/iscsi/iscsi_target_tpg.h |1 + include/target/target_core_base.h| 22 +- 7 files changed, 603 insertions(+), 189 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/11] IB/isert: Avoid frwr notation, use fastreg
Use fast registration lingo. fast registration will also incorporate signature/DIF registration. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/ulp/isert/ib_isert.c | 84 --- drivers/infiniband/ulp/isert/ib_isert.h |8 ++-- 2 files changed, 47 insertions(+), 45 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 3dd2427..295d2be 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -47,10 +47,10 @@ static int isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd, struct isert_rdma_wr *wr); static void -isert_unreg_rdma_frwr(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn); +isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn); static int -isert_reg_rdma_frwr(struct iscsi_conn *conn, struct iscsi_cmd *cmd, - struct isert_rdma_wr *wr); +isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd, + struct isert_rdma_wr *wr); static void isert_qp_event_callback(struct ib_event *e, void *context) @@ -225,11 +225,11 @@ isert_create_device_ib_res(struct isert_device *device) /* asign function handlers */ if (dev_attr-device_cap_flags IB_DEVICE_MEM_MGT_EXTENSIONS) { - device-use_frwr = 1; - device-reg_rdma_mem = isert_reg_rdma_frwr; - device-unreg_rdma_mem = isert_unreg_rdma_frwr; + device-use_fastreg = 1; + device-reg_rdma_mem = isert_reg_rdma; + device-unreg_rdma_mem = isert_unreg_rdma; } else { - device-use_frwr = 0; + device-use_fastreg = 0; device-reg_rdma_mem = isert_map_rdma; device-unreg_rdma_mem = isert_unmap_cmd; } @@ -237,9 +237,10 @@ isert_create_device_ib_res(struct isert_device *device) device-cqs_used = min_t(int, num_online_cpus(), device-ib_device-num_comp_vectors); device-cqs_used = min(ISERT_MAX_CQ, device-cqs_used); - pr_debug(Using %d CQs, device %s supports %d vectors support FRWR %d\n, + pr_debug(Using %d CQs, device %s supports %d vectors support +Fast registration %d\n, 
device-cqs_used, device-ib_device-name, -device-ib_device-num_comp_vectors, device-use_frwr); +device-ib_device-num_comp_vectors, device-use_fastreg); device-cq_desc = kzalloc(sizeof(struct isert_cq_desc) * device-cqs_used, GFP_KERNEL); if (!device-cq_desc) { @@ -367,18 +368,18 @@ isert_device_find_by_ib_dev(struct rdma_cm_id *cma_id) } static void -isert_conn_free_frwr_pool(struct isert_conn *isert_conn) +isert_conn_free_fastreg_pool(struct isert_conn *isert_conn) { struct fast_reg_descriptor *fr_desc, *tmp; int i = 0; - if (list_empty(isert_conn-conn_frwr_pool)) + if (list_empty(isert_conn-conn_fr_pool)) return; - pr_debug(Freeing conn %p frwr pool, isert_conn); + pr_debug(Freeing conn %p fastreg pool, isert_conn); list_for_each_entry_safe(fr_desc, tmp, -isert_conn-conn_frwr_pool, list) { +isert_conn-conn_fr_pool, list) { list_del(fr_desc-list); ib_free_fast_reg_page_list(fr_desc-data_frpl); ib_dereg_mr(fr_desc-data_mr); @@ -386,20 +387,20 @@ isert_conn_free_frwr_pool(struct isert_conn *isert_conn) ++i; } - if (i isert_conn-conn_frwr_pool_size) + if (i isert_conn-conn_fr_pool_size) pr_warn(Pool still has %d regions registered\n, - isert_conn-conn_frwr_pool_size - i); + isert_conn-conn_fr_pool_size - i); } static int -isert_conn_create_frwr_pool(struct isert_conn *isert_conn) +isert_conn_create_fastreg_pool(struct isert_conn *isert_conn) { struct fast_reg_descriptor *fr_desc; struct isert_device *device = isert_conn-conn_device; int i, ret; - INIT_LIST_HEAD(isert_conn-conn_frwr_pool); - isert_conn-conn_frwr_pool_size = 0; + INIT_LIST_HEAD(isert_conn-conn_fr_pool); + isert_conn-conn_fr_pool_size = 0; for (i = 0; i ISCSI_DEF_XMIT_CMDS_MAX; i++) { fr_desc = kzalloc(sizeof(*fr_desc), GFP_KERNEL); if (!fr_desc) { @@ -431,17 +432,17 @@ isert_conn_create_frwr_pool(struct isert_conn *isert_conn) fr_desc, fr_desc-data_frpl-page_list); fr_desc-valid = true; - list_add_tail(fr_desc-list, isert_conn-conn_frwr_pool); - isert_conn-conn_frwr_pool_size++; + 
list_add_tail(fr_desc-list, isert_conn-conn_fr_pool); + isert_conn-conn_fr_pool_size
[PATCH 07/11] IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
This routine may help for protection registration as well. This patch does not change any functionality. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/ulp/isert/ib_isert.c | 28 1 files changed, 12 insertions(+), 16 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 98f23f4..3495e73 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -2247,26 +2247,22 @@ isert_map_fr_pagelist(struct ib_device *ib_dev, static int isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc, - struct isert_cmd *isert_cmd, struct isert_conn *isert_conn, - struct ib_sge *ib_sge, u32 offset, unsigned int data_len) + struct isert_conn *isert_conn, struct scatterlist *sg_start, + struct ib_sge *ib_sge, u32 sg_nents, u32 offset, + unsigned int data_len) { - struct iscsi_cmd *cmd = isert_cmd-iscsi_cmd; struct ib_device *ib_dev = isert_conn-conn_cm_id-device; - struct scatterlist *sg_start; - u32 sg_off, page_off; struct ib_send_wr fr_wr, inv_wr; struct ib_send_wr *bad_wr, *wr = NULL; + int ret, pagelist_len; + u32 page_off; u8 key; - int ret, sg_nents, pagelist_len; - sg_off = offset / PAGE_SIZE; - sg_start = cmd-se_cmd.t_data_sg[sg_off]; - sg_nents = min_t(unsigned int, cmd-se_cmd.t_data_nents - sg_off, -ISCSI_ISER_SG_TABLESIZE); + sg_nents = min_t(unsigned int, sg_nents, ISCSI_ISER_SG_TABLESIZE); page_off = offset % PAGE_SIZE; - pr_debug(Cmd: %p use fr_desc %p sg_nents %d sg_off %d offset %u\n, -isert_cmd, fr_desc, sg_nents, sg_off, offset); + pr_debug(Use fr_desc %p sg_nents %d offset %u\n, +fr_desc, sg_nents, offset); pagelist_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents, fr_desc-data_frpl-page_list[0]); @@ -2335,9 +2331,9 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd, if (wr-iser_ib_op == ISER_IB_RDMA_WRITE) { data_left = se_cmd-data_length; } else { - sg_off = cmd-write_data_done / PAGE_SIZE; - data_left = se_cmd-data_length - 
cmd-write_data_done; offset = cmd-write_data_done; + sg_off = offset / PAGE_SIZE; + data_left = se_cmd-data_length - cmd-write_data_done; isert_cmd-tx_desc.isert_cmd = isert_cmd; } @@ -2401,8 +2397,8 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd, spin_unlock_irqrestore(isert_conn-conn_lock, flags); wr-fr_desc = fr_desc; - ret = isert_fast_reg_mr(fr_desc, isert_cmd, isert_conn, - ib_sge, offset, data_len); + ret = isert_fast_reg_mr(fr_desc, isert_conn, sg_start, + ib_sge, sg_nents, offset, data_len); if (ret) { list_add_tail(fr_desc-list, isert_conn-conn_fr_pool); goto unmap_sg; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/11] IB/isert: separate connection protection domains and dma MRs
It is more correct to seperate connections protection domains and dma_mr handles. protection information support requires to do so. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/ulp/isert/ib_isert.c | 46 --- drivers/infiniband/ulp/isert/ib_isert.h |2 - 2 files changed, 24 insertions(+), 24 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 6be57c3..3dd2427 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -248,13 +248,6 @@ isert_create_device_ib_res(struct isert_device *device) } cq_desc = device-cq_desc; - device-dev_pd = ib_alloc_pd(ib_dev); - if (IS_ERR(device-dev_pd)) { - ret = PTR_ERR(device-dev_pd); - pr_err(ib_alloc_pd failed for dev_pd: %d\n, ret); - goto out_cq_desc; - } - for (i = 0; i device-cqs_used; i++) { cq_desc[i].device = device; cq_desc[i].cq_index = i; @@ -282,13 +275,6 @@ isert_create_device_ib_res(struct isert_device *device) goto out_cq; } - device-dev_mr = ib_get_dma_mr(device-dev_pd, IB_ACCESS_LOCAL_WRITE); - if (IS_ERR(device-dev_mr)) { - ret = PTR_ERR(device-dev_mr); - pr_err(ib_get_dma_mr failed for dev_mr: %d\n, ret); - goto out_cq; - } - return 0; out_cq: @@ -304,9 +290,6 @@ out_cq: ib_destroy_cq(device-dev_tx_cq[j]); } } - ib_dealloc_pd(device-dev_pd); - -out_cq_desc: kfree(device-cq_desc); return ret; @@ -329,8 +312,6 @@ isert_free_device_ib_res(struct isert_device *device) device-dev_tx_cq[i] = NULL; } - ib_dereg_mr(device-dev_mr); - ib_dealloc_pd(device-dev_pd); kfree(device-cq_desc); } @@ -437,7 +418,7 @@ isert_conn_create_frwr_pool(struct isert_conn *isert_conn) goto err; } - fr_desc-data_mr = ib_alloc_fast_reg_mr(device-dev_pd, + fr_desc-data_mr = ib_alloc_fast_reg_mr(isert_conn-conn_pd, ISCSI_ISER_SG_TABLESIZE); if (IS_ERR(fr_desc-data_mr)) { pr_err(Failed to allocate frmr err=%ld\n, @@ -546,8 +527,22 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) } 
isert_conn-conn_device = device; - isert_conn-conn_pd = device-dev_pd; - isert_conn-conn_mr = device-dev_mr; + isert_conn-conn_pd = ib_alloc_pd(isert_conn-conn_device-ib_device); + if (IS_ERR(isert_conn-conn_pd)) { + ret = PTR_ERR(isert_conn-conn_pd); + pr_err(ib_alloc_pd failed for conn %p: ret=%d\n, + isert_conn, ret); + goto out_pd; + } + + isert_conn-conn_mr = ib_get_dma_mr(isert_conn-conn_pd, + IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(isert_conn-conn_mr)) { + ret = PTR_ERR(isert_conn-conn_mr); + pr_err(ib_get_dma_mr failed for conn %p: ret=%d\n, + isert_conn, ret); + goto out_mr; + } if (device-use_frwr) { ret = isert_conn_create_frwr_pool(isert_conn); @@ -573,6 +568,10 @@ out_conn_dev: if (device-use_frwr) isert_conn_free_frwr_pool(isert_conn); out_frwr: + ib_dereg_mr(isert_conn-conn_mr); +out_mr: + ib_dealloc_pd(isert_conn-conn_pd); +out_pd: isert_device_try_release(device); out_rsp_dma_map: ib_dma_unmap_single(ib_dev, isert_conn-login_rsp_dma, @@ -611,6 +610,9 @@ isert_connect_release(struct isert_conn *isert_conn) isert_free_rx_descriptors(isert_conn); rdma_destroy_id(isert_conn-conn_cm_id); + ib_dereg_mr(isert_conn-conn_mr); + ib_dealloc_pd(isert_conn-conn_pd); + if (isert_conn-login_buf) { ib_dma_unmap_single(ib_dev, isert_conn-login_rsp_dma, ISER_RX_LOGIN_SIZE, DMA_TO_DEVICE); diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h index 691f90f..dec74d4 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.h +++ b/drivers/infiniband/ulp/isert/ib_isert.h @@ -144,8 +144,6 @@ struct isert_device { int refcount; int cq_active_qps[ISERT_MAX_CQ]; struct ib_device*ib_device; - struct ib_pd*dev_pd; - struct ib_mr*dev_mr; struct ib_cq*dev_rx_cq[ISERT_MAX_CQ]; struct ib_cq*dev_tx_cq[ISERT_MAX_CQ]; struct isert_cq_desc*cq_desc; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message
[PATCH 01/11] Target/core: Fixes for isert compilation
Replace prot_interleaved with prot_handover in se_cmd. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- include/target/target_core_base.h | 22 ++++++++++++++-------- 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 13daea5..2ae304d 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -439,14 +439,20 @@ struct se_tmr_req {
 	struct list_head	tmr_list;
 };
 
+#define TARGET_DIF_SIZE 8
 enum target_prot_op {
-	TARGET_PROT_NORMAL,
-	TARGET_PROT_READ_INSERT,
-	TARGET_PROT_WRITE_INSERT,
-	TARGET_PROT_READ_STRIP,
-	TARGET_PROT_WRITE_STRIP,
-	TARGET_PROT_READ_PASS,
-	TARGET_PROT_WRITE_PASS,
+	TARGET_PROT_NORMAL = 0,
+	TARGET_PROT_DIN_INSERT,
+	TARGET_PROT_DOUT_INSERT,
+	TARGET_PROT_DIN_STRIP,
+	TARGET_PROT_DOUT_STRIP,
+	TARGET_PROT_DIN_PASS,
+	TARGET_PROT_DOUT_PASS
+};
+
+enum target_prot_ho {
+	PROT_SEPERATED,
+	PROT_INTERLEAVED,
 };
 
 enum target_prot_type {
@@ -573,7 +579,7 @@ struct se_cmd {
 	u32			prot_length;
 	struct scatterlist	*t_prot_sg;
 	unsigned int		t_prot_nents;
-	bool			prot_interleaved;
+	enum target_prot_ho	prot_handover;
 	enum target_pi_error	pi_err;
 	u32			block_num;
 };
-- 
1.7.1
[PATCH 08/11] IB/isert: pass mr and frpl to isert_fast_reg_mr routine
This commit generalizes isert_fast_reg_mr to receive mr and frpl instead of fr_desc to do registration. In T10-PI we also register protection memory region so we want to use this routine. This commit does not change any functionality. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/ulp/isert/ib_isert.c | 62 +++ 1 files changed, 30 insertions(+), 32 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 3495e73..98aab21 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -2246,10 +2246,10 @@ isert_map_fr_pagelist(struct ib_device *ib_dev, } static int -isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc, - struct isert_conn *isert_conn, struct scatterlist *sg_start, - struct ib_sge *ib_sge, u32 sg_nents, u32 offset, - unsigned int data_len) +isert_fast_reg_mr(struct isert_conn *isert_conn, struct ib_mr *mr, + struct ib_fast_reg_page_list *frpl, bool *key_valid, + struct scatterlist *sg_start, u32 sg_nents, u32 offset, + unsigned int data_len, struct ib_sge *ib_sge) { struct ib_device *ib_dev = isert_conn-conn_cm_id-device; struct ib_send_wr fr_wr, inv_wr; @@ -2260,33 +2260,31 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc, sg_nents = min_t(unsigned int, sg_nents, ISCSI_ISER_SG_TABLESIZE); page_off = offset % PAGE_SIZE; - - pr_debug(Use fr_desc %p sg_nents %d offset %u\n, -fr_desc, sg_nents, offset); + pr_debug(Use mr %p frpl %p sg_nents %d offset %u\n, +mr, frpl, sg_nents, offset); pagelist_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents, -fr_desc-data_frpl-page_list[0]); +frpl-page_list[0]); - if (!fr_desc-data_key_valid) { + if (!*key_valid) { memset(inv_wr, 0, sizeof(inv_wr)); inv_wr.opcode = IB_WR_LOCAL_INV; - inv_wr.ex.invalidate_rkey = fr_desc-data_mr-rkey; + inv_wr.ex.invalidate_rkey = mr-rkey; wr = inv_wr; /* Bump the key */ - key = (u8)(fr_desc-data_mr-rkey 0x00FF); - ib_update_fast_reg_key(fr_desc-data_mr, 
++key); + key = (u8)(mr-rkey 0x00FF); + ib_update_fast_reg_key(mr, ++key); } /* Prepare FASTREG WR */ memset(fr_wr, 0, sizeof(fr_wr)); fr_wr.opcode = IB_WR_FAST_REG_MR; - fr_wr.wr.fast_reg.iova_start = - fr_desc-data_frpl-page_list[0] + page_off; - fr_wr.wr.fast_reg.page_list = fr_desc-data_frpl; + fr_wr.wr.fast_reg.iova_start = frpl-page_list[0] + page_off; + fr_wr.wr.fast_reg.page_list = frpl; fr_wr.wr.fast_reg.page_list_len = pagelist_len; fr_wr.wr.fast_reg.page_shift = PAGE_SHIFT; fr_wr.wr.fast_reg.length = data_len; - fr_wr.wr.fast_reg.rkey = fr_desc-data_mr-rkey; + fr_wr.wr.fast_reg.rkey = mr-rkey; fr_wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE; if (!wr) @@ -2299,14 +2297,14 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc, pr_err(fast registration failed, ret:%d\n, ret); return ret; } - fr_desc-data_key_valid = false; - ib_sge-lkey = fr_desc-data_mr-lkey; - ib_sge-addr = fr_desc-data_frpl-page_list[0] + page_off; + *key_valid = false; + ib_sge-lkey = mr-lkey; + ib_sge-addr = frpl-page_list[0] + page_off; ib_sge-length = data_len; - pr_debug(RDMA ib_sge: addr: 0x%16llx length: %u lkey: %08x\n, -ib_sge-addr, ib_sge-length, ib_sge-lkey); + pr_debug(fastreg ib_sge: addr: 0x%16llx length: %u lkey: %08x\n, +ib_sge-addr + page_off, ib_sge-length, ib_sge-lkey); return ret; } @@ -2320,7 +2318,7 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd, struct isert_conn *isert_conn = (struct isert_conn *)conn-context; struct ib_device *ib_dev = isert_conn-conn_cm_id-device; struct ib_send_wr *send_wr; - struct ib_sge *ib_sge; + struct ib_sge data_sge; struct scatterlist *sg_start; struct fast_reg_descriptor *fr_desc; u32 sg_off = 0, sg_nents; @@ -2352,10 +2350,7 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd, pr_debug(Mapped cmd: %p count: %u sg: %p sg_nents: %u rdma_len %d\n, isert_cmd, count, sg_start, sg_nents, data_left); - memset(wr-s_ib_sge, 0, sizeof(*ib_sge)); - ib_sge = wr-s_ib_sge; - wr-ib_sge = ib_sge; - + wr-ib_sge 
= wr-s_ib_sge; wr-send_wr_num = 1; memset(wr-s_send_wr, 0, sizeof(*send_wr)); wr-send_wr = wr-s_send_wr
[PATCH 06/11] IB/isert: Initialize T10-PI resources
Upon connection establishment, check whether the network portal is T10-PI enabled; if so, allocate T10-PI resources, allocate signature-enabled memory regions, and mark the connection queue-pair as signature-enabled. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/ulp/isert/ib_isert.c | 104 +++ drivers/infiniband/ulp/isert/ib_isert.h | 19 +- 2 files changed, 106 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 9ef9193..98f23f4 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -87,7 +87,8 @@ isert_query_device(struct ib_device *ib_dev, struct ib_device_attr *devattr) } static int -isert_conn_setup_qp(struct isert_conn *isert_conn, struct rdma_cm_id *cma_id) +isert_conn_setup_qp(struct isert_conn *isert_conn, struct rdma_cm_id *cma_id, + u8 protection) { struct isert_device *device = isert_conn->conn_device; struct ib_qp_init_attr attr; @@ -119,6 +120,8 @@ isert_conn_setup_qp(struct isert_conn *isert_conn, struct rdma_cm_id *cma_id) attr.cap.max_recv_sge = 1; attr.sq_sig_type = IB_SIGNAL_REQ_WR; attr.qp_type = IB_QPT_RC; + if (protection) + attr.create_flags |= IB_QP_CREATE_SIGNATURE_EN; pr_debug("isert_conn_setup_qp cma_id->device: %p\n", cma_id->device); @@ -234,13 +237,18 @@ isert_create_device_ib_res(struct isert_device *device) device->unreg_rdma_mem = isert_unmap_cmd; } + /* Check signature cap */ + device->pi_capable = dev_attr->device_cap_flags & + IB_DEVICE_SIGNATURE_HANDOVER ? true : false; + device->cqs_used = min_t(int, num_online_cpus(), device->ib_device->num_comp_vectors); device->cqs_used = min(ISERT_MAX_CQ, device->cqs_used); pr_debug("Using %d CQs, device %s supports %d vectors support " -"Fast registration %d\n", +"Fast registration %d pi_capable %d\n", device->cqs_used, device->ib_device->name, -device->ib_device->num_comp_vectors, device->use_fastreg); +device->ib_device->num_comp_vectors, device->use_fastreg, +device->pi_capable); device->cq_desc = kzalloc(sizeof(struct isert_cq_desc) * device->cqs_used, GFP_KERNEL); if (!device->cq_desc) { @@ -383,6 +391,12 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn) list_del(&fr_desc->list); ib_free_fast_reg_page_list(fr_desc->data_frpl); ib_dereg_mr(fr_desc->data_mr); + if (fr_desc->pi_ctx) { + ib_free_fast_reg_page_list(fr_desc->pi_ctx->prot_frpl); + ib_dereg_mr(fr_desc->pi_ctx->prot_mr); + ib_destroy_mr(fr_desc->pi_ctx->sig_mr); + kfree(fr_desc->pi_ctx); + } kfree(fr_desc); ++i; } @@ -394,8 +408,10 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn) static int isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd, -struct fast_reg_descriptor *fr_desc) +struct fast_reg_descriptor *fr_desc, u8 protection) { + int ret; + fr_desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device, ISCSI_ISER_SG_TABLESIZE); if (IS_ERR(fr_desc->data_frpl)) { @@ -408,19 +424,73 @@ isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd, if (IS_ERR(fr_desc->data_mr)) { pr_err("Failed to allocate data frmr err=%ld\n", PTR_ERR(fr_desc->data_mr)); - ib_free_fast_reg_page_list(fr_desc->data_frpl); - return PTR_ERR(fr_desc->data_mr); + ret = PTR_ERR(fr_desc->data_mr); + goto err_data_frpl; } pr_debug("Create fr_desc %p page_list %p\n", fr_desc, fr_desc->data_frpl->page_list); + fr_desc->data_key_valid = true; - fr_desc->valid = true; + if (protection) { + struct ib_mr_init_attr mr_init_attr = {0}; + struct pi_context *pi_ctx; + + fr_desc->pi_ctx = kzalloc(sizeof(*fr_desc->pi_ctx), GFP_KERNEL); + if (!fr_desc->pi_ctx) { + pr_err("Failed to allocate pi context\n"); + ret = -ENOMEM; + goto err_data_mr; + } + pi_ctx = fr_desc->pi_ctx; + + pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(ib_device, + ISCSI_ISER_SG_TABLESIZE); + if (IS_ERR(pi_ctx->prot_frpl)) { + pr_err("Failed to allocate prot frpl err=%ld\n
[PATCH 09/11] IB/isert: Accept RDMA_WRITE completions
In case of protected transactions, we will need to check the protection status of the transaction before sending the SCSI response, so be ready for RDMA_WRITE completions. Currently we don't ask for these completions, but for T10-PI we will. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/ulp/isert/ib_isert.c | 20 +--- 1 files changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 98aab21..9aa933e 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -51,6 +51,8 @@ isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn); static int isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd, struct isert_rdma_wr *wr); +static int +isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd); static void isert_qp_event_callback(struct ib_event *e, void *context) @@ -1602,6 +1604,18 @@ isert_completion_put(struct iser_tx_desc *tx_desc, struct isert_cmd *isert_cmd, } static void +isert_completion_rdma_write(struct iser_tx_desc *tx_desc, + struct isert_cmd *isert_cmd) +{ + struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd; + struct isert_conn *isert_conn = isert_cmd->conn; + struct isert_device *device = isert_conn->conn_device; + + device->unreg_rdma_mem(isert_cmd, isert_conn); + isert_put_response(isert_conn->conn, cmd); +} + +static void isert_completion_rdma_read(struct iser_tx_desc *tx_desc, struct isert_cmd *isert_cmd) { @@ -1721,9 +1735,9 @@ __isert_send_completion(struct iser_tx_desc *tx_desc, isert_conn, ib_dev); break; case ISER_IB_RDMA_WRITE: - pr_err("isert_send_completion: Got ISER_IB_RDMA_WRITE\n"); - dump_stack(); - break; + pr_debug("isert_send_completion: Got ISER_IB_RDMA_WRITE\n"); + atomic_dec(&isert_conn->post_send_buf_count); + isert_completion_rdma_write(tx_desc, isert_cmd); case ISER_IB_RDMA_READ: pr_debug("isert_send_completion: Got ISER_IB_RDMA_READ:\n"); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/11] Target/iscsi: Add T10-PI indication for iscsi_portal_group
In case an iSCSI portal group is defined as t10_pi enabled, all connections on top of it will support protected transactions. T10-PI support may require extra resource allocation and maintenance by the transport layer, so we don't want to apply it on non-t10_pi network portals. This is a hook for the iSCSI target layer to signal the transport at connection establishment that this connection will carry protected transactions. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/target/iscsi/iscsi_target_core.h | 5 +- drivers/target/iscsi/iscsi_target_tpg.c | 2 ++ 2 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h index 48f7b3b..886d74d 100644 --- a/drivers/target/iscsi/iscsi_target_core.h +++ b/drivers/target/iscsi/iscsi_target_core.h @@ -58,7 +58,8 @@ #define TA_DEMO_MODE_DISCOVERY 1 #define TA_DEFAULT_ERL 0 #define TA_CACHE_CORE_NPS 0 - +/* T10 protection information disabled by default */ +#define TA_DEFAULT_T10_PI 0 #define ISCSI_IOV_DATA_BUFFER 5 @@ -765,6 +766,7 @@ struct iscsi_tpg_attrib { u32 prod_mode_write_protect; u32 demo_mode_discovery; u32 default_erl; + u8 t10_pi; struct iscsi_portal_group *tpg; }; @@ -787,6 +789,7 @@ struct iscsi_np { void *np_context; struct iscsit_transport *np_transport; struct list_head np_list; + struct iscsi_tpg_np *tpg_np; } cacheline_aligned; struct iscsi_tpg_np { diff --git a/drivers/target/iscsi/iscsi_target_tpg.c b/drivers/target/iscsi/iscsi_target_tpg.c index 3976183..80ae14c 100644 --- a/drivers/target/iscsi/iscsi_target_tpg.c +++ b/drivers/target/iscsi/iscsi_target_tpg.c @@ -225,6 +225,7 @@ static void iscsit_set_default_tpg_attribs(struct iscsi_portal_group *tpg) a->prod_mode_write_protect = TA_PROD_MODE_WRITE_PROTECT; a->demo_mode_discovery = TA_DEMO_MODE_DISCOVERY; a->default_erl = TA_DEFAULT_ERL; + a->t10_pi = TA_DEFAULT_T10_PI; } int iscsit_tpg_add_portal_group(struct iscsi_tiqn *tiqn, struct iscsi_portal_group *tpg) @@ -500,6 +501,7 @@ struct iscsi_tpg_np *iscsit_tpg_add_network_portal( init_completion(&tpg_np->tpg_np_comp); kref_init(&tpg_np->tpg_np_kref); tpg_np->tpg_np = np; + np->tpg_np = tpg_np; tpg_np->tpg = tpg; spin_lock(&tpg->tpg_np_lock); -- 1.7.1
[PATCH 11/11] Target/configfs: Expose iSCSI network portal group T10-PI support
User may enable T10-PI support per network portal group. Any connection established on top of it will be required to serve protected transactions. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/target/iscsi/iscsi_target_configfs.c | 6 ++ drivers/target/iscsi/iscsi_target_tpg.c | 19 +++ drivers/target/iscsi/iscsi_target_tpg.h | 1 + 3 files changed, 26 insertions(+), 0 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c index e3318ed..8f3f585 100644 --- a/drivers/target/iscsi/iscsi_target_configfs.c +++ b/drivers/target/iscsi/iscsi_target_configfs.c @@ -1051,6 +1051,11 @@ TPG_ATTR(demo_mode_discovery, S_IRUGO | S_IWUSR); */ DEF_TPG_ATTRIB(default_erl); TPG_ATTR(default_erl, S_IRUGO | S_IWUSR); +/* + * Define iscsi_tpg_attrib_s_t10_pi + */ +DEF_TPG_ATTRIB(t10_pi); +TPG_ATTR(t10_pi, S_IRUGO | S_IWUSR); static struct configfs_attribute *lio_target_tpg_attrib_attrs[] = { &iscsi_tpg_attrib_authentication.attr, @@ -1063,6 +1068,7 @@ static struct configfs_attribute *lio_target_tpg_attrib_attrs[] = { &iscsi_tpg_attrib_prod_mode_write_protect.attr, &iscsi_tpg_attrib_demo_mode_discovery.attr, &iscsi_tpg_attrib_default_erl.attr, + &iscsi_tpg_attrib_t10_pi.attr, NULL, }; diff --git a/drivers/target/iscsi/iscsi_target_tpg.c b/drivers/target/iscsi/iscsi_target_tpg.c index 80ae14c..d95a5f2 100644 --- a/drivers/target/iscsi/iscsi_target_tpg.c +++ b/drivers/target/iscsi/iscsi_target_tpg.c @@ -860,3 +860,22 @@ int iscsit_ta_default_erl( return 0; } + +int iscsit_ta_t10_pi( + struct iscsi_portal_group *tpg, + u32 flag) +{ + struct iscsi_tpg_attrib *a = &tpg->tpg_attrib; + + if ((flag != 0) && (flag != 1)) { + pr_err("Illegal value %d\n", flag); + return -EINVAL; + } + + a->t10_pi = flag; + pr_debug("iSCSI_TPG[%hu] - T10 Protection information bit: %s\n", + tpg->tpgt, (a->t10_pi) ? "ON" : "OFF"); + + return 0; +} diff --git a/drivers/target/iscsi/iscsi_target_tpg.h b/drivers/target/iscsi/iscsi_target_tpg.h index 213c0fc..0a182f2 100644 --- a/drivers/target/iscsi/iscsi_target_tpg.h +++ b/drivers/target/iscsi/iscsi_target_tpg.h @@ -39,5 +39,6 @@ extern int iscsit_ta_demo_mode_write_protect(struct iscsi_portal_group *, u32); extern int iscsit_ta_prod_mode_write_protect(struct iscsi_portal_group *, u32); extern int iscsit_ta_demo_mode_discovery(struct iscsi_portal_group *, u32); extern int iscsit_ta_default_erl(struct iscsi_portal_group *, u32); +extern int iscsit_ta_t10_pi(struct iscsi_portal_group *, u32); #endif /* ISCSI_TARGET_TPG_H */ -- 1.7.1
[PATCH 04/11] IB/isert: Move fastreg descriptor creation to a function
This routine will be used to create fast registration descriptors both for data buffers and for integrity buffers. This patch does not change any functionality. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/ulp/isert/ib_isert.c | 52 +++ 1 files changed, 32 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 295d2be..9ef9193 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -393,6 +393,33 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn) } static int +isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd, + struct fast_reg_descriptor *fr_desc) +{ + fr_desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device, + ISCSI_ISER_SG_TABLESIZE); + if (IS_ERR(fr_desc->data_frpl)) { + pr_err("Failed to allocate data frpl err=%ld\n", + PTR_ERR(fr_desc->data_frpl)); + return PTR_ERR(fr_desc->data_frpl); + } + + fr_desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE); + if (IS_ERR(fr_desc->data_mr)) { + pr_err("Failed to allocate data frmr err=%ld\n", + PTR_ERR(fr_desc->data_mr)); + ib_free_fast_reg_page_list(fr_desc->data_frpl); + return PTR_ERR(fr_desc->data_mr); + } + pr_debug("Create fr_desc %p page_list %p\n", + fr_desc, fr_desc->data_frpl->page_list); + + fr_desc->valid = true; + + return 0; +} + +static int isert_conn_create_fastreg_pool(struct isert_conn *isert_conn) { struct fast_reg_descriptor *fr_desc; @@ -409,29 +436,14 @@ isert_conn_create_fastreg_pool(struct isert_conn *isert_conn) goto err; } - fr_desc->data_frpl = - ib_alloc_fast_reg_page_list(device->ib_device, - ISCSI_ISER_SG_TABLESIZE); - if (IS_ERR(fr_desc->data_frpl)) { - pr_err("Failed to allocate fr_pg_list err=%ld\n", - PTR_ERR(fr_desc->data_frpl)); - ret = PTR_ERR(fr_desc->data_frpl); - goto err; - } - - fr_desc->data_mr = ib_alloc_fast_reg_mr(isert_conn->conn_pd, - ISCSI_ISER_SG_TABLESIZE); - if (IS_ERR(fr_desc->data_mr)) { - pr_err("Failed to allocate frmr err=%ld\n", - PTR_ERR(fr_desc->data_mr)); - ret = PTR_ERR(fr_desc->data_mr); - ib_free_fast_reg_page_list(fr_desc->data_frpl); + ret = isert_create_fr_desc(device->ib_device, + isert_conn->conn_pd, fr_desc); + if (ret) { + pr_err("Failed to create fastreg descriptor err=%d\n", + ret); goto err; } - pr_debug("Create fr_desc %p page_list %p\n", - fr_desc, fr_desc->data_frpl->page_list); - fr_desc->valid = true; list_add_tail(&fr_desc->list, &isert_conn->conn_fr_pool); isert_conn->conn_fr_pool_size++; } -- 1.7.1
[PATCH 10/11] IB/isert: Support T10-PI protected transactions
In case the Target core passed the transport a T10 protection operation: 1. Register the data buffer (data memory region) 2. Register the protection buffer if it exists (prot memory region) 3. Register the signature region (signature memory region) - use work request IB_WR_REG_SIG_MR 4. Execute the RDMA 5. Upon RDMA completion check the signature status - if it succeeded send a good SCSI response - if it failed send a bad SCSI response with the appropriate sense buffer Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/ulp/isert/ib_isert.c | 376 ++- 1 files changed, 321 insertions(+), 55 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 9aa933e..8a888f0 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -1499,6 +1499,7 @@ isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn) if (wr->fr_desc) { pr_debug("unreg_fastreg_cmd: %p free fr_desc %p\n", isert_cmd, wr->fr_desc); + wr->fr_desc->protected = false; spin_lock_bh(&isert_conn->conn_lock); list_add_tail(&wr->fr_desc->list, &isert_conn->conn_fr_pool); spin_unlock_bh(&isert_conn->conn_lock); @@ -1604,13 +1605,65 @@ isert_completion_put(struct iser_tx_desc *tx_desc, struct isert_cmd *isert_cmd, } static void +isert_pi_err_sense_buffer(u8 *buf, u8 key, u8 asc, u8 ascq) +{ + buf[0] = 0x70; + buf[SPC_SENSE_KEY_OFFSET] = key; + buf[SPC_ASC_KEY_OFFSET] = asc; + buf[SPC_ASCQ_KEY_OFFSET] = ascq; +} + +static void isert_completion_rdma_write(struct iser_tx_desc *tx_desc, struct isert_cmd *isert_cmd) { + struct isert_rdma_wr *wr = &isert_cmd->rdma_wr; struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd; + struct se_cmd *se_cmd = &cmd->se_cmd; struct isert_conn *isert_conn = isert_cmd->conn; struct isert_device *device = isert_conn->conn_device; + struct ib_mr_status mr_status; + int ret; + if (wr->fr_desc && wr->fr_desc->protected) { + ret = ib_check_mr_status(wr->fr_desc->pi_ctx->sig_mr, + IB_MR_CHECK_SIG_STATUS, &mr_status); + if (ret) { + pr_err("ib_check_mr_status failed, ret %d\n", ret); + goto fail_mr_status; + } + if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) { + u32 block_size = se_cmd->se_dev->dev_attrib.block_size; + + pr_err("PI error found type %d at offset %llx " + "expected %x vs actual %x\n", + mr_status.sig_err.err_type, + mr_status.sig_err.sig_err_offset, + mr_status.sig_err.expected, + mr_status.sig_err.actual); + switch (mr_status.sig_err.err_type) { + case IB_SIG_BAD_GUARD: + se_cmd->pi_err = TARGET_GUARD_CHECK_FAILED; + break; + case IB_SIG_BAD_REFTAG: + se_cmd->pi_err = TARGET_REFTAG_CHECK_FAILED; + break; + case IB_SIG_BAD_APPTAG: + se_cmd->pi_err = TARGET_APPTAG_CHECK_FAILED; + break; + } + se_cmd->block_num = + mr_status.sig_err.sig_err_offset / block_size; + isert_pi_err_sense_buffer(se_cmd->sense_buffer, + ILLEGAL_REQUEST, 0x10, + (u8)se_cmd->pi_err); + se_cmd->scsi_status = SAM_STAT_CHECK_CONDITION; + se_cmd->scsi_sense_length = TRANSPORT_SENSE_BUFFER; + se_cmd->se_cmd_flags |= SCF_EMULATED_TASK_SENSE; + } + } + +fail_mr_status: device->unreg_rdma_mem(isert_cmd, isert_conn); isert_put_response(isert_conn->conn, cmd); } @@ -1624,7 +1677,43 @@ isert_completion_rdma_read(struct iser_tx_desc *tx_desc, struct se_cmd *se_cmd = &cmd->se_cmd; struct isert_conn *isert_conn = isert_cmd->conn; struct isert_device *device = isert_conn->conn_device; + struct ib_mr_status mr_status; + int ret; + if (wr->fr_desc && wr->fr_desc->protected) { + ret = ib_check_mr_status(wr->fr_desc->pi_ctx->sig_mr, + IB_MR_CHECK_SIG_STATUS, &mr_status); + if (ret) { + pr_err("ib_check_mr_status failed, ret %d\n", ret); + goto fail_mr_status; + } + if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS
Re: [PATCH 09/11] IB/isert: Accept RDMA_WRITE completions
On 1/11/2014 11:14 PM, Or Gerlitz wrote: On Thu, Jan 9, 2014 at 6:40 PM, Sagi Grimberg sa...@mellanox.com wrote: In case of protected transactions, we will need to check the protection status of the transaction before sending the SCSI response, so be ready for RDMA_WRITE completions. Currently we don't ask for these completions, but for T10-PI we will. @@ -1721,9 +1735,9 @@ __isert_send_completion(struct iser_tx_desc *tx_desc, isert_conn, ib_dev); break; case ISER_IB_RDMA_WRITE: - pr_err("isert_send_completion: Got ISER_IB_RDMA_WRITE\n"); - dump_stack(); - break; + pr_debug("isert_send_completion: Got ISER_IB_RDMA_WRITE\n"); + atomic_dec(&isert_conn->post_send_buf_count); + isert_completion_rdma_write(tx_desc, isert_cmd); Are we doing a fall-through here? Why? Oh, somehow I missed it in the squash... Will fix, thanks! case ISER_IB_RDMA_READ: pr_debug("isert_send_completion: Got ISER_IB_RDMA_READ:\n");
Re: [PATCH 06/11] IB/isert: Initialize T10-PI resources
On 1/11/2014 11:09 PM, Or Gerlitz wrote: On Thu, Jan 9, 2014 at 6:40 PM, Sagi Grimberg sa...@mellanox.com wrote: @@ -557,8 +629,14 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) goto out_mr; } + if (pi_support && !device->pi_capable) { + pr_err("Protection information requested but not supported\n"); + ret = -EINVAL; + goto out_mr; + } + if (device->use_fastreg) { - ret = isert_conn_create_fastreg_pool(isert_conn); + ret = isert_conn_create_fastreg_pool(isert_conn, pi_support); Just a nit: the pi_support bit can be looked up from the isert_conn struct, can't it? if (ret) { pr_err("Conn: %p failed to create fastreg pool\n", isert_conn); @@ -566,7 +644,7 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) } } - ret = isert_conn_setup_qp(isert_conn, cma_id); + ret = isert_conn_setup_qp(isert_conn, cma_id, pi_support); if (ret) goto out_conn_dev; @@ -2193,7 +2271,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc, pagelist_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents, &fr_desc->data_frpl->page_list[0]); - if (!fr_desc->valid) { + if (!fr_desc->data_key_valid) { memset(&inv_wr, 0, sizeof(inv_wr)); inv_wr.opcode = IB_WR_LOCAL_INV; inv_wr.ex.invalidate_rkey = fr_desc->data_mr->rkey; @@ -2225,7 +2303,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc, pr_err("fast registration failed, ret:%d\n", ret); return ret; } - fr_desc->valid = false; + fr_desc->data_key_valid = false; ib_sge->lkey = fr_desc->data_mr->lkey; ib_sge->addr = fr_desc->data_frpl->page_list[0] + page_off; diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h index 708a069..fab8b50 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.h +++ b/drivers/infiniband/ulp/isert/ib_isert.h @@ -48,11 +48,21 @@ struct iser_tx_desc { struct ib_send_wr send_wr; } __packed; +struct pi_context { + struct ib_mr *prot_mr; + bool prot_key_valid; + struct ib_fast_reg_page_list *prot_frpl; + struct ib_mr *sig_mr; + bool sig_key_valid; +}; + struct fast_reg_descriptor { - struct list_head list; - struct ib_mr *data_mr; - struct ib_fast_reg_page_list *data_frpl; - bool valid; + struct list_head list; + struct ib_mr *data_mr; + bool data_key_valid; + struct ib_fast_reg_page_list *data_frpl; + bool protected; No need for many bools in one structure... each one only needs a bit, correct? So embed them in one variable. I figured it would be more explicit this way. The protected boolean indicates whether we should check the data-integrity status, and the other three indicate whether the relevant MR is valid (no need to execute local invalidation). Do you think I should compact it somehow? Usually the xxx_valid booleans will align together, although not always. + struct pi_context *pi_ctx; }; struct isert_rdma_wr { @@ -140,6 +150,7 @@ struct isert_cq_desc { struct isert_device { int use_fastreg; + bool pi_capable; This one (and others like it) is derived from the IB device capabilities, so I would suggest keeping a copy of the caps instead of derived bools. Yes, I'll keep the device capabilities instead. int cqs_used; int refcount; int cq_active_qps[ISERT_MAX_CQ];
[PATCH] IB/mlx5: Fix smatch warnings
Fix a possible double free of the in mailbox. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/mr.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index bc27f6b..f023711 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -1050,13 +1050,13 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, in->seg.flags = MLX5_PERM_UMR_EN | access_mode; err = mlx5_core_create_mkey(dev->mdev, &mr->mmr, in, sizeof(*in), NULL, NULL, NULL); - kfree(in); if (err) goto err_destroy_psv; mr->ibmr.lkey = mr->mmr.key; mr->ibmr.rkey = mr->mmr.key; mr->umem = NULL; + kfree(in); return &mr->ibmr; -- 1.7.8.2
[PATCH] IB/mlx5: Fix signature rule constants according to FW specifications
Use DIF CRC INC with apptag escape (0x8) and update the IP-CSUM entries. Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/infiniband/hw/mlx5/qp.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 7981620..58c4735 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1952,9 +1952,9 @@ static int format_selector(struct ib_sig_attrs *attr, { #define FORMAT_DIF_NONE 0 -#define FORMAT_DIF_CRC_INC 4 -#define FORMAT_DIF_CSUM_INC 12 -#define FORMAT_DIF_CRC_NO_INC 13 +#define FORMAT_DIF_CRC_INC 8 +#define FORMAT_DIF_CRC_NO_INC 12 +#define FORMAT_DIF_CSUM_INC 13 #define FORMAT_DIF_CSUM_NO_INC 14 switch (domain->sig.dif.type) { -- 1.7.8.2