Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time

2013-02-08 Thread Sagi Grimberg

On 2/8/2013 12:42 AM, Vu Pham wrote:





It is known that it takes about two to three minutes before the 
upstream SRP initiator fails over from a failed path to a working 
path. This is not only considered longer than acceptable but is also 
longer than other Linux SCSI initiators (e.g. iSCSI and FC). Progress 
so far with improving SRP initiator failover has been slow. This 
is because the discussion about candidate patches occurred at two 
different levels: not only were the patches themselves discussed, but also 
the approach that should be followed. That last aspect is easier to 
discuss in a meeting than over a mailing list. Hence the proposal to 
discuss SRP initiator failover behavior during the LSF/MM summit. The 
topics that need further discussion are:

* If a path fails, remove the entire SCSI host or preserve the SCSI
  host and only remove the SCSI devices associated with that host ?
* Which software component should test the state of a path and should
  reconnect to an SRP target if a path is restored ? Should that be
  done by the user space process srp_daemon or by the SRP initiator
  kernel module ?
* How should the SRP initiator behave after a path failure has been
  detected ? Should the behavior be similar to the FC initiator with
  its fast_io_fail_tmo and dev_loss_tmo parameters ?

Dave, if this topic gets accepted, I really hope you will be able to 
attend the LSF/MM summit.


Bart.


Hello Bart,

Thank you for taking the initiative.
Mellanox thinks this should be discussed. We'd be happy to attend.

We also would like to discuss:
* How, and how fast, does SRP detect a path failure besides RC errors?
* Role of srp_daemon: how often srp_daemon scans the fabric for new/old 
targets, how to scale srp_daemon discovery, traps.


-vu

Hey Bart,

I agree with Vu that this issue should be discussed. We'd be happy to 
attend.


--
Sagi


Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline

2013-02-18 Thread Sagi Grimberg

On 2/18/2013 6:06 AM, David Dillow wrote:

On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote:

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 8a7eb9f..b34752d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -734,6 +734,7 @@ static int srp_reconnect_target(struct srp_target_port 
*target)
  
 	scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING :
 			    SDEV_TRANSPORT_OFFLINE);
+	target->transport_offline = ret != 0;

Minor nit, that line is hard to read; I keep thinking it needs parens
around the conditional...

Perhaps
target->transport_offline = !!ret;
or
target->transport_offline = ret;

gcc should do the right conversion since we're assigning to a bool.
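
For what it's worth, all three variants store the same value once the
destination is a bool; a standalone C illustration (not from the patch):

#include <stdbool.h>

void example(int ret)
{
	bool a = ret != 0;	/* explicit comparison */
	bool b = !!ret;		/* double negation */
	bool c = ret;		/* conversion to _Bool also yields 0 or 1 (C99 6.3.1.2) */
	/* a, b and c always compare equal */
}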


Or, Vu, does this solve the issue you've seen? I may have time to test
later this week, but not before.



Hey David,

This indeed solves the scsi_host removal issues.
Vu is on vacation, I'll perform some more failover tests...

-Sagi


Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline

2013-02-24 Thread Sagi Grimberg

On 2/24/2013 10:09 AM, Bart Van Assche wrote:

On 02/18/13 09:11, Sagi Grimberg wrote:

On 2/18/2013 6:06 AM, David Dillow wrote:

On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote:

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
b/drivers/infiniband/ulp/srp/ib_srp.c
index 8a7eb9f..b34752d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -734,6 +734,7 @@ static int srp_reconnect_target(struct
srp_target_port *target)
 	scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING :
 			    SDEV_TRANSPORT_OFFLINE);
+	target->transport_offline = ret != 0;

Minor nit, that line is hard to read; I keep thinking it needs parens
around the conditional...

Perhaps
target->transport_offline = !!ret;
or
target->transport_offline = ret;

gcc should do the right conversion since we're assigning to a bool.


Or, Vu, does this solve the issue you've seen? I may have time to test
later this week, but not before.


This indeed solves the scsi_host removal issues.
Vu is on vacation, I'll perform some more failover tests...


Hello Sagi,

Since no further feedback was posted on the list I assume that means 
that all tests passed ?


Bart.



Hey Bart,
Sorry for the delay, I was just about to reply...

From my end, the related patchset seems to solve the scsi_host removal 
issue and prevents the SCSI error handling loop.
Generally our tests passed. I still have some issues with a long-term 
failover test, but I'm not sure it's SRP (it might originate in the IB layer).

So ack from me...

-Sagi


Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-15 Thread Sagi Grimberg

On 7/15/2013 2:06 PM, Bart Van Assche wrote:

On 14/07/2013 3:43, Sagi Grimberg wrote:

On 7/3/2013 3:58 PM, Bart Van Assche wrote:

Several InfiniBand HCAs allow configuring the completion vector
per queue pair. This allows the workload created by IB
completion interrupts to be spread over multiple MSI-X vectors and hence
over multiple CPU cores. In other words, configuring the completion
vector properly not only makes it possible to reduce latency on an
initiator connected to multiple SRP targets but also to improve
throughput.


Hey Bart,
Just wrote a small patch to let srp_daemon spread connections across the
HCA's completion vectors.
But re-thinking this, is it really a good idea to give the user control
over completion vectors for CQs he doesn't really own? This way the user
must retrieve the maximum number of completion vectors from the ib_device,
take that into account when adding a connection, and in addition set
proper IRQ affinity.

Perhaps the driver can manage this on its own without involving the user.
Take the mlx4_en driver for example: it spreads its CQs across the HCA's
completion vectors without involving the user. A user that opens a socket
has no influence on the underlying cq->comp_vector assignment.

The only use-case I can think of is where the user wants to use only
a subset of the completion vectors, reserving some of them for native IB
applications, but I don't know how common that is.

Other than that, I think it is always better to spread the CQs across the
HCA's completion vectors, so perhaps the driver should just assign
connection CQs across comp-vecs without taking arguments from the user,
simply iterating over comp_vectors.
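
Something along these lines, as a rough sketch of the round-robin idea
(the per-device 'next_comp_vector' counter is hypothetical, not existing
driver code):

/* Rough sketch: spread CQs across the device's completion vectors,
 * mlx4_en style, without user involvement.
 */
static int pick_comp_vector(struct ib_device *ibdev,
			    atomic_t *next_comp_vector)
{
	return atomic_inc_return(next_comp_vector) % ibdev->num_comp_vectors;
}

/* then, when establishing a connection: */
cq = ib_create_cq(ibdev, comp_handler, event_handler, ctx, cq_size,
		  pick_comp_vector(ibdev, &priv->next_comp_vector));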

What do you think?


Hello Sagi,

Sorry, but I do not think it is a good idea to let srp_daemon assign 
the completion vector. While this might work well on single-socket 
systems, it will yield suboptimal results on NUMA systems. For 
certain workloads on NUMA systems, and when a NUMA initiator system is 
connected to multiple target systems, the optimal configuration is to 
make sure that all processing associated with a single SCSI 
host occurs on the same NUMA node. This means configuring the 
completion vector value such that IB interrupts are generated on the 
same NUMA node where the associated SCSI host and applications are 
running.


More in general, performance tuning on NUMA systems requires 
system-wide knowledge of all applications that are running and also of 
which interrupt is processed by which NUMA node. So choosing a proper 
value for the completion vector is only possible once the system 
topology and the IRQ affinity masks are known. I don't think we should 
build knowledge of all this in srp_daemon.


Bart.



Hey Bart,

Thanks for your quick attention to my question.
srp_daemon is a package designed for the customer to automatically 
detect targets in the IB fabric. From our experience here at Mellanox, 
customers/users like automatic plug-and-play tools.
They are reluctant to build their own scripting to enhance performance 
and settle for srp_daemon, which is preferred over using ibsrpdm and 
manually adding new targets.
Regardless, the completion vector assignment is meaningless without 
setting proper IRQ affinity, so in the worst case, where the user didn't 
set his IRQ affinity, this assignment will perform like the default 
completion vector assignment, as all IRQs are directed without any 
masking, i.e. to core 0.


From my experiments on NUMA systems, optimal performance is gained 
when all IRQs are directed to half of the cores on the NUMA node close 
to the HCA, and all traffic generators share the other half of the cores 
on the same NUMA node. So based on that knowledge, I thought that 
srp_daemon/the srp driver would assign its CQs across the HCA's completion 
vectors, and the user would be encouraged to set the IRQ affinity as 
described above to gain optimal performance.
Adding connections over the far NUMA node doesn't seem to benefit 
performance much...


As I mentioned, a use-case that may raise a problem here is if 
the user would like to maintain multiple SRP connections and reserve 
some completion vectors for other IB applications on the system.
In this case the user will want to be able to disable the srp_daemon/srp 
driver completion vector assignment.


So, this was just an idea, an easy implementation that would 
potentially give the user a semi-automatic, performance-optimized 
configuration...


-Sagi


Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-16 Thread Sagi Grimberg

On 7/15/2013 9:23 PM, Bart Van Assche wrote:

On 15/07/2013 7:29, Sagi Grimberg wrote:

srp_daemon is a package designed for the customer to automatically
detect targets in the IB fabric. From our experience here at Mellanox,
customers/users like automatic plug-and-play tools.
They are reluctant to build their own scripting to enhance performance
and settle for srp_daemon, which is preferred over using ibsrpdm and
manually adding new targets.
Regardless, the completion vector assignment is meaningless without
setting proper IRQ affinity, so in the worst case, where the user didn't
set his IRQ affinity, this assignment will perform like the default
completion vector assignment, as all IRQs are directed without any
masking, i.e. to core 0.

From my experiments on NUMA systems, optimal performance is gained
when all IRQs are directed to half of the cores on the NUMA node close
to the HCA, and all traffic generators share the other half of the cores
on the same NUMA node. So based on that knowledge, I thought that
srp_daemon/the srp driver would assign its CQs across the HCA's completion
vectors, and the user would be encouraged to set the IRQ affinity as
described above to gain optimal performance.
Adding connections over the far NUMA node doesn't seem to benefit
performance much...

As I mentioned, a use-case that may raise a problem here is if
the user would like to maintain multiple SRP connections and reserve
some completion vectors for other IB applications on the system.
In this case the user will want to be able to disable the srp_daemon/srp
driver completion vector assignment.

So, this was just an idea, an easy implementation that would
potentially give the user a semi-automatic, performance-optimized
configuration...


Hello Sagi,

I agree with you that it would help a lot if completion vector 
assignment could be automated such that end users do not have to care 
about assigning completion vector numbers. The challenge is to find an 
approach that is general enough to work for all possible 
use cases. One possible approach is to let a tool that has knowledge 
about the application fill in completion vector numbers in 
srp_daemon.conf and let srp_daemon use the values generated by this 
tool. That approach would spare srp_daemon any knowledge 
about the application but would still allow srp_daemon to 
assign the completion vector numbers.


Bart.


Hey Bart,
This sounds like a nice idea, but there is an inherent problem: 
applications come and go while the connections are (somewhat) static.
How can you control pinning an arbitrary application running (over SRP 
devices, of course) at a certain point in time?

So will you agree at least to give target->comp_vector a default of 
IB_CQ_VECTOR_LEAST_ATTACHED?
From my point of view, for a user who doesn't have the slightest clue about 
completion vectors and performance optimization, this is somewhat better 
than doing nothing...


-Sagi


Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-16 Thread Sagi Grimberg

On 7/16/2013 1:58 PM, Bart Van Assche wrote:

On 16/07/2013 4:11, Sagi Grimberg wrote:

This sounds like a nice idea, but there is an inherent problem: 
applications come and go while the connections are (somewhat) static.
How can you control pinning an arbitrary application running (over SRP 
devices, of course) at a certain point in time?

So will you agree at least to give target->comp_vector a default of 
IB_CQ_VECTOR_LEAST_ATTACHED?
From my point of view, for a user who doesn't have the slightest clue about 
completion vectors and performance optimization, this is somewhat better 
than doing nothing...


Hello Sagi,

That sounds like an interesting proposal to me. But did the patch that 
adds the IB_CQ_VECTOR_LEAST_ATTACHED feature ever get accepted in the 
upstream Linux kernel ? I have tried to find that symbol in Linux 
kernel v3.11-rc1 but couldn't find it. Maybe I have overlooked 
something ?


Bart.



Oh, you're right!

I'll ask Vu; from git blame on the old OFED I see that he wrote the code...
Perhaps this should be added as well.

-Sagi


Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-17 Thread Sagi Grimberg

On 7/16/2013 6:11 PM, Bart Van Assche wrote:

On 14/07/2013 3:43, Sagi Grimberg wrote:

Just wrote a small patch to allow srp_daemon spread connection across
HCA's completion vectors.


Hello Sagi,

How about the following approach:
- Add support for reading the completion vector from srp_daemon.conf,
  similar to how several other parameters are already read from that
  file.


Here we need to take into consideration that we are changing the 
functionality of srp_daemon.conf.
Instead of simply allowing/disallowing targets with specific 
attributes, we would also be defining configuration attributes of the 
allowed targets.
It might be uncomfortable for the user to explicitly write N target 
strings in srp_daemon.conf just for completion vector assignment.


Perhaps srp_daemon.conf could contain a comma-separated list of reserved 
completion vectors for srp_daemon to spread CQs among.
If that line doesn't exist, srp_daemon would spread the assignment across 
all of the HCA's completion vectors.
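
For illustration only, such a (hypothetical, not currently existing)
srp_daemon.conf line could look like:

# reserve vectors 0-1 for other IB applications; spread SRP CQs
# across the remaining completion vectors:
comp_vectors=2,3,4,5,6,7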



- If the completion vector parameter has not been set in
  srp_daemon.conf, let srp_daemon assign a completion vector such that
  IB interrupts for different SRP hosts use different completion
  vectors.

Bart.




Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)

2013-07-22 Thread Sagi Grimberg

On 7/22/2013 2:46 PM, Bart Van Assche wrote:

On 07/18/13 15:25, Or Gerlitz wrote:

+static int iser_fast_reg_mr(struct fast_reg_descriptor *desc,
+			    struct iser_conn *ib_conn,
+			    struct iser_regd_buf *regd_buf,
+			    u32 offset, unsigned int data_size,
+			    unsigned int page_list_len)
+{
+	struct ib_send_wr fastreg_wr, inv_wr;
+	struct ib_send_wr *bad_wr, *wr = NULL;
+	u8 key;
+	int ret;
+
+	if (!desc->valid) {
+		memset(&inv_wr, 0, sizeof(inv_wr));
+		inv_wr.opcode = IB_WR_LOCAL_INV;
+		inv_wr.send_flags = IB_SEND_SIGNALED;
+		inv_wr.ex.invalidate_rkey = desc->data_mr->rkey;
+		wr = &inv_wr;
+		/* Bump the key */
+		key = (u8)(desc->data_mr->rkey & 0x00FF);
+		ib_update_fast_reg_key(desc->data_mr, ++key);
+	}
+
+	/* Prepare FASTREG WR */
+	memset(&fastreg_wr, 0, sizeof(fastreg_wr));
+	fastreg_wr.opcode = IB_WR_FAST_REG_MR;
+	fastreg_wr.send_flags = IB_SEND_SIGNALED;
+	fastreg_wr.wr.fast_reg.iova_start = desc->data_frpl->page_list[0] + offset;
+	fastreg_wr.wr.fast_reg.page_list = desc->data_frpl;
+	fastreg_wr.wr.fast_reg.page_list_len = page_list_len;
+	fastreg_wr.wr.fast_reg.page_shift = SHIFT_4K;
+	fastreg_wr.wr.fast_reg.length = data_size;
+	fastreg_wr.wr.fast_reg.rkey = desc->data_mr->rkey;
+	fastreg_wr.wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE  |
+					       IB_ACCESS_REMOTE_WRITE |
+					       IB_ACCESS_REMOTE_READ);


Hello Sagi,

If I interpret the above code correctly the rkey used in the previous 
FRWR is invalidated as soon as a new FRWR is queued. Does this mean 
that the iSER initiator limits queue depth to one ?


Another question: is it on purpose that iscsi_iser_cleanup_task() does 
not invalidate an rkey if a command has been aborted successfully ? A 
conforming iSER target does not send a response for aborted commands. 
Will successful command abortion result in the rkey not being 
invalidated ? What will happen if a new FRWR is submitted with an rkey 
that is still valid ?


Thanks,

Bart.



Hey Bart,

You interpret correctly: iSER will locally invalidate the rkey just before 
re-using it (provided it was not previously invalidated remotely by the 
target).
This code is still missing the remote invalidate part; once that is added, 
the iSER initiator will advertise its remote invalidate support, and in 
case the target remotely invalidates the rkey (seen in the RSP completion), 
the initiator will pick that up in the RSP completion and mark the 
associated MR as valid (ready for use again).

I'm not sure what you meant in your question, but this does not mean that 
the iSER initiator limits the queue depth to 1.
The initiator manages a pool of fastreg descriptors of size == max queued 
commands (per connection), each containing an ib_mr.
For each concurrent IOP it takes a fastreg descriptor from the pool and 
uses it for registration (if the descriptor is marked as not valid, it 
will locally invalidate the rkey and then use it for registration).
When cleanup_task -> iser_task_rdma_finalize -> iser_unreg_rdma_mem is 
called, it just returns the fastreg descriptor to the pool (without a 
local invalidate, as that is done when the descriptor is reused).

The reason I chose to do it this way is that if I locally invalidate the 
rkey upon task cleanup, then only after the LOCAL_INV completion am I 
allowed to return the descriptor to the pool (only then do I know it is 
ready for reuse). Assuming I still want to evacuate the task and not wait 
in my fast path, under certain conditions I may end up in a situation 
where I have no resources to handle the next IOP, since all MRs are 
waiting for LOCAL_INV completions.
A possible solution here was to heuristically use a larger pool, but I 
wanted to avoid that...


So just to clarify the flow:
. at connection establishment, allocate a pool of fastreg descriptors
. upon each IOP, take a fastreg descriptor from the pool
  . if it is not invalidated - invalidate it
  . register using FRWR
. when cleanup_task is called - just return the fastreg descriptor to
  the pool
. at connection teardown, free all resources
Still to come:
. upon each IOP response, check if the target used remote invalidate -
  if so, mark the relevant fastreg descriptor as valid
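
A condensed sketch of the take/return steps above (a sketch only: the
function names are illustrative, not the actual driver code, and locking
is omitted):

static struct fast_reg_descriptor *
iser_reg_desc_get(struct iser_conn *ib_conn)
{
	struct fast_reg_descriptor *desc;

	/* the pool was populated at connection establishment */
	desc = list_first_entry(&ib_conn->fastreg.frwr.pool,
				struct fast_reg_descriptor, list);
	list_del(&desc->list);
	return desc;	/* if !desc->valid, the caller invalidates before use */
}

static void iser_reg_desc_put(struct iser_conn *ib_conn,
			      struct fast_reg_descriptor *desc)
{
	/* no LOCAL_INV here -- it is deferred to the next reuse */
	list_add_tail(&desc->list, &ib_conn->fastreg.frwr.pool);
}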


Hope this helps.

-Sagi


Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)

2013-07-23 Thread Sagi Grimberg

On 7/23/2013 2:58 PM, Bart Van Assche wrote:

On 07/22/13 15:11, Sagi Grimberg wrote:

So just to clarify the flow:
. at connection establishment, allocate a pool of fastreg descriptors
. upon each IOP, take a fastreg descriptor from the pool
  . if it is not invalidated - invalidate it
  . register using FRWR
. when cleanup_task is called - just return the fastreg descriptor to
  the pool
. at connection teardown, free all resources
Still to come:
. upon each IOP response, check if the target used remote invalidate -
  if so, mark the relevant fastreg descriptor as valid


Hello Sagi and Or,

Thanks for the clarifications. I have one more question though. My 
interpretation of section 10.6 Memory Management in the IB 
specification is that memory registration maps a memory region that 
either has contiguous virtual addresses or contiguous physical 
addresses. However, there is no such requirement for an sg-list. As an 
example, for direct I/O to a block device with a sector size of 512 
bytes it is only required that I/O occurs in multiples of 512 bytes 
and from memory aligned on 512-byte boundaries. So the use of direct 
I/O can result in an sg-list where the second and subsequent sg-list 
elements have a non-zero offset. Do you agree with this ? Are such 
sg-lists mapped correctly by the FRWR code ?


Bart.



Hey Bart,

You are on the money with this observation: like FMRs, FRWR cannot 
register an arbitrary SG-list; the same limitations apply.
Unlike SRP, where the initiator will use multiple FMRs to register such 
unaligned SG-lists, iSER uses a bounce buffer to copy the data to a 
physically contiguous memory area (see the fall_to_bounce_buf routine in 
patch 5/7), and thus will pass a single R_Key for each transaction.
An equivalent FRWR implementation for SRP would also use multiple FRWRs 
in order to register such unaligned SG-lists and publish the R_Keys 
in ib_sge.
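
The alignment test itself boils down to something like this (a simplified
sketch of the gap check, not the exact upstream helper):

/* An sg-list can be mapped by a single FMR/FRWR only if it has no
 * gaps: every element but the first must start on a page boundary,
 * and every element but the last must end on one.
 */
static bool sg_list_is_aligned(struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		if (i > 0 && sg->offset)
			return false;	/* gap before this element */
		if (i < nents - 1 && ((sg->offset + sg->length) & ~PAGE_MASK))
			return false;	/* gap after this element */
	}
	return true;
}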


Hope this helps,

-Sagi


Re: [PATCH for-3.11 7/7] IB/iser: Introduce fast memory registration model (FRWR)

2013-07-28 Thread Sagi Grimberg

On 7/28/2013 11:15 AM, Or Gerlitz wrote:

On 26/07/2013 20:15, Vu Pham wrote:

Hello Or/Sagi,

Just a minor comment:

 /**
+ * iser_create_frwr_pool - Creates pool of fast_reg descriptors
+ * for fast registration work requests.
+ * returns 0 on success, or errno code on failure
+ */
+int iser_create_frwr_pool(struct iser_conn *ib_conn, unsigned cmds_max)
+{
+	struct iser_device	   *device = ib_conn->device;
+	struct fast_reg_descriptor *desc;
+	int i, ret;
+
+	INIT_LIST_HEAD(&ib_conn->fastreg.frwr.pool);
+	ib_conn->fastreg.frwr.pool_size = 0;
+	for (i = 0; i < cmds_max; i++) {
+		desc = kmalloc(sizeof(*desc), GFP_KERNEL);
+		if (!desc) {
+			iser_err("Failed to allocate a new fast_reg descriptor\n");
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		desc->data_frpl = ib_alloc_fast_reg_page_list(device->ib_device,
+						ISCSI_ISER_SG_TABLESIZE + 1);
+		if (IS_ERR(desc->data_frpl)) {

			ret = PTR_ERR(desc->data_frpl);

+			iser_err("Failed to allocate ib_fast_reg_page_list err=%ld\n",
+				 PTR_ERR(desc->data_frpl));

using ret

+			goto err;
+		}
+
+		desc->data_mr = ib_alloc_fast_reg_mr(device->pd,
+						     ISCSI_ISER_SG_TABLESIZE + 1);
+		if (IS_ERR(desc->data_mr)) {

			ret = PTR_ERR(desc->data_mr);

+			iser_err("Failed to allocate ib_fast_reg_mr err=%ld\n",
+				 PTR_ERR(desc->data_mr));

using ret

+			ib_free_fast_reg_page_list(desc->data_frpl);
+			goto err;
+		}
+		desc->valid = true;
+		list_add_tail(&desc->list, &ib_conn->fastreg.frwr.pool);
+		ib_conn->fastreg.frwr.pool_size++;
+	}
+
+	return 0;
+err:
+	iser_free_frwr_pool(ib_conn);
+	return ret;
+}





Nice catch!

I see that Roland hasn't picked up this series yet, so I will re-submit it 
with fixes to the issues you have found here.


Or.



Nice catch indeed, thanks Vu.

-Sagi


Re: IB/iser: Generalize rdma memory registration

2013-08-15 Thread Sagi Grimberg

On 8/14/2013 10:52 PM, Dan Carpenter wrote:

Hello Sagi Grimberg,

This is a semi-automatic email about new static checker warnings.

The patch b4e155ffbbd6: IB/iser: Generalize rdma memory
registration from Jul 28, 2013, leads to the following Smatch
complaint:

drivers/infiniband/ulp/iser/iser_initiator.c:318 iser_free_rx_descriptors()
	 error: we previously assumed 'device' could be null (see line 313)

drivers/infiniband/ulp/iser/iser_initiator.c
   312	
   313		if (device && device->iser_free_rdma_reg_res)
	            ^^^^^^
New check.

   314			device->iser_free_rdma_reg_res(ib_conn);
   315	
   316		rx_desc = ib_conn->rx_descs;
   317		for (i = 0; i < ib_conn->qp_max_recv_dtos; i++, rx_desc++)
   318			ib_dma_unmap_single(device->ib_device, rx_desc->dma_addr,
	                                    ^^^^^^^
Old dereference.

   319				    ISER_RX_PAYLOAD_SIZE, DMA_FROM_DEVICE);
   320		kfree(ib_conn->rx_descs);

Has the code changed so that we need to check now?

regards,
dan carpenter


Hey Dan,

Thanks for the input!
The case here is that in some weird error flows we can end up in this 
function with device == NULL, but if you pass the first condition, 
if (!ib_conn->rx_descs), you are safe...

I'll fire up a fix for that asap.

Cheers,

-Sagi


[PATCH] IB/iser: Fix redundant pointer check in dealloc flow

2013-08-15 Thread Sagi Grimberg
This bug was discovered by the Smatch static checker run by
Dan Carpenter. If in free_rx_descriptors the rx_descs are not NULL,
the iser device is definitely not NULL as it was created before,
so there is no need to check it before dereferencing it.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/iser/iser_initiator.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index bdc38f4..5f01da9 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -310,7 +310,7 @@ void iser_free_rx_descriptors(struct iser_conn *ib_conn)
 	if (!ib_conn->rx_descs)
 		goto free_login_buf;
 
-	if (device && device->iser_free_rdma_reg_res)
+	if (device->iser_free_rdma_reg_res)
 		device->iser_free_rdma_reg_res(ib_conn);
 
 	rx_desc = ib_conn->rx_descs;
-- 
1.7.1



Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-08-20 Thread Sagi Grimberg

On 8/20/2013 3:50 PM, Bart Van Assche wrote:

Certain storage configurations, e.g. a sufficiently large array of
hard disks in a RAID configuration, need a queue depth above 64 to
achieve optimal performance. Hence make the queue depth configurable.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: Konrad Grzybowski konr...@k2.pl
---
  drivers/infiniband/ulp/srp/ib_srp.c |  125 ++-
  drivers/infiniband/ulp/srp/ib_srp.h |   17 +++--
  2 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index ece1f2d..6de2323 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -299,16 +299,16 @@ static int srp_create_target_ib(struct srp_target_port *target)
 		return -ENOMEM;
 
 	recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_recv_completion, NULL, target, SRP_RQ_SIZE,
-			       target->comp_vector);
+			       srp_recv_completion, NULL, target,
+			       target->queue_size, target->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
 	send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_send_completion, NULL, target, SRP_SQ_SIZE,
-			       target->comp_vector);
+			       srp_send_completion, NULL, target,
+			       target->queue_size, target->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -317,8 +317,8 @@ static int srp_create_target_ib(struct srp_target_port *target)
 	ib_req_notify_cq(recv_cq, IB_CQ_NEXT_COMP);
 
 	init_attr->event_handler   = srp_qp_event;
-	init_attr->cap.max_send_wr = SRP_SQ_SIZE;
-	init_attr->cap.max_recv_wr = SRP_RQ_SIZE;
+	init_attr->cap.max_send_wr = target->queue_size;
+	init_attr->cap.max_recv_wr = target->queue_size;
 	init_attr->cap.max_recv_sge = 1;
 	init_attr->cap.max_send_sge = 1;
 	init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
@@ -364,6 +364,10 @@ err:
 	return ret;
 }
 
+/*
+ * Note: this function may be called without srp_alloc_iu_bufs() having been
+ * invoked. Hence the target->[rt]x_ring checks.
+ */
 static void srp_free_target_ib(struct srp_target_port *target)
 {
 	int i;
@@ -375,10 +379,18 @@ static void srp_free_target_ib(struct srp_target_port *target)
 	target->qp = NULL;
 	target->send_cq = target->recv_cq = NULL;
 
-	for (i = 0; i < SRP_RQ_SIZE; ++i)
-		srp_free_iu(target->srp_host, target->rx_ring[i]);
-	for (i = 0; i < SRP_SQ_SIZE; ++i)
-		srp_free_iu(target->srp_host, target->tx_ring[i]);
+	if (target->rx_ring) {
+		for (i = 0; i < target->queue_size; ++i)
+			srp_free_iu(target->srp_host, target->rx_ring[i]);
+		kfree(target->rx_ring);
+		target->rx_ring = NULL;
+	}
+	if (target->tx_ring) {
+		for (i = 0; i < target->queue_size; ++i)
+			srp_free_iu(target->srp_host, target->tx_ring[i]);
+		kfree(target->tx_ring);
+		target->tx_ring = NULL;
+	}
 }
 
 static void srp_path_rec_completion(int status,
@@ -564,7 +576,11 @@ static void srp_free_req_data(struct srp_target_port *target)
 	struct srp_request *req;
 	int i;
 
-	for (i = 0, req = target->req_ring; i < SRP_CMD_SQ_SIZE; ++i, ++req) {
+	if (!target->req_ring)
+		return;
+
+	for (i = 0; i < target->req_ring_size; ++i) {
+		req = &target->req_ring[i];
 		kfree(req->fmr_list);
 		kfree(req->map_page);
 		if (req->indirect_dma_addr) {
@@ -574,6 +590,9 @@ static void srp_free_req_data(struct srp_target_port *target)
 		}
 		kfree(req->indirect_desc);
 	}
+
+	kfree(target->req_ring);
+	target->req_ring = NULL;
 }
 
 static int srp_alloc_req_data(struct srp_target_port *target)
 
@@ -586,7 +605,12 @@ static int srp_alloc_req_data(struct srp_target_port *target)
 
 	INIT_LIST_HEAD(&target->free_reqs);
 
-	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+	target->req_ring = kzalloc(target->req_ring_size *
+				   sizeof(*target->req_ring), GFP_KERNEL);
+	if (!target->req_ring)
+		goto out;
+
+	for (i = 0; i < target->req_ring_size; ++i) {
 		req = &target->req_ring[i];
 		req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
 					GFP_KERNEL);
@@ -810,7 +834,7 @@ 

Re: [PATCH 8/8] IB/srp: Make queue size configurable

2013-08-21 Thread Sagi Grimberg

On 8/20/2013 8:43 PM, David Dillow wrote:

On Tue, 2013-08-20 at 17:55 +0200, Bart Van Assche wrote:

On 08/20/13 17:34, Sagi Grimberg wrote:

Question:
If srp will now allow larger queues while using a single global FMR pool
of size 1024, isn't it more likely that in a stress environment srp
will run out of FMRs to handle I/O commands?
I mean, let's say you have x scsi hosts with a can_queue size of
512 (+-) and all of them are running an I/O stress; is it possible that all
FMRs will be in use and no FMR will be available to register the next I/O
SG-list? Did you try out such a scenario?

I guess that in such a case the IB core will return EAGAIN and SRP will
return SCSI_MLQUEUE_HOST_BUSY.
I think it is a good idea to move the FMR pools to be per connection rather
than a global pool; what do you think?

That makes sense to me. And as long as the above has not yet been
implemented I'm fine with dropping patch 8/8 from this patch set.

Don't drop it; most configs won't have all that many connections and
shouldn't have an issue; even those that do will only see a potential
slowdown when running with everything at once.

We can address the FMR/BMME issues on top of this patch.


Agree.
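
For reference, a per-connection pool along those lines could be built on
the existing ib_fmr_pool API, roughly like this (a sketch only; the sizing
values are illustrative, not tuned):

#include <rdma/ib_fmr_pool.h>

static struct ib_fmr_pool *srp_alloc_fmr_pool(struct ib_pd *pd,
					      int queue_size, int max_pages)
{
	struct ib_fmr_pool_param params = {
		.max_pages_per_fmr	= max_pages,
		.page_shift		= 12,	/* 4K pages */
		.access			= IB_ACCESS_LOCAL_WRITE |
					  IB_ACCESS_REMOTE_READ |
					  IB_ACCESS_REMOTE_WRITE,
		.pool_size		= queue_size,	/* one FMR per queued command */
		.dirty_watermark	= queue_size / 4,
	};

	/* one pool per target/connection instead of one global pool */
	return ib_create_fmr_pool(pd, &params);
}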


[PATCH RFC 2/9] IB/core: Introduce Signature Verbs API

2013-10-15 Thread Sagi Grimberg
This commit introduces the verbs interface for signature related
operations. A signature handover operation shall configure the
layouts of data and protection attributes both in the memory and wire
domains. Once the signature handover operation is done, the HCA will
offload data integrity generation/validation while performing
the actual data transfer.

Additions:
1. HCA signature capabilities in device attributes
A verbs provider supporting signature handover operations shall
fill the relevant fields in the device attributes structure returned
by ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
Creating a QP that will carry signature handover operations
may require some special preparations from the verbs provider.
So we add the QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare
that the created QP may carry out signature handover operations.
Expose signature support to the verbs layer (no support for now).

3. New send work request IB_WR_REG_SIG_MR
Signature handover work request. This WR will define
the signature handover properties of the memory/wire
domains as well as the domains' layout.
* Currently we expose just the T10-DIF layout.

4. New verb ib_check_sig_status
The check_sig_status verb shall check whether any signature errors
are pending for a specific signature-related ib_mr.
The user should provide the ib_qp that executed the RDMA operation
involving the given ib_mr.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |8 ++
 include/rdma/ib_verbs.h |  140 ++-
 2 files changed, 147 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1d94a5c..5636d65 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1293,3 +1293,11 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+int ib_check_sig_status(struct ib_mr *sig_mr,
+			struct ib_sig_err *sig_err)
+{
+	return sig_mr->device->check_sig_status ?
+		sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_sig_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65b7e79..cf46a83 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,19 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24)
+	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
+	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<25),
+};
+
+enum ib_signature_prot_cap {
+	IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+	IB_GUARD_T10DIF_CRC	= 1,
+	IB_GUARD_T10DIF_CSUM	= 1 << 1,
 };
 
 enum ib_atomic_cap {
@@ -166,6 +178,8 @@ struct ib_device_attr {
unsigned intmax_fast_reg_page_list_len;
u16 max_pkeys;
u8  local_ca_ack_delay;
+   enum ib_signature_prot_cap  sig_prot_cap;
+   enum ib_signature_guard_cap sig_guard_cap;
 };
 
 enum ib_mtu {
@@ -630,6 +644,7 @@ enum ib_qp_type {
 enum ib_qp_create_flags {
 	IB_QP_CREATE_IPOIB_UD_LSO		= 1 << 0,
 	IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK	= 1 << 1,
+	IB_QP_CREATE_SIGNATURE_EN		= 1 << 2,
 	/* reserve bits 26-31 for low level drivers' internal use */
 	IB_QP_CREATE_RESERVED_START		= 1 << 26,
 	IB_QP_CREATE_RESERVED_END		= 1 << 31,
@@ -780,6 +795,7 @@ enum ib_wr_opcode {
IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
IB_WR_BIND_MW,
+   IB_WR_REG_SIG_MR,
/* reserve values for low level drivers' internal use.
 * These values will not be used at all in the ib core layer.
 */
@@ -885,6 +901,19 @@ struct ib_send_wr {
u32  rkey;
struct ib_mw_bind_info   bind_info;
} bind_mw;
+		struct {
+			struct ib_sig_attrs *sig_attrs;
+			struct ib_mr	    *sig_mr;
+			int		     access_flags;
+			/* Registered data mr */
+			struct ib_mr	    *data_mr;
+			u32		     data_size;
+			u64		     data_va;
+			/* Registered protection mr */
+			struct ib_mr	    *prot_mr;
+			u32		     prot_size;
+			u64		     prot_va;
+		} sig_handover;
 	} wr

[PATCH RFC 4/9] IB/mlx5: Initialize mlx5_ib_qp signature related

2013-10-15 Thread Sagi Grimberg
If the user requested signature enable, we initialize the relevant
mlx5_ib_qp members: we mark the QP as signature-enabled, initialize an
empty sig_err_list, and increase the QP size.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |5 +
 drivers/infiniband/hw/mlx5/qp.c  |7 +++
 include/linux/mlx5/qp.h  |1 +
 3 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 45d7424..1d5793e 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -189,6 +189,11 @@ struct mlx5_ib_qp {
 
int create_type;
u32 pa_lkey;
+
+   /* Store signature errors */
+   boolsignature_en;
+   struct list_headsig_err_list;
+   spinlock_t  sig_err_lock;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 045f8cd..9a8c622 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -734,6 +734,13 @@ static int create_qp_common(struct mlx5_ib_dev *dev, 
struct ib_pd *pd,
 	spin_lock_init(&qp->sq.lock);
 	spin_lock_init(&qp->rq.lock);
 
+	if (init_attr->create_flags == IB_QP_CREATE_SIGNATURE_EN) {
+		init_attr->cap.max_send_wr *= MLX5_SIGNATURE_SQ_MULT;
+		spin_lock_init(&qp->sig_err_lock);
+		INIT_LIST_HEAD(&qp->sig_err_list);
+		qp->signature_en = true;
+	}
+
 	if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR)
 		qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE;
 
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index d9e3eac..174805c 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -37,6 +37,7 @@
 #include <linux/mlx5/driver.h>
 
 #define MLX5_INVALID_LKEY  0x100
+#define MLX5_SIGNATURE_SQ_MULT 3
 
 enum mlx5_qp_optpar {
 	MLX5_QP_OPTPAR_ALT_ADDR_PATH	= 1 << 0,
-- 
1.7.1



[PATCH RFC 3/9] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr

2013-10-15 Thread Sagi Grimberg
Support the create_mr and destroy_mr verbs.
Creating an ib_mr may be done either for an ib_mr that will
register regular page lists, like the alloc_fast_reg_mr routine,
or for an indirect ib_mr that can register other (pre-registered)
ib_mrs in an indirect manner.

In addition, the user may request signature enable, which means
that the created ib_mr may be attached with signature attributes
(BSF, PSVs).

Currently we only allow direct/indirect registration modes.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c|2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |4 +
 drivers/infiniband/hw/mlx5/mr.c  |  120 ++
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |   64 ++
 include/linux/mlx5/device.h  |   25 ++
 include/linux/mlx5/driver.h  |   22 +
 6 files changed, 237 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 3f831de..2e67a37 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.get_dma_mr		= mlx5_ib_get_dma_mr;
 	dev->ib_dev.reg_user_mr		= mlx5_ib_reg_user_mr;
 	dev->ib_dev.dereg_mr		= mlx5_ib_dereg_mr;
+	dev->ib_dev.destroy_mr		= mlx5_ib_destroy_mr;
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
+	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 836be91..45d7424 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -262,6 +262,7 @@ struct mlx5_ib_mr {
int npages;
struct completion   done;
enum ib_wc_status   status;
+	struct mlx5_core_sig_ctx    *sig;
 };
 
 struct mlx5_ib_fast_reg_page_list {
@@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
  u64 virt_addr, int access_flags,
  struct ib_udata *udata);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
+int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+   struct ib_mr_init_attr *mr_init_attr);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct 
ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index bd41df9..2f6758c 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -921,6 +921,126 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
return 0;
 }
 
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_create_mkey_mbox_in *in;
+	struct mlx5_ib_mr *mr;
+	int access_mode, err;
+	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+
+   mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+   if (!mr)
+   return ERR_PTR(-ENOMEM);
+
+   in = kzalloc(sizeof(*in), GFP_KERNEL);
+   if (!in) {
+   err = -ENOMEM;
+   goto err_free;
+   }
+
+	in->seg.status = 1 << 6; /* free */
+	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
+	in->seg.qpn_mkey7_0 = cpu_to_be32(0xff << 8);
+	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
+
+	switch (mr_init_attr->reg_type) {
+	case IB_MR_REG_DIRECT:
+		access_mode = MLX5_ACCESS_MODE_MTT;
+		break;
+	case IB_MR_REG_INDIRECT:
+		access_mode = MLX5_ACCESS_MODE_KLM;
+		break;
+	default:
+		err = -EINVAL;
+		goto err_free;
+	}
+	in->seg.flags = MLX5_PERM_UMR_EN | access_mode;
+
+	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+		u32 psv_index[2];
+
+		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
+					       MLX5_MKEY_BSF_EN);
+		in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
+		mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL);
+		if (!mr->sig) {
+			err = -ENOMEM;
+			goto err_free

[PATCH RFC 8/9] IB/mlx5: Collect signature error completion

2013-10-15 Thread Sagi Grimberg
This commit takes care of the signature error CQE generated by
the HW (if any) and stores it on the QP signature error list.

Once the user gets the completion for the transaction,
he must check for signature errors on the signature memory region
using the new lightweight verb ib_check_sig_status and, if such an
error exists, get the signature error information.

If the user does not check for signature errors, i.e. does not
call ib_check_sig_status, he will not be allowed to use
the memory region for another signature operation
(a REG_SIG_MR work request will fail).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/cq.c  |   49 ++
 drivers/infiniband/hw/mlx5/main.c|1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |2 +
 drivers/infiniband/hw/mlx5/mr.c  |   34 +++
 drivers/infiniband/hw/mlx5/qp.c  |   14 +-
 include/linux/mlx5/cq.h  |1 +
 include/linux/mlx5/device.h  |   17 
 include/linux/mlx5/driver.h  |2 +
 8 files changed, 119 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 344ab03..c1d4029 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -351,6 +351,34 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
 	qp->sq.last_poll = tail;
 }
 
+static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe,
+			     struct ib_sig_err *item)
+{
+	u16 syndrome = be16_to_cpu(cqe->syndrome);
+
+	switch (syndrome) {
+	case 13:
+		item->err_type = IB_SIG_BAD_CRC;
+		break;
+	case 12:
+		item->err_type = IB_SIG_BAD_APPTAG;
+		break;
+	case 11:
+		item->err_type = IB_SIG_BAD_REFTAG;
+		break;
+	default:
+		break;
+	}
+
+	item->expected_guard = be32_to_cpu(cqe->expected_trans_sig) >> 16;
+	item->actual_guard = be32_to_cpu(cqe->actual_trans_sig) >> 16;
+	item->expected_logical_block = be32_to_cpu(cqe->expected_reftag);
+	item->actual_logical_block = be32_to_cpu(cqe->actual_reftag);
+	item->sig_err_offset = be64_to_cpu(cqe->err_offset);
+	item->qpn = be32_to_cpu(cqe->qpn);
+	item->key = be32_to_cpu(cqe->mkey);
+}
+
 static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 			 struct mlx5_ib_qp **cur_qp,
 			 struct ib_wc *wc)
@@ -360,12 +388,15 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 	struct mlx5_cqe64 *cqe64;
 	struct mlx5_core_qp *mqp;
 	struct mlx5_ib_wq *wq;
+	struct mlx5_sig_err_cqe *sig_err_cqe;
+	struct ib_sig_err *err_item;
 	uint8_t opcode;
 	uint32_t qpn;
 	u16 wqe_ctr;
 	void *cqe;
 	int idx;
 
+repoll:
 	cqe = next_cqe_sw(cq);
 	if (!cqe)
 		return -EAGAIN;
@@ -449,6 +480,24 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 			}
 		}
 		break;
+	case MLX5_CQE_SIG_ERR:
+		sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64;
+		err_item = kzalloc(sizeof(*err_item), GFP_ATOMIC);
+		if (!err_item) {
+			mlx5_ib_err(dev, "Failed to allocate sig_err item\n");
+			return -ENOMEM;
+		}
+
+		get_sig_err_item(sig_err_cqe, err_item);
+
+		mlx5_ib_dbg(dev, "Got SIGERR on key: 0x%x\n",
+			    err_item->key);
+
+		spin_lock(&(*cur_qp)->sig_err_lock);
+		list_add(&err_item->list, &(*cur_qp)->sig_err_list);
+		spin_unlock(&(*cur_qp)->sig_err_lock);
+
+		goto repoll;
 	}
 
return 0;
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 2e67a37..f3c7111 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1409,6 +1409,7 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
+	dev->ib_dev.check_sig_status	= mlx5_ib_check_sig_status;
 
 	if (mdev->caps.flags & MLX5_DEV_CAP_FLAG_XRC) {
 		dev->ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 1d5793e..73b8cf0 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -533,6 +533,8 @@ int mlx5_mr_cache_init(struct mlx5_ib_dev *dev);
 int mlx5_mr_cache_cleanup(struct mlx5_ib_dev *dev);
 int mlx5_mr_ib_cont_pages(struct ib_umem *umem, u64 addr, int *count, int 
*shift);
 void mlx5_umr_cq_handler(struct ib_cq *cq, void *cq_context);
+int

[PATCH RFC 7/9] IB/mlx5: Support IB_WR_REG_SIG_MR

2013-10-15 Thread Sagi Grimberg
This patch implements the IB_WR_REG_SIG_MR work request posted by the user.

Basically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post a UMR WR to register the sig_mr, in one of two possible ways:
* In case the user registered a single MR for data, the UMR data segment
  consists of:
  - a single klm (data MR) passed by the user
  - a BSF with the signature attributes requested by the user.
* In case the user registered 2 MRs, one for data and one for protection,
  the UMR consists of:
  - a strided block format which includes the data and protection MRs and
    their repetitive block format.
  - a BSF with the signature attributes requested by the user.

2. post a SET_PSV WQE in order to set, for the memory domain, the initial
   signature parameters passed by the user.

3. post a SET_PSV WQE in order to set, for the wire domain, the initial
   signature parameters passed by the user.

This patch also introduces some helper functions to set the BSF correctly
and to determine the signature format selectors.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |  413 +++
 include/linux/mlx5/qp.h |   56 ++
 2 files changed, 469 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 2517fb3..971d434 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1721,6 +1721,26 @@ static __be64 frwr_mkey_mask(void)
return cpu_to_be64(result);
 }
 
+static __be64 sig_mkey_mask(void)
+{
+   u64 result;
+
+   result = MLX5_MKEY_MASK_LEN |
+   MLX5_MKEY_MASK_PAGE_SIZE|
+   MLX5_MKEY_MASK_START_ADDR   |
+   MLX5_MKEY_MASK_EN_RINVAL|
+   MLX5_MKEY_MASK_KEY  |
+   MLX5_MKEY_MASK_LR   |
+   MLX5_MKEY_MASK_LW   |
+   MLX5_MKEY_MASK_RR   |
+   MLX5_MKEY_MASK_RW   |
+   MLX5_MKEY_MASK_SMALL_FENCE  |
+   MLX5_MKEY_MASK_FREE |
+   MLX5_MKEY_MASK_BSF_EN;
+
+   return cpu_to_be64(result);
+}
+
 static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
 struct ib_send_wr *wr, int li)
 {
@@ -1903,6 +1923,336 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, 
struct ib_send_wr *wr,
return 0;
 }
 
+static u16 prot_field_size(enum ib_signature_type type, u16 block_size)
+{
+   switch (type) {
+   case IB_SIG_TYPE_T10_DIF:
+   return MLX5_DIF_SIZE;
+   default:
+   return 0;
+   }
+}
+
+static u8 bs_selector(int block_size)
+{
+   switch (block_size) {
+   case 512:   return 0x1;
+   case 520:   return 0x2;
+   case 4096:  return 0x3;
+   case 4160:  return 0x4;
+   case 1073741824:return 0x5;
+   default:return 0;
+   }
+}
+
+static int format_selector(struct ib_sig_attrs *attr,
+			   struct ib_sig_domain *domain,
+			   int *selector)
+{
+
+#define FORMAT_DIF_NONE		0
+#define FORMAT_DIF_CRC_INC	4
+#define FORMAT_DIF_CSUM_INC	12
+#define FORMAT_DIF_CRC_NO_INC	13
+#define FORMAT_DIF_CSUM_NO_INC	14
+
+	switch (domain->sig.dif.type) {
+	case IB_T10DIF_NONE:
+		/* No DIF */
+		*selector = FORMAT_DIF_NONE;
+		break;
+	case IB_T10DIF_TYPE1: /* Fall through */
+	case IB_T10DIF_TYPE2:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = FORMAT_DIF_CRC_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = FORMAT_DIF_CSUM_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	case IB_T10DIF_TYPE3:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					   FORMAT_DIF_CRC_INC :
+					   FORMAT_DIF_CRC_NO_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					   FORMAT_DIF_CSUM_INC :
+					   FORMAT_DIF_CSUM_NO_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	default:
+		return 1;
+	}
+
+	return 0;
+}
+
+static int mlx5_set_bsf(struct ib_mr *sig_mr,
+   struct ib_sig_attrs *sig_attrs,
+   struct mlx5_bsf

[PATCH RFC 1/9] IB/core: Introduce indirect and protected memory regions

2013-10-15 Thread Sagi Grimberg
This commit introduces verbs for creating memory regions
which will allow new types of memory key operations such
as indirect memory registration and protected memory
registration.

Indirect memory registration is registering several (one
or more) pre-registered memory regions in a specific layout.
The indirect region may potentially describe several regions
and some repetition format between them.

Protected memory registration is registering a memory region
with various data integrity attributes that describe the protection
schemes that will be enforced by the HCA in an offloaded manner.

In the future these routines may replace the current memory region
creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   39 +
 include/rdma/ib_verbs.h |   46 +++
 2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..1d94a5c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1052,6 +1052,45 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);
 
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 645c3ce..65b7e79 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -925,6 +925,30 @@ enum ib_mr_rereg_flags {
 	IB_MR_REREG_ACCESS	= (1<<2)
 };
 
+enum ib_mr_create_flags {
+   IB_MR_SIGNATURE_EN = 1,
+};
+
+enum ib_mr_reg_type {
+   IB_MR_REG_DIRECT,
+   IB_MR_REG_INDIRECT,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ * ib_create_mr.
+ * @reg_type: requested mapping type, this can be direct/indirect
+ *   registration or repetitive structure registration.
+ * @max_reg_descriptors: max number of registration units that
+ *   may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+   enum ib_mr_reg_type reg_type;
+   int max_reg_descriptors;
+   enum ib_mr_create_flags flags;
+};
+
 /**
  * struct ib_mw_bind - Parameters for a type 1 memory window bind operation.
  * @wr_id:  Work request id.
@@ -1257,6 +1281,9 @@ struct ib_device {
int(*query_mr)(struct ib_mr *mr,
   struct ib_mr_attr *mr_attr);
int(*dereg_mr)(struct ib_mr *mr);
+   int(*destroy_mr)(struct ib_mr *mr);
+   struct ib_mr * (*create_mr)(struct ib_pd *pd,
+   struct ib_mr_init_attr 
*mr_init_attr);
struct ib_mr * (*alloc_fast_reg_mr)(struct ib_pd *pd,
   int max_page_list_len);
struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct 
ib_device *device,
@@ -2092,6 +2119,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr 
*mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);
 
+
+/**
+ * ib_create_mr - creates memory region that may be used for
+ *   direct or indirect registration models via UMR WR.
+ * @pd: The protection domain associated with the region.
+ * @mr_init_attr: memory region init attributes.
+ */
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+  struct ib_mr_init_attr *mr_init_attr);
+
+/**
+ * ib_destroy_mr - Destroys a memory region that was created using
+ * ib_create_mr and removes it from HW translation tables.
+ * @mr: The memory region to destroy.
+ *
+ * This function can fail, if the memory region has memory windows bound to it.
+ */
+int ib_destroy_mr(struct ib_mr *mr);
+
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
  *   IB_WR_FAST_REG_MR send work request.
-- 
1.7.1


[PATCH RFC 9/9] IB/mlx5: Publish support in signature feature

2013-10-15 Thread Sagi Grimberg
Currently we support only T10-DIF types of signature
handover operations (types 1|2|3).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index f3c7111..3dd8219 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (flags & MLX5_DEV_CAP_FLAG_XRC)
 		props->device_cap_flags |= IB_DEVICE_XRC;
 	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+				      IB_PROT_T10DIF_TYPE_2 |
+				      IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+				       IB_GUARD_T10DIF_CSUM;
+	}
 
props-vendor_id   = be32_to_cpup((__be32 *)(out_mad-data + 
36)) 
0xff;
-- 
1.7.1



[PATCH RFC 0/9] Introduce Signature feature

2013-10-15 Thread Sagi Grimberg
This patchset introduces verbs-level support for the signature handover
feature. Signature is intended to implement end-to-end data integrity
on a transactional basis in a completely offloaded manner.
A signature handover operation is basically a translation of
the data layout between the so-called memory domain and wire domain
in the context of data integrity support.

There are several end-to-end data integrity methods used today in various
applications and/or upper layer protocols such as T10-DIF defined by SCSI
specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs
support only for T10-DIF. The proposed framework allows adding more
signature methods.

Data integrity is performed by registering a protected region with
signature handover attributes and a memory domain layout, and in
addition defining the wire domain layout. Defining both domains is
equivalent to determining the signature handover operation, which can
strip/add/pass and validate data integrity when transferring data
between the input space and the output space. When the data transfer is
completed, the user may check the signature status of the handover
operation and, in case a data integrity error has occurred, receive a
signature error item providing the relevant info on the error.

This feature shall be used in storage upper layer protocols iSER/SRP
implementing end-to-end data integrity T10-DIF. Following this patchset,
we will soon submit krping activation code which will demonstrate
the usage and activation of protected RDMA transactions using signature verbs.

Patchset summary:
- Introduce verbs for create/destroy memory regions supporting signature.
- Introduce IB core signature verbs API.
- Implement mr create/destroy verbs in mlx5 driver.
- Preparation patches for signature support in mlx5 driver.
- Implement signature handover work request in mlx5 driver.
- Implement signature error collection and handling in mlx5 driver.

Sagi Grimberg (9):
  IB/core: Introduce indirect and protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin & finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Publish support in signature feature

 drivers/infiniband/core/verbs.c  |   47 +++
 drivers/infiniband/hw/mlx5/cq.c  |   49 +++
 drivers/infiniband/hw/mlx5/main.c|   12 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   11 +
 drivers/infiniband/hw/mlx5/mr.c  |  154 
 drivers/infiniband/hw/mlx5/qp.c  |  532 --
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |   64 +++
 include/linux/mlx5/cq.h  |1 +
 include/linux/mlx5/device.h  |   42 ++
 include/linux/mlx5/driver.h  |   24 ++
 include/linux/mlx5/qp.h  |   57 +++
 include/rdma/ib_verbs.h  |  186 +-
 12 files changed, 1140 insertions(+), 39 deletions(-)



[PATCH RFC 5/9] IB/mlx5: Break wqe handling to begin & finish routines

2013-10-15 Thread Sagi Grimberg
As a preliminary step for the signature feature, which will
require posting multiple (3) WQEs for a single WR, we
break the post_send routine WQE indexing into begin and
finish routines.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |   95 ---
 1 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 9a8c622..57733a5 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1985,6 +1985,57 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr)
}
 }
 
+static int begin_wqe(struct mlx5_ib_qp *qp, void **seg,
+		     struct mlx5_wqe_ctrl_seg **ctrl,
+		     struct ib_send_wr *wr, int *idx,
+		     int *size, int nreq)
+{
+	int err = 0;
+
+	if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		err = -ENOMEM;
+		return err;
+	}
+
+	*idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
+	*seg = mlx5_get_send_wqe(qp, *idx);
+	*ctrl = *seg;
+	*(uint32_t *)(*seg + 8) = 0;
+	(*ctrl)->imm = send_ieth(wr);
+	(*ctrl)->fm_ce_se = qp->sq_signal_bits |
+		(wr->send_flags & IB_SEND_SIGNALED ?
+		 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
+		(wr->send_flags & IB_SEND_SOLICITED ?
+		 MLX5_WQE_CTRL_SOLICITED : 0);
+
+	*seg += sizeof(**ctrl);
+	*size = sizeof(**ctrl) / 16;
+
+	return err;
+}
+
+static void finish_wqe(struct mlx5_ib_qp *qp,
+		       struct mlx5_wqe_ctrl_seg *ctrl,
+		       u8 size, unsigned idx, u64 wr_id,
+		       int *nreq, u8 fence, u8 next_fence,
+		       u32 mlx5_opcode)
+{
+	u8 opmod = 0;
+
+	ctrl->opmod_idx_opcode = cpu_to_be32(((u32)(qp->sq.cur_post) << 8) |
+					     mlx5_opcode | ((u32)opmod << 24));
+	ctrl->qpn_ds = cpu_to_be32(size | (qp->mqp.qpn << 8));
+	ctrl->fm_ce_se |= fence;
+	qp->fm_cache = next_fence;
+	if (unlikely(qp->wq_sig))
+		ctrl->signature = wq_sig(ctrl);
+
+	qp->sq.wrid[idx] = wr_id;
+	qp->sq.w_list[idx].opcode = mlx5_opcode;
+	qp->sq.wqe_head[idx] = qp->sq.head + (*nreq)++;
+	qp->sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB);
+	qp->sq.w_list[idx].next = qp->sq.cur_post;
+}
+
+
 int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
  struct ib_send_wr **bad_wr)
 {
@@ -1998,7 +2049,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
	int uninitialized_var(size);
	void *qend = qp->sq.qend;
	unsigned long flags;
-	u32 mlx5_opcode;
	unsigned idx;
	int err = 0;
	int inl = 0;
@@ -2007,7 +2057,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
	int nreq;
	int i;
	u8 next_fence = 0;
-	u8 opmod = 0;
	u8 fence;
 
	spin_lock_irqsave(&qp->sq.lock, flags);
@@ -2020,36 +2069,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
			goto out;
		}
 
-		if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		fence = qp->fm_cache;
+		num_sge = wr->num_sge;
+		if (unlikely(num_sge > qp->sq.max_gs)) {
			mlx5_ib_warn(dev, "\n");
			err = -ENOMEM;
			*bad_wr = wr;
			goto out;
		}
 
-		fence = qp->fm_cache;
-		num_sge = wr->num_sge;
-		if (unlikely(num_sge > qp->sq.max_gs)) {
+		err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq);
+		if (err) {
			mlx5_ib_warn(dev, "\n");
			err = -ENOMEM;
			*bad_wr = wr;
			goto out;
		}
 
-		idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
-		seg = mlx5_get_send_wqe(qp, idx);
-		ctrl = seg;
-		*(uint32_t *)(seg + 8) = 0;
-		ctrl->imm = send_ieth(wr);
-		ctrl->fm_ce_se = qp->sq_signal_bits |
-			(wr->send_flags & IB_SEND_SIGNALED ?
-			 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
-			(wr->send_flags & IB_SEND_SOLICITED ?
-			 MLX5_WQE_CTRL_SOLICITED : 0);
-
-		seg += sizeof(*ctrl);
-		size = sizeof(*ctrl) / 16;
-
		switch (ibqp->qp_type) {
		case IB_QPT_XRC_INI:
			xrc = seg;
@@ -2199,22 +2235,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
			}
		}
 
-		mlx5_opcode = mlx5_ib_opcode[wr->opcode];
-		ctrl->opmod_idx_opcode

[PATCH RFC 6/9] IB/mlx5: remove MTT access mode from umr flags helper function

2013-10-15 Thread Sagi Grimberg
The get_umr_flags helper function might be used for access
modes other than ACCESS_MODE_MTT, such as
ACCESS_MODE_KLM. So remove the access mode from the helper;
the caller will add its own access mode flag.

This commit does not add/change functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 57733a5..2517fb3 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1775,7 +1775,7 @@ static u8 get_umr_flags(int acc)
	       (acc & IB_ACCESS_REMOTE_WRITE  ? MLX5_PERM_REMOTE_WRITE : 0) |
	       (acc & IB_ACCESS_REMOTE_READ   ? MLX5_PERM_REMOTE_READ  : 0) |
	       (acc & IB_ACCESS_LOCAL_WRITE   ? MLX5_PERM_LOCAL_WRITE  : 0) |
-	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }
 
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1787,7 +1787,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
		return;
	}
 
-	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags);
+	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) |
+		     MLX5_ACCESS_MODE_MTT;
	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
	seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00);
	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
-- 
1.7.1



Re: [PATCH RFC 2/9] IB/core: Introduce Signature Verbs API

2013-10-20 Thread Sagi Grimberg

On 10/18/2013 1:51 AM, Hefty, Sean wrote:

@@ -885,6 +901,19 @@ struct ib_send_wr {
u32  rkey;
struct ib_mw_bind_info   bind_info;
} bind_mw;
+	struct {
+		struct ib_sig_attrs    *sig_attrs;
+		struct ib_mr	       *sig_mr;
+		int			access_flags;
+		/* Registered data mr */
+		struct ib_mr	       *data_mr;
+		u32			data_size;
+		u64			data_va;
+		/* Registered protection mr */
+		struct ib_mr	       *prot_mr;
+		u32			prot_size;
+		u64			prot_va;
+	} sig_handover;

At what point do we admit that this is a ridiculous structure?


If you are referring to ib_send_wr, I agree. Shall we modify it to a
union of typedefs so it becomes more readable?



Help me understand what this WR is doing.  Is this telling the HCA to copy data 
between local MRs?  What is a 'data MR' versus a 'protected MR'?  (I'm not hip 
on T10-DIF.)


No data copy, God forbid... :)

Let me start by giving a short intro on signature (and T10-DIF).
In the signature world, data may exist with protection information
guarding the data. In T10-DIF (Data Integrity Fields), for example,
these are 8-byte guards which include a CRC for each 512 bytes of data
(block).
An HCA which supports signature offload is expected to validate that the
data is intact (each block matches its guard) and send it correctly over
the wire (in the T10-DIF case the data and protection should be
interleaved, i.e. 512B of data followed by an 8B protection guard) or,
alternatively, validate data (+ protection) coming from the wire and
write it to the associated memory areas.
In the general case, the data and the protection guards may lie in
different memory areas. The SCSI mid-layer, for instance, passes the
transport driver 2 buffers using 2 scatterlists.
The transport driver (or application in the general case) is expected
to register each buffer (as it normally would in order to use RDMA),
using 2 MRs.


The signature handover operation binds all the necessary
information for the HCA together: where the data is (data_mr), where the
protection information is (prot_mr), and what the signature properties
are (sig_attrs).
Once this step is taken (the WR is posted), a single MR (sig_mr) describes
the signature handover operation and can be used to perform RDMA under
signature presence.
When the HCA performs RDMA over this MR, it will take into account
the signature context of the transaction and will follow the signature
attributes configured.
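
To make this concrete, here is a rough sketch of what a transport might
post (not taken verbatim from the patchset; io_ctx, nblocks and the
pre-registered data_mr/prot_mr/sig_mr variables are hypothetical):

	struct ib_send_wr wr, *bad_wr;

	memset(&wr, 0, sizeof(wr));
	wr.opcode = IB_WR_REG_SIG_MR;
	wr.wr_id = (u64)(uintptr_t)io_ctx;		/* ULP cookie */
	wr.wr.sig_handover.sig_attrs = &sig_attrs;	/* signature properties */
	wr.wr.sig_handover.sig_mr = sig_mr;		/* signature-enabled MR */
	wr.wr.sig_handover.access_flags = IB_ACCESS_LOCAL_WRITE |
					  IB_ACCESS_REMOTE_READ |
					  IB_ACCESS_REMOTE_WRITE;
	wr.wr.sig_handover.data_mr = data_mr;		/* registered data buffer */
	wr.wr.sig_handover.data_va = data_va;
	wr.wr.sig_handover.data_size = 512 * nblocks;
	wr.wr.sig_handover.prot_mr = prot_mr;		/* registered guard buffer */
	wr.wr.sig_handover.prot_va = prot_va;
	wr.wr.sig_handover.prot_size = 8 * nblocks;

	if (ib_post_send(qp, &wr, &bad_wr))
		/* handle the post failure */;

	/* sig_mr->lkey/rkey can now be used for the actual data transfer */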


Hope this helps,

Sagi.


Re: [PATCH RFC 1/9] IB/core: Introduce indirect and protected memory regions

2013-10-20 Thread Sagi Grimberg

On 10/18/2013 1:43 AM, Hefty, Sean wrote:

This commit introduces verbs for creating memory regions
which will allow new types of memory key operations such
as indirect memory registration and protected memory
registration.

Indirect memory registration is registering several (one
or more) pre-registered memory regions in a specific layout.
The indirect region may potentially describe several regions
and some repetition format between them.

I didn't follow this direct versus indirect difference.  See below.
  


Hey Sean, thanks for looking into this!

Indirect memory registration feature will be submitted in the future. 
Signature feature is using it under the hood.
I'll remove it from v2 as it creates a source of confusion and I want to 
concentrate on signature.


Now, since you opened this door, briefly:
unlike direct (known) MRs, which are associated with a page-list,
indirect MRs can be associated with other MRs in the form of a list of
tuples {lkey, addr, len}, providing more flexible memory registrations.



+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+  struct ib_mr_init_attr *mr_init_attr)
+{
+   struct ib_mr *mr;
+
+   if (!pd->device->create_mr)
+           return ERR_PTR(-ENOSYS);
+
+   mr = pd->device->create_mr(pd, mr_init_attr);
+
+   if (!IS_ERR(mr)) {
+           mr->device  = pd->device;
+           mr->pd      = pd;
+           mr->uobject = NULL;
+           atomic_inc(&pd->usecnt);
+           atomic_set(&mr->usecnt, 0);
+   }
+
+   return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+   struct ib_pd *pd;
+   int ret;
+
+   if (atomic_read(&mr->usecnt))
+           return -EBUSY;
+
+   pd = mr->pd;
+   ret = mr->device->destroy_mr(mr);
+   if (!ret)
+           atomic_dec(&pd->usecnt);
+
+   return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
  struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int
max_page_list_len)
  {
struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 645c3ce..65b7e79 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -925,6 +925,30 @@ enum ib_mr_rereg_flags {
IB_MR_REREG_ACCESS  = (12)
  };

+enum ib_mr_create_flags {
+   IB_MR_SIGNATURE_EN = 1,
+};
+
+enum ib_mr_reg_type {
+   IB_MR_REG_DIRECT,
+   IB_MR_REG_INDIRECT,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ * ib_create_mr.
+ * @reg_type: requested mapping type, this can be direct/indirect
+ *   registration or repetitive structure registration.
+ * @max_reg_descriptors: max number of registration units that
+ *   may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+   enum ib_mr_reg_type reg_type;
+   int max_reg_descriptors;
+   enum ib_mr_create_flags flags;
+};
+
  /**
   * struct ib_mw_bind - Parameters for a type 1 memory window bind
operation.
   * @wr_id:  Work request id.
@@ -1257,6 +1281,9 @@ struct ib_device {
	int                (*query_mr)(struct ib_mr *mr,
	                               struct ib_mr_attr *mr_attr);
	int                (*dereg_mr)(struct ib_mr *mr);
+	int                (*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr *     (*create_mr)(struct ib_pd *pd,
+	                                struct ib_mr_init_attr *mr_init_attr);

These create and destroy something called an 'MR', but are not actually 
associated with any memory buffers.  Is this some sort of conceptual 
sub-protection domain?  Why is this needed, versus defining new ib_mr_attr 
fields?


This MR can be perceived as a generalization of a fast_reg MR. When using
fast memory registration, the verbs user calls ib_alloc_fast_reg_mr()
in order to allocate an MR that may be used for the fast registration
method by posting
a fast registration work-request on the send-queue (FRWR). The user does
not pass any memory buffers to ib_alloc_fast_reg_mr(), as the actual
registration is done by posting a WR. This follows the same notation, but
allows new functionality (such as signature enable).


As things are today, no MR creation method (fast_reg, dma, phys,
user...) allows passing initialization parameters. The signature feature
requires some internal resources management, and we need some kind of
indication that signature is requested for this MR. I'm suggesting that
this verb cover the general case; later on, it is possible to
extend this method to cover all existing flavors of MR creation
(implement the existing ones with it).
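
For illustration, here is a minimal sketch of the allocation side as I
picture it (using the v1 form of ib_mr_init_attr, i.e. with reg_type
dropped as mentioned above):

	struct ib_mr_init_attr mr_attr = {
		.max_reg_descriptors = 2,	/* e.g. data MR + protection MR */
		.flags = IB_MR_SIGNATURE_EN,	/* ask for signature resources */
	};
	struct ib_mr *sig_mr = ib_create_mr(pd, &mr_attr);

	if (IS_ERR(sig_mr))
		return PTR_ERR(sig_mr);
	/* ... used via REG_SIG_MR work requests, then ... */
	ib_destroy_mr(sig_mr);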


Do you agree? Or do you prefer to extend other MR allocation methods to
receive initialization parameters?






Re: [PATCH RFC 2/9] IB/core: Introduce Signature Verbs API

2013-10-21 Thread Sagi Grimberg

On 10/21/2013 5:34 PM, Hefty, Sean wrote:

The signature handover operation is binding all the necessary
information for the HCA together: where is the data (data_mr), where is
the protection information (prot_mr), what are the signature properties
(sig_attrs).
Once this step is taken (WR is posted), a single MR (sig_mr) describes
the signature handover operation and can be used to perform RDMA under
signature presence.
Once the HCA will perform RDMA over this MR, it will take into account
the signature context of the transaction and will follow the signature
attributes configured.

It seems like this change loses the ability to use an SGL.


I don't think so.
A signature MR simply describes a signature-associated memory region,
i.e. it is a memory region that
also defines some signature operation offload aside from normal RDMA
(for example validate & strip).
SGLs are used to publish several rkeys for the server/target/peer to
perform RDMA on each.
In this case the user previously registered each MR over which he wishes
its peer to RDMA.
Same story here: if the user has several signature-associated MRs, over
which he wishes his peer to RDMA (in a protected manner),

he can use these rkeys to construct an SGL.


   Why are the signature properties separate from the protection information?


Well,
protection information is the actual protection block guards of the data
(i.e. CRCs, XORs, DIFs etc..), while the signature properties
structure is the descriptor telling the HCA how to
treat/validate/generate the protection information.


Note that signature support requires the HCA to be able to support
INSERT operations.
This means that there is no protection information, and the HCA is asked
to generate it and add it to the data stream

(which may be incoming or outgoing...).
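
As a sketch, an INSERT operation would be described by something like
the following (field names beyond those visible in the posted patches
are my assumptions, so take the exact layout with a grain of salt):

	struct ib_sig_attrs sig_attrs;

	memset(&sig_attrs, 0, sizeof(sig_attrs));
	/* memory domain: raw data only, nothing to validate */
	sig_attrs.mem.sig_type = IB_SIG_TYPE_T10_DIF;
	sig_attrs.mem.sig.dif.type = IB_T10DIF_NONE;
	/* wire domain: generate type 1 T10-DIF with a CRC block guard */
	sig_attrs.wire.sig_type = IB_SIG_TYPE_T10_DIF;
	sig_attrs.wire.sig.dif.type = IB_T10DIF_TYPE1;
	sig_attrs.wire.sig.dif.bg_type = IB_T10DIF_CRC;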

Hope this helps.

Sagi.


Re: [PATCH RFC 2/9] IB/core: Introduce Signature Verbs API

2013-10-23 Thread Sagi Grimberg

On 10/22/2013 9:20 PM, Hefty, Sean wrote:

Would we lose anything making this a new operation for the QP, versus

trying to hook it into the existing ib_post_send call?

If I understand correctly, you are suggesting making it a verb? Well, this
operation is a fast-path operation - so I guess we will lose it in this
case.
Take SCSI for example: for each IO operation submitted by the SCSI
mid-layer, the transport layer should perform any protection policy that
SCSI asked for.
  From this point of view, the signature operation resembles fast
registration (since the transport does not own the IO data buffers, so
it uses fast registration methods).
That is why we are hooking into ib_post_send.

I'm suggesting multiple calls that can post to the send queue, rather than one 
call that does a giant switch statement at the beginning based on the opcode.


Although I understand where you are coming from, we also lose in this case.
If we go down this road, we block the user from saving a HW doorbell by
concatenating signature and RDMA WRs into a single post list.

I assume this is why fast_reg is also an extension of ib_post_send.
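
I.e., something like the following stays possible with a single
doorbell (hypothetical snippet, field setup elided):

	struct ib_send_wr sig_wr, rdma_wr, *bad_wr;

	memset(&sig_wr, 0, sizeof(sig_wr));
	memset(&rdma_wr, 0, sizeof(rdma_wr));

	sig_wr.opcode = IB_WR_REG_SIG_MR;
	/* ... sig_handover fields as discussed earlier in the thread ... */
	sig_wr.next = &rdma_wr;			/* chain the RDMA behind it */

	rdma_wr.opcode = IB_WR_RDMA_WRITE;
	rdma_wr.send_flags = IB_SEND_SIGNALED;
	/* ... sge list built over sig_mr->lkey ... */

	ib_post_send(qp, &sig_wr, &bad_wr);	/* one post, one doorbell */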




[PATCH RFC v1 00/10] Introduce Signature feature

2013-10-28 Thread Sagi Grimberg
This patchset introduces verbs-level support for the signature handover
feature. Signature is intended to implement end-to-end data integrity
on a transactional basis in a completely offloaded manner.

There are several end-to-end data integrity methods used today in various
applications and/or upper layer protocols such as T10-DIF defined by SCSI
specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs
support only for T10-DIF. The proposed framework allows adding more
signature methods in the future.

In T10-DIF, when a series of 512-byte data blocks are transferred, each
block is followed by an 8-byte guard. The guard consists of CRC that
protects the integrity of the data in the block, and some other tags
that protects against mis-directed IOs.

Data can be protected when transferred over the wire, but can also be
protected in the memory of the sender/receiver. This allows true end-
to-end protection against bits flipping either over the wire, through
gateways, in memory, over PCI, etc.

While T10-DIF clearly defines that over the wire protection guards are
interleaved into the data stream (each 512-Byte block followed by 8-byte
guard), when in memory, the protection guards may reside in a buffer
separated from the data. Depending on the application, it is usually
easier to handle the data when it is contiguous. In this case the data
buffer will be of size 512xN and the protection buffer will be of size
8xN (where N is the number of blocks in the transaction).
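
For example, an 8-block (4KB) transaction uses a 512x8 = 4096 byte data
buffer and a separate 8x8 = 64 byte protection buffer.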

There are 3 kinds of signature handover operation:
1. Take unprotected data (from wire or memory) and ADD protection
   guards.
2. Take protected data (from wire or memory), validate the data
   integrity against the protection guards and STRIP the protection
   guards.
3. Take protected data (from wire or memory), validate the data
   integrity against the protection guards and PASS the data with
   the guards as-is.

This translates to defining to the HCA how/if data protection exists
in the memory domain, and how/if data protection exists in the wire domain.

The way that data integrity is performed is by using a new kind of
memory region: signature-enabled MR, and a new kind of work request:
REG_SIG_MR. The REG_SIG_MR WR operates on the signature-enabled MR,
and defines all the needed information for the signature handover
(data buffer, protection buffer if needed and signature attributes).
The result is an MR that can be used for data transfer as usual,
that will also add/validate/strip/pass protection guards.

When the data transfer is successfully completed, it does not mean
that there are no integrity errors. The user must afterwards check
the signature status of the handover operation using a new light-weight
verb.
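
Putting the pieces together, the expected usage flow is roughly the
following (sketch only; WR field setup, error handling and the exact
ib_check_sig_status return convention are elided):

	/* setup: allocate a signature-enabled MR (IB_MR_SIGNATURE_EN) */
	sig_mr = ib_create_mr(pd, &mr_attr);

	/* per transaction: bind data/protection MRs + signature attributes */
	wr.opcode = IB_WR_REG_SIG_MR;
	/* ... fill wr.wr.sig_handover ... */
	ib_post_send(qp, &wr, &bad_wr);

	/* transfer data as usual using sig_mr->lkey / sig_mr->rkey */

	/* after the transfer has completed: check for integrity errors */
	if (ib_check_sig_status(sig_mr, &sig_err))
		/* sig_err describes the detected integrity error */;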

This feature shall be used in storage upper layer protocols iSER/SRP
implementing end-to-end data integrity T10-DIF. Following this patchset,
we will soon submit krping patches which will demonstrate the usage of
these signature verbs.

Patchset summary:
- Introduce verbs for create/destroy memory regions supporting signature.
- Introduce IB core signature verbs API.
- Implement mr create/destroy verbs in mlx5 driver.
- Preparation patches for signature support in mlx5 driver.
- Implement signature handover work request in mlx5 driver.
- Implement signature error collection and handling in mlx5 driver.

Changes from v0:
- Commit messages: Added more detailed explanation for signature work request.
- IB/core: Remove indirect memory registration enablement from create_mr.
   Keep only signature enablement.
- IB/mlx5: Changed signature error processing via MR radix lookup.

Sagi Grimberg (10):
  IB/core: Introduce protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin & finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs in a radix tree under device
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Publish support in signature feature

 drivers/infiniband/core/verbs.c|   47 +++
 drivers/infiniband/hw/mlx5/cq.c|   53 +++
 drivers/infiniband/hw/mlx5/main.c  |   12 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |   14 +
 drivers/infiniband/hw/mlx5/mr.c|  138 +++
 drivers/infiniband/hw/mlx5/qp.c|  525 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   84 
 include/linux/mlx5/cq.h|1 +
 include/linux/mlx5/device.h|   43 ++
 include/linux/mlx5/driver.h|   35 ++
 include/linux/mlx5/qp.h|   62 +++
 include/rdma/ib_verbs.h|  172 -
 13 files changed, 1148 insertions(+), 39 deletions

[PATCH RFC v1 09/10] IB/mlx5: Collect signature error completion

2013-10-28 Thread Sagi Grimberg
This commit handles the signature error CQE
generated by the HW (if any) and stores
it on the QP signature error list.

Once the user gets the completion for the transaction,
he must check for signature errors on the signature memory region
using a new lightweight verb ib_check_sig_status and, if such an
error exists, he will get the signature error information.

In case the user does not check for signature errors, i.e.
does not call ib_check_sig_status, he will not be allowed to
use the memory region for another signature operation
(a REG_SIG_MR work request will fail).

The underlying mlx5 driver will handle signature error completions
and will mark the relevant memory region as dirty.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/cq.c  |   53 ++
 drivers/infiniband/hw/mlx5/main.c|1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |7 
 drivers/infiniband/hw/mlx5/mr.c  |   29 ++
 drivers/infiniband/hw/mlx5/qp.c  |8 -
 include/linux/mlx5/cq.h  |1 +
 include/linux/mlx5/device.h  |   18 +++
 include/linux/mlx5/driver.h  |4 ++
 include/linux/mlx5/qp.h  |5 +++
 9 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 344ab03..da7605b 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -351,6 +351,33 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
	qp->sq.last_poll = tail;
 }
 
+static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe,
+			     struct ib_sig_err *item)
+{
+	u16 syndrome = be16_to_cpu(cqe->syndrome);
+
+	switch (syndrome) {
+	case 13:
+		item->err_type = IB_SIG_BAD_CRC;
+		break;
+	case 12:
+		item->err_type = IB_SIG_BAD_APPTAG;
+		break;
+	case 11:
+		item->err_type = IB_SIG_BAD_REFTAG;
+		break;
+	default:
+		break;
+	}
+
+	item->expected_guard = be32_to_cpu(cqe->expected_trans_sig) >> 16;
+	item->actual_guard = be32_to_cpu(cqe->actual_trans_sig) >> 16;
+	item->expected_logical_block = be32_to_cpu(cqe->expected_reftag);
+	item->actual_logical_block = be32_to_cpu(cqe->actual_reftag);
+	item->sig_err_offset = be64_to_cpu(cqe->err_offset);
+	item->key = be32_to_cpu(cqe->mkey);
+}
+
 static int mlx5_poll_one(struct mlx5_ib_cq *cq,
			 struct mlx5_ib_qp **cur_qp,
			 struct ib_wc *wc)
@@ -360,12 +387,16 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
	struct mlx5_cqe64 *cqe64;
	struct mlx5_core_qp *mqp;
	struct mlx5_ib_wq *wq;
+	struct mlx5_sig_err_cqe *sig_err_cqe;
+	struct mlx5_core_mr *mmr;
+	struct mlx5_ib_mr *mr;
	uint8_t opcode;
	uint32_t qpn;
	u16 wqe_ctr;
	void *cqe;
	int idx;
 
+repoll:
	cqe = next_cqe_sw(cq);
	if (!cqe)
		return -EAGAIN;
@@ -449,6 +480,28 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
			}
		}
		break;
+	case MLX5_CQE_SIG_ERR:
+		sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64;
+
+		read_lock(&dev->mdev.priv.mr_table.lock);
+		mmr = __mlx5_mr_lookup(&dev->mdev,
+				       be32_to_cpu(sig_err_cqe->mkey) & 0xffffff00);
+		if (unlikely(!mmr)) {
+			mlx5_ib_warn(dev, "CQE@CQ %06x for unknown MR %6x\n",
+				     cq->mcq.cqn, be32_to_cpu(sig_err_cqe->mkey));
+			return -EINVAL;
+		}
+		read_unlock(&dev->mdev.priv.mr_table.lock);
+
+		mr = to_mibmr(mmr);
+
+		get_sig_err_item(sig_err_cqe, &mr->sig->err_item);
+		mr->sig->sig_err_exists = true;
+
+		mlx5_ib_dbg(dev, "Got SIGERR on key: 0x%x\n",
+			    mr->sig->err_item.key);
+
+		goto repoll;
	}
 
return 0;
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 2e67a37..f3c7111 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1409,6 +1409,7 @@ static int init_one(struct pci_dev *pdev,
	dev->ib_dev.alloc_fast_reg_mr   = mlx5_ib_alloc_fast_reg_mr;
	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
+	dev->ib_dev.check_sig_status    = mlx5_ib_check_sig_status;
 
	if (mdev->caps.flags & MLX5_DEV_CAP_FLAG_XRC) {
		dev->ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 758f0e1..f175fa4 100644
--- a/drivers

[PATCH RFC v1 06/10] IB/mlx5: remove MTT access mode from umr flags helper function

2013-10-28 Thread Sagi Grimberg
The get_umr_flags helper function might be used for access
modes other than ACCESS_MODE_MTT, such as
ACCESS_MODE_KLM. So remove the access mode from the helper;
the caller will add its own access mode flag.

This patch does not add/change functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dc8d9fc..ca78078 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1773,7 +1773,7 @@ static u8 get_umr_flags(int acc)
	       (acc & IB_ACCESS_REMOTE_WRITE  ? MLX5_PERM_REMOTE_WRITE : 0) |
	       (acc & IB_ACCESS_REMOTE_READ   ? MLX5_PERM_REMOTE_READ  : 0) |
	       (acc & IB_ACCESS_LOCAL_WRITE   ? MLX5_PERM_LOCAL_WRITE  : 0) |
-	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }
 
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1785,7 +1785,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
		return;
	}
 
-	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags);
+	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) |
+		     MLX5_ACCESS_MODE_MTT;
	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
	seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00);
	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
-- 
1.7.8.2



[PATCH RFC v1 05/10] IB/mlx5: Break wqe handling to begin & finish routines

2013-10-28 Thread Sagi Grimberg
As a preliminary step for the signature feature, which will
require posting multiple (3) WQEs for a single WR, we
break the post_send routine WQE indexing into begin and
finish routines.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |   95 ---
 1 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index c80122e..dc8d9fc 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1983,6 +1983,57 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr)
}
 }
 
+static int begin_wqe(struct mlx5_ib_qp *qp, void **seg,
+		     struct mlx5_wqe_ctrl_seg **ctrl,
+		     struct ib_send_wr *wr, int *idx,
+		     int *size, int nreq)
+{
+	int err = 0;
+
+	if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		err = -ENOMEM;
+		return err;
+	}
+
+	*idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
+	*seg = mlx5_get_send_wqe(qp, *idx);
+	*ctrl = *seg;
+	*(uint32_t *)(*seg + 8) = 0;
+	(*ctrl)->imm = send_ieth(wr);
+	(*ctrl)->fm_ce_se = qp->sq_signal_bits |
+		(wr->send_flags & IB_SEND_SIGNALED ?
+		 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
+		(wr->send_flags & IB_SEND_SOLICITED ?
+		 MLX5_WQE_CTRL_SOLICITED : 0);
+
+	*seg += sizeof(**ctrl);
+	*size = sizeof(**ctrl) / 16;
+
+	return err;
+}
+
+static void finish_wqe(struct mlx5_ib_qp *qp,
+		       struct mlx5_wqe_ctrl_seg *ctrl,
+		       u8 size, unsigned idx, u64 wr_id,
+		       int *nreq, u8 fence, u8 next_fence,
+		       u32 mlx5_opcode)
+{
+	u8 opmod = 0;
+
+	ctrl->opmod_idx_opcode = cpu_to_be32(((u32)(qp->sq.cur_post) << 8) |
+					     mlx5_opcode | ((u32)opmod << 24));
+	ctrl->qpn_ds = cpu_to_be32(size | (qp->mqp.qpn << 8));
+	ctrl->fm_ce_se |= fence;
+	qp->fm_cache = next_fence;
+	if (unlikely(qp->wq_sig))
+		ctrl->signature = wq_sig(ctrl);
+
+	qp->sq.wrid[idx] = wr_id;
+	qp->sq.w_list[idx].opcode = mlx5_opcode;
+	qp->sq.wqe_head[idx] = qp->sq.head + (*nreq)++;
+	qp->sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB);
+	qp->sq.w_list[idx].next = qp->sq.cur_post;
+}
+
+
 int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
  struct ib_send_wr **bad_wr)
 {
@@ -1996,7 +2047,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
	int uninitialized_var(size);
	void *qend = qp->sq.qend;
	unsigned long flags;
-	u32 mlx5_opcode;
	unsigned idx;
	int err = 0;
	int inl = 0;
@@ -2005,7 +2055,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
	int nreq;
	int i;
	u8 next_fence = 0;
-	u8 opmod = 0;
	u8 fence;
 
	spin_lock_irqsave(&qp->sq.lock, flags);
@@ -2018,36 +2067,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
			goto out;
		}
 
-		if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		fence = qp->fm_cache;
+		num_sge = wr->num_sge;
+		if (unlikely(num_sge > qp->sq.max_gs)) {
			mlx5_ib_warn(dev, "\n");
			err = -ENOMEM;
			*bad_wr = wr;
			goto out;
		}
 
-		fence = qp->fm_cache;
-		num_sge = wr->num_sge;
-		if (unlikely(num_sge > qp->sq.max_gs)) {
+		err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq);
+		if (err) {
			mlx5_ib_warn(dev, "\n");
			err = -ENOMEM;
			*bad_wr = wr;
			goto out;
		}
 
-		idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
-		seg = mlx5_get_send_wqe(qp, idx);
-		ctrl = seg;
-		*(uint32_t *)(seg + 8) = 0;
-		ctrl->imm = send_ieth(wr);
-		ctrl->fm_ce_se = qp->sq_signal_bits |
-			(wr->send_flags & IB_SEND_SIGNALED ?
-			 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
-			(wr->send_flags & IB_SEND_SOLICITED ?
-			 MLX5_WQE_CTRL_SOLICITED : 0);
-
-		seg += sizeof(*ctrl);
-		size = sizeof(*ctrl) / 16;
-
		switch (ibqp->qp_type) {
		case IB_QPT_XRC_INI:
			xrc = seg;
@@ -2197,22 +2233,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
			}
		}
 
-		mlx5_opcode = mlx5_ib_opcode[wr->opcode];
-		ctrl->opmod_idx_opcode

[PATCH RFC v1 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related

2013-10-28 Thread Sagi Grimberg
If the user requested signature enable, we initialize the
relevant mlx5_ib_qp members: we mark the QP as signature-enabled,
we initialize an empty sig_err_list, and we increase the QP size.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |3 +++
 drivers/infiniband/hw/mlx5/qp.c  |5 +
 include/linux/mlx5/qp.h  |1 +
 3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 45d7424..758f0e1 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -189,6 +189,9 @@ struct mlx5_ib_qp {
 
	int			create_type;
	u32			pa_lkey;
+
+	/* Store signature errors */
+	bool			signature_en;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 045f8cdb..c80122e 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -734,6 +734,11 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd,
	spin_lock_init(&qp->sq.lock);
	spin_lock_init(&qp->rq.lock);
 
+	if (init_attr->create_flags == IB_QP_CREATE_SIGNATURE_EN) {
+		init_attr->cap.max_send_wr *= MLX5_SIGNATURE_SQ_MULT;
+		qp->signature_en = true;
+	}
+
	if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR)
		qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE;
 
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index d9e3eac..174805c 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -37,6 +37,7 @@
 #include <linux/mlx5/driver.h>
 
 #define MLX5_INVALID_LKEY	0x100
+#define MLX5_SIGNATURE_SQ_MULT	3
 
 enum mlx5_qp_optpar {
	MLX5_QP_OPTPAR_ALT_ADDR_PATH	= 1 << 0,
-- 
1.7.8.2



[PATCH RFC v1 02/10] IB/core: Introduce Signature Verbs API

2013-10-28 Thread Sagi Grimberg
This commit introduces the verbs interface for signature related
operations. A signature handover operation shall configure the
layouts of the data and protection attributes both in the memory
and wire domains.

Signature operations are:
- INSERT:
  Generate and insert protection information when handing over
  data from input space to output space.
- validate and STRIP:
  Validate protection information and remove it when handing over
  data from input space to output space.
- validate and PASS:
  Validate protection information and pass it when handing over
  data from input space to output space.

Once the signature handover operation is done, the HCA will
offload data integrity generation/validation while performing
the actual data transfer.

Additions:
1. HCA signature capabilities in device attributes
Verbs provider supporting Signature handover operations shall
fill relevant fields in device attributes structure returned
by ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
Creating a QP that will carry signature handover operations
may require some special preparations from the verbs provider.
So we add the QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare
that the created QP may carry out signature handover operations.
Expose signature support to the verbs layer (no support for now).

3. New send work request IB_WR_REG_SIG_MR
Signature handover work request. This WR will define the signature
handover properties of the memory/wire domains as well as the domains
layout. The purpose of this work request is to bind all the needed
information for the signature operation:
- data to be transferred:  data_mr, data_va, data_size.
  * The raw data, pre-registered to a single MR (normally, before
signature, this MR would have been used directly for the data
transfer)
- data protection guards: prot_mr, prot_va, prot_size.
  * The data protection buffer, pre-registered to a single MR, which
contains the data integrity guards of the raw data blocks.
Note that it may not always exist, only in cases where the user is
interested in storing protection guards in memory.
- signature operation attributes: sig_attrs.
  * Tells the HCA how to validate/generate the protection information.

Once the work request is executed, the memory region which
will describe the signature transaction will be the sig_mr. The
application can now go ahead and send the sig_mr.rkey or use the 
sig_mr.lkey for data transfer.

4. New verb ib_check_sig_status
The check_sig_status verb shall check if any signature errors
are pending for a specific signature-enabled ib_mr.
This verb is a lightweight check and is allowed to be called
from interrupt context. The application must call this verb after
it is known that the actual data transfer has finished.
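
As an illustration, a consumer completion path might look roughly like
this (hypothetical ULP code; the ib_check_sig_status return convention
is assumed to be non-zero when a signature error item is pending):

	/* e.g. called once the CQ completion for the transfer arrived */
	static void ulp_io_done(struct ulp_io *io)
	{
		struct ib_sig_err sig_err;

		if (ib_check_sig_status(io->sig_mr, &sig_err)) {
			pr_err("sig error: type %d, offset %llu, key 0x%x\n",
			       sig_err.err_type,
			       (unsigned long long)sig_err.sig_err_offset,
			       sig_err.key);
			/* fail the IO with a data integrity error */
		}
	}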

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |8 +++
 include/rdma/ib_verbs.h |  134 ++-
 2 files changed, 141 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1d94a5c..5636d65 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1293,3 +1293,11 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+int ib_check_sig_status(struct ib_mr *sig_mr,
+			struct ib_sig_err *sig_err)
+{
+	return sig_mr->device->check_sig_status ?
+		sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_sig_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 56f7e88..233f66d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,19 @@ enum ib_device_cap_flags {
	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24)
+	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
+	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<25),
+};
+
+enum ib_signature_prot_cap {
+	IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+	IB_GUARD_T10DIF_CRC	= 1,
+	IB_GUARD_T10DIF_CSUM	= 1 << 1,
 };
 
 enum ib_atomic_cap {
@@ -166,6 +178,8 @@ struct ib_device_attr {
	unsigned int		max_fast_reg_page_list_len;
	u16			max_pkeys;
	u8			local_ca_ack_delay;
+	enum ib_signature_prot_cap	sig_prot_cap;
+	enum ib_signature_guard_cap	sig_guard_cap;
 };
 
 enum ib_mtu {
@@ -630,6 +644,7 @@ enum ib_qp_type {
 enum ib_qp_create_flags {
	IB_QP_CREATE_IPOIB_UD_LSO	= 1 << 0

[PATCH RFC v1 10/10] IB/mlx5: Publish support in signature feature

2013-10-28 Thread Sagi Grimberg
Currently we support only T10-DIF types of signature
handover operations (types 1|2|3).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index f3c7111..3dd8219 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
	if (flags & MLX5_DEV_CAP_FLAG_XRC)
		props->device_cap_flags |= IB_DEVICE_XRC;
	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+				      IB_PROT_T10DIF_TYPE_2 |
+				      IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+				       IB_GUARD_T10DIF_CSUM;
+	}
 
	props->vendor_id = be32_to_cpup((__be32 *)(out_mad->data + 36)) & 0xffffff;
-- 
1.7.8.2



[PATCH RFC v1 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR

2013-10-28 Thread Sagi Grimberg
This patch implements IB_WR_REG_SIG_MR posted by the user.

Basically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post a UMR WR to register the sig_mr in one of two possible ways:
 * In case the user registered a single MR for data, the UMR data segment
   consists of:
   - a single klm (data MR) passed by the user
   - a BSF with the signature attributes requested by the user.
 * In case the user registered 2 MRs, one for data and one for protection,
   the UMR consists of:
   - a strided block format which includes the data and protection MRs and
     their repetitive block format.
   - a BSF with the signature attributes requested by the user.

2. post a SET_PSV WR in order to set, for the memory domain, the
   initial signature parameters passed by the user.

3. post a SET_PSV WR in order to set, for the wire domain, the
   initial signature parameters passed by the user.

This patch also introduces some helper functions to set the BSF correctly
and to determine the signature format selectors.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |  416 +++
 include/linux/mlx5/qp.h |   56 ++
 2 files changed, 472 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ca78078..d791e41 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1719,6 +1719,26 @@ static __be64 frwr_mkey_mask(void)
return cpu_to_be64(result);
 }
 
+static __be64 sig_mkey_mask(void)
+{
+   u64 result;
+
+   result = MLX5_MKEY_MASK_LEN |
+   MLX5_MKEY_MASK_PAGE_SIZE|
+   MLX5_MKEY_MASK_START_ADDR   |
+   MLX5_MKEY_MASK_EN_RINVAL|
+   MLX5_MKEY_MASK_KEY  |
+   MLX5_MKEY_MASK_LR   |
+   MLX5_MKEY_MASK_LW   |
+   MLX5_MKEY_MASK_RR   |
+   MLX5_MKEY_MASK_RW   |
+   MLX5_MKEY_MASK_SMALL_FENCE  |
+   MLX5_MKEY_MASK_FREE |
+   MLX5_MKEY_MASK_BSF_EN;
+
+   return cpu_to_be64(result);
+}
+
 static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
 struct ib_send_wr *wr, int li)
 {
@@ -1901,6 +1921,339 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, 
struct ib_send_wr *wr,
return 0;
 }
 
+static u16 prot_field_size(enum ib_signature_type type, u16 block_size)
+{
+   switch (type) {
+   case IB_SIG_TYPE_T10_DIF:
+   return MLX5_DIF_SIZE;
+   default:
+   return 0;
+   }
+}
+
+static u8 bs_selector(int block_size)
+{
+   switch (block_size) {
+   case 512:   return 0x1;
+   case 520:   return 0x2;
+   case 4096:  return 0x3;
+   case 4160:  return 0x4;
+   case 1073741824:return 0x5;
+   default:return 0;
+   }
+}
+
+static int format_selector(struct ib_sig_attrs *attr,
+  struct ib_sig_domain *domain,
+  int *selector)
+{
+
+#define FORMAT_DIF_NONE0
+#define FORMAT_DIF_CRC_INC 4
+#define FORMAT_DIF_CSUM_INC12
+#define FORMAT_DIF_CRC_NO_INC  13
+#define FORMAT_DIF_CSUM_NO_INC 14
+
+	switch (domain->sig.dif.type) {
+	case IB_T10DIF_NONE:
+		/* No DIF */
+		*selector = FORMAT_DIF_NONE;
+		break;
+	case IB_T10DIF_TYPE1: /* Fall through */
+	case IB_T10DIF_TYPE2:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = FORMAT_DIF_CRC_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = FORMAT_DIF_CSUM_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	case IB_T10DIF_TYPE3:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					   FORMAT_DIF_CRC_INC :
+					   FORMAT_DIF_CRC_NO_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					   FORMAT_DIF_CSUM_INC :
+					   FORMAT_DIF_CSUM_NO_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	default:
+		return 1;
+	}
+
+   return 0;
+}
+
+static int mlx5_set_bsf(struct ib_mr *sig_mr,
+   struct ib_sig_attrs *sig_attrs,
+   struct mlx5_bsf

[PATCH RFC v1 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device

2013-10-28 Thread Sagi Grimberg
This will be useful when processing signature errors
on a specific key. The mlx5 driver will look up the
matching mlx5 memory region structure and mark it as
dirty (i.e. it contains signature errors).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   20 
 include/linux/mlx5/driver.h|   12 
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index b47739b..5b7b3c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -428,6 +428,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev 
*pdev)
mlx5_init_cq_table(dev);
mlx5_init_qp_table(dev);
mlx5_init_srq_table(dev);
+   mlx5_init_mr_table(dev);
 
return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c 
b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index 2ade604..f72e0b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -36,9 +36,18 @@
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"
 
+void mlx5_init_mr_table(struct mlx5_core_dev *dev)
+{
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
+
+	rwlock_init(&table->lock);
+	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
+}
+
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
  struct mlx5_create_mkey_mbox_in *in, int inlen)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
	struct mlx5_create_mkey_mbox_out out;
	int err;
	u8 key;
@@ -63,14 +72,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
	mr->key = mlx5_idx_to_mkey(be32_to_cpu(out.mkey) & 0xffffff) | key;
	mlx5_core_dbg(dev, "out 0x%x, key 0x%x, mkey 0x%x\n",
		      be32_to_cpu(out.mkey), key, mr->key);
 
+	/* connect to MR tree */
+	write_lock_irq(&table->lock);
+	err = radix_tree_insert(&table->tree, mr->key & 0xffffff00, mr);
+	write_unlock_irq(&table->lock);
+
	return err;
 }
 EXPORT_SYMBOL(mlx5_core_create_mkey);
 
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
	struct mlx5_destroy_mkey_mbox_in in;
	struct mlx5_destroy_mkey_mbox_out out;
+	unsigned long flags;
	int err;
 
	memset(&in, 0, sizeof(in));
@@ -85,6 +101,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
	if (out.hdr.status)
		return mlx5_cmd_status_to_err(&out.hdr);
 
+	write_lock_irqsave(&table->lock, flags);
+	radix_tree_delete(&table->tree, mr->key & 0xffffff00);
+	write_unlock_irqrestore(&table->lock, flags);
+
return err;
 }
 EXPORT_SYMBOL(mlx5_core_destroy_mkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 7c33487..5fe0690 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -488,6 +488,13 @@ struct mlx5_srq_table {
struct radix_tree_root  tree;
 };
 
+struct mlx5_mr_table {
+	/* protect radix tree */
+	rwlock_t		lock;
+	struct radix_tree_root	tree;
+};
+
 struct mlx5_priv {
	char			name[MLX5_MAX_NAME_LEN];
	struct mlx5_eq_table	eq_table;
@@ -516,6 +523,10 @@ struct mlx5_priv {
	struct mlx5_cq_table	cq_table;
	/* end: cq staff */
 
+	/* start: mr staff */
+	struct mlx5_mr_table	mr_table;
+	/* end: mr staff */
+
	/* start: alloc staff */
	struct mutex		pgdir_mutex;
	struct list_head	pgdir_list;
@@ -691,6 +702,7 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct 
mlx5_core_srq *srq,
struct mlx5_query_srq_mbox_out *out);
 int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
  u16 lwm, int is_srq);
+void mlx5_init_mr_table(struct mlx5_core_dev *dev);
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
  struct mlx5_create_mkey_mbox_in *in, int inlen);
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr);
-- 
1.7.8.2



[PATCH RFC v1 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr

2013-10-28 Thread Sagi Grimberg
Support create_mr and destroy_mr verbs.
The user may request the signature enable memory region attribute,
in which case the memory region shall be an indirect MR
and shall be attached with signature attributes (BSF, PSVs).
Otherwise, the create_mr routine is equivalent to alloc_fast_reg_mr.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c|2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |4 +
 drivers/infiniband/hw/mlx5/mr.c  |  109 ++
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |   64 +++
 include/linux/mlx5/device.h  |   25 ++
 include/linux/mlx5/driver.h  |   19 +
 6 files changed, 223 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 3f831de..2e67a37 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev,
	dev->ib_dev.get_dma_mr  = mlx5_ib_get_dma_mr;
	dev->ib_dev.reg_user_mr = mlx5_ib_reg_user_mr;
	dev->ib_dev.dereg_mr    = mlx5_ib_dereg_mr;
+	dev->ib_dev.destroy_mr  = mlx5_ib_destroy_mr;
	dev->ib_dev.attach_mcast    = mlx5_ib_mcg_attach;
	dev->ib_dev.detach_mcast    = mlx5_ib_mcg_detach;
	dev->ib_dev.process_mad = mlx5_ib_process_mad;
+	dev->ib_dev.create_mr   = mlx5_ib_create_mr;
	dev->ib_dev.alloc_fast_reg_mr   = mlx5_ib_alloc_fast_reg_mr;
	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 836be91..45d7424 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -262,6 +262,7 @@ struct mlx5_ib_mr {
	int			npages;
	struct completion	done;
	enum ib_wc_status	status;
+	struct mlx5_core_sig_ctx *sig;
 };
 
 struct mlx5_ib_fast_reg_page_list {
@@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
  u64 virt_addr, int access_flags,
  struct ib_udata *udata);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
+int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+   struct ib_mr_init_attr *mr_init_attr);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct 
ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index bd41df9..44f7e46 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -921,6 +921,115 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
return 0;
 }
 
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+				struct ib_mr_init_attr *mr_init_attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_create_mkey_mbox_in *in;
+	struct mlx5_ib_mr *mr;
+	int access_mode, err;
+	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	in->seg.status = 1 << 6; /* free */
+	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
+	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
+	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
+	access_mode = MLX5_ACCESS_MODE_MTT;
+
+	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+		u32 psv_index[2];
+
+		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
+					       MLX5_MKEY_BSF_EN);
+		in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
+		mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL);
+		if (!mr->sig) {
+			err = -ENOMEM;
+			goto err_free;
+		}
+
+		/* create mem & wire PSVs */
+		err = mlx5_core_create_psv(&dev->mdev, to_mpd(pd)->pdn,
+					   2, psv_index);
+		if (err)
+			goto err_free_sig;
+
+		access_mode = MLX5_ACCESS_MODE_KLM;
+		mr->sig->psv_memory.psv_idx = psv_index[0];
+		mr->sig->psv_wire.psv_idx = psv_index[1];
+	}
+
+	in->seg.flags = MLX5_PERM_UMR_EN | access_mode;
+	err = mlx5_core_create_mkey(&dev->mdev

[PATCH RFC v1 01/10] IB/core: Introduce protected memory regions

2013-10-28 Thread Sagi Grimberg
This commit introduces verbs for creating/destroying memory
regions which will allow new types of memory key operations such
as protected memory registration.

Indirect memory registration is registering several (one
or more) pre-registered memory regions in a specific layout.
The indirect region may potentially describe several regions
and some repetition format between them.

Protected Memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.

In the future these routines may replace or implement current memory
regions creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   39 +++
 include/rdma/ib_verbs.h |   38 ++
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..1d94a5c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1052,6 +1052,45 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);
 
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 645c3ce..56f7e88 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -925,6 +925,22 @@ enum ib_mr_rereg_flags {
	IB_MR_REREG_ACCESS	= (1<<2)
 };
 
+enum ib_mr_create_flags {
+   IB_MR_SIGNATURE_EN = 1,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ * ib_create_mr.
+ * @max_reg_descriptors: max number of registration units that
+ *   may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+   int max_reg_descriptors;
+   enum ib_mr_create_flags flags;
+};
+
 /**
  * struct ib_mw_bind - Parameters for a type 1 memory window bind operation.
  * @wr_id:  Work request id.
@@ -1257,6 +1273,9 @@ struct ib_device {
int(*query_mr)(struct ib_mr *mr,
   struct ib_mr_attr *mr_attr);
int(*dereg_mr)(struct ib_mr *mr);
+   int(*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr *	   (*create_mr)(struct ib_pd *pd,
+					struct ib_mr_init_attr *mr_init_attr);
struct ib_mr * (*alloc_fast_reg_mr)(struct ib_pd *pd,
   int max_page_list_len);
struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct 
ib_device *device,
@@ -2092,6 +2111,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr 
*mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);
 
+
+/**
+ * ib_create_mr - creates memory region that may be used for
+ *   direct or indirect registration models via UMR WR.
+ * @pd: The protection domain associated with the region.
+ * @mr_init_attr: memory region init attributes.
+ */
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+  struct ib_mr_init_attr *mr_init_attr);
+
+/**
+ * ib_destroy_mr - Destroys a memory region that was created using
+ * ib_create_mr and removes it from HW translation tables.
+ * @mr: The memory region to destroy.
+ *
+ * This function can fail, if the memory region has memory windows bound to it.
+ */
+int ib_destroy_mr(struct ib_mr *mr);
+
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
  *   IB_WR_FAST_REG_MR send work request.
-- 
1.7.8.2

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http

Re: [PATCH RFC v1 01/10] IB/core: Introduce protected memory regions

2013-10-29 Thread Sagi Grimberg

On 10/28/2013 11:22 PM, Hefty, Sean wrote:

+enum ib_mr_create_flags {
+   IB_MR_SIGNATURE_EN = 1,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ * ib_create_mr.
+ * @max_reg_descriptors: max number of registration units that
+ *   may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+   int max_reg_descriptors;
+   enum ib_mr_create_flags flags;

Assuming that flags will be a bitwise OR of values, they should be an int, not 
an enum.


Right, will fix. The same applies to signature caps in ib_device.

Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR

2013-10-31 Thread Sagi Grimberg
This patch implements IB_WR_REG_SIG_MR posted by the user.

Basically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post UMR WR to register the sig_mr in one of two possible ways:
* In case the user registered a single MR for data, the UMR data segment
  consists of:
  - single klm (data MR) passed by the user
  - BSF with signature attributes requested by the user.
* In case the user registered 2 MRs, one for data and one for protection,
  the UMR consists of:
  - strided block format which includes data and protection MRs and
    their repetitive block format.
  - BSF with signature attributes requested by the user.

2. post SET_PSV in order to set the initial signature parameters
   of the memory domain, passed by the user.

3. post SET_PSV in order to set the initial signature parameters
   of the wire domain, passed by the user.

This patch also introduces some helper functions to set the BSF correctly
and to determine the signature format selectors.
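
As an illustration (not part of the patch), consuming this WR from a
ULP might look roughly as follows; the sig_handover member names follow
this patchset's description, the exact union layout (including where
sig_mr itself is referenced) is an assumption, and data_sge, prot_sge
and sig_attrs are hypothetical locals:

	struct ib_send_wr sig_wr, *bad_wr;

	memset(&sig_wr, 0, sizeof(sig_wr));
	sig_wr.opcode = IB_WR_REG_SIG_MR;
	sig_wr.sg_list = &data_sge;	/* pre-registered data buffer */
	sig_wr.num_sge = 1;
	sig_wr.wr.sig_handover.sig_mr = sig_mr;
	sig_wr.wr.sig_handover.prot = &prot_sge; /* NULL if no PI in memory */
	sig_wr.wr.sig_handover.sig_attrs = &sig_attrs;

	if (ib_post_send(qp, &sig_wr, &bad_wr))
		goto err;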

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |  416 +++
 include/linux/mlx5/qp.h |   56 ++
 2 files changed, 472 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ca78078..37e3715 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1719,6 +1719,26 @@ static __be64 frwr_mkey_mask(void)
return cpu_to_be64(result);
 }
 
+static __be64 sig_mkey_mask(void)
+{
+   u64 result;
+
+   result = MLX5_MKEY_MASK_LEN |
+   MLX5_MKEY_MASK_PAGE_SIZE|
+   MLX5_MKEY_MASK_START_ADDR   |
+   MLX5_MKEY_MASK_EN_RINVAL|
+   MLX5_MKEY_MASK_KEY  |
+   MLX5_MKEY_MASK_LR   |
+   MLX5_MKEY_MASK_LW   |
+   MLX5_MKEY_MASK_RR   |
+   MLX5_MKEY_MASK_RW   |
+   MLX5_MKEY_MASK_SMALL_FENCE  |
+   MLX5_MKEY_MASK_FREE |
+   MLX5_MKEY_MASK_BSF_EN;
+
+   return cpu_to_be64(result);
+}
+
 static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
 struct ib_send_wr *wr, int li)
 {
@@ -1901,6 +1921,339 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, 
struct ib_send_wr *wr,
return 0;
 }
 
+static u16 prot_field_size(enum ib_signature_type type, u16 block_size)
+{
+   switch (type) {
+   case IB_SIG_TYPE_T10_DIF:
+   return MLX5_DIF_SIZE;
+   default:
+   return 0;
+   }
+}
+
+static u8 bs_selector(int block_size)
+{
+   switch (block_size) {
+   case 512:   return 0x1;
+   case 520:   return 0x2;
+   case 4096:  return 0x3;
+   case 4160:  return 0x4;
+   case 1073741824:return 0x5;
+   default:return 0;
+   }
+}
+
+static int format_selector(struct ib_sig_attrs *attr,
+  struct ib_sig_domain *domain,
+  int *selector)
+{
+
+#define FORMAT_DIF_NONE		0
+#define FORMAT_DIF_CRC_INC	4
+#define FORMAT_DIF_CSUM_INC	12
+#define FORMAT_DIF_CRC_NO_INC	13
+#define FORMAT_DIF_CSUM_NO_INC	14
+
+	switch (domain->sig.dif.type) {
+	case IB_T10DIF_NONE:
+		/* No DIF */
+		*selector = FORMAT_DIF_NONE;
+		break;
+	case IB_T10DIF_TYPE1: /* Fall through */
+	case IB_T10DIF_TYPE2:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = FORMAT_DIF_CRC_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = FORMAT_DIF_CSUM_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	case IB_T10DIF_TYPE3:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					FORMAT_DIF_CRC_INC :
+					FORMAT_DIF_CRC_NO_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					FORMAT_DIF_CSUM_INC :
+					FORMAT_DIF_CSUM_NO_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	default:
+		return 1;
+	}
+
+   return 0;
+}
+
+static int mlx5_set_bsf(struct ib_mr *sig_mr,
+   struct ib_sig_attrs *sig_attrs,
+   struct mlx5_bsf

[PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API

2013-10-31 Thread Sagi Grimberg
This commit introduces the verbs interface for signature related
operations. A signature handover operation shall configure the
layouts of data and protection attributes both in memory and wire
domains.

Signature operations are:
- INSERT:
  Generate and insert protection information when handing over
  data from input space to output space.
- validate and STRIP:
  Validate protection information and remove it when handing over
  data from input space to output space.
- validate and PASS:
  Validate protection information and pass it when handing over
  data from input space to output space.

Once the signature handover operation is done, the HCA will
offload data integrity generation/validation while performing
the actual data transfer.

Additions:
1. HCA signature capabilities in device attributes
Verbs providers supporting signature handover operations shall
fill the relevant fields in the device attributes structure returned
by ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
Creating a QP that will carry signature handover operations
may require some special preparations from the verbs provider.
So we add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare
that the created QP may carry out signature handover operations.
Expose signature support to verbs layer (no support for now).

3. New send work request IB_WR_REG_SIG_MR
Signature handover work request. This WR will define the signature
handover properties of the memory/wire domains as well as the domains
layout. The purpose of this work request is to bind all the needed
information for the signature operation:
- data to be transferred:  wr->sg_list.
  * The raw data, pre-registered to a single MR (normally, before
    signature, this MR would have been used directly for the data
    transfer). The user will pass the data sge via the existing
    sg_list member.
- data protection guards: sig_handover.prot.
  * The data protection buffer, pre-registered to a single MR, which
    contains the data integrity guards of the raw data blocks.
    Note that it may not always exist, only in cases where the user is
    interested in storing protection guards in memory.
- signature operation attributes: sig_handover.sig_attrs.
  * Tells the HCA how to validate/generate the protection information.

Once the work request is executed, the memory region which
will describe the signature transaction will be the sig_mr. The
application can now go ahead and send the sig_mr.rkey or use the
sig_mr.lkey for data transfer.

4. New verb ib_check_sig_status
The check_sig_status verb shall check if any signature errors
are pending for a specific signature-enabled ib_mr.
This verb is a lightweight check and is allowed to be taken
from interrupt context. The application must call this verb after
it is known that the actual data transfer has finished.
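
As an illustration (not part of the patch), the post-transfer check of
item 4 might be used as sketched below; whether a pending integrity
error is reported via the return code or via the ib_sig_err contents is
left opaque here, as the RFC does not pin it down:

	struct ib_sig_err sig_err;
	int ret;

	/* call only once the data transfer completion has been observed */
	ret = ib_check_sig_status(sig_mr, &sig_err);
	if (ret)
		pr_err("signature check on rkey 0x%x returned %d\n",
		       sig_mr->rkey, ret);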

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |8 +++
 include/rdma/ib_verbs.h |  127 ++-
 2 files changed, 134 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1d94a5c..5636d65 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1293,3 +1293,11 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+int ib_check_sig_status(struct ib_mr *sig_mr,
+			struct ib_sig_err *sig_err)
+{
+	return sig_mr->device->check_sig_status ?
+		sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_sig_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 53f065d..19b37eb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,19 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24)
+	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
+	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<25),
+};
+
+enum ib_signature_prot_cap {
+	IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+	IB_GUARD_T10DIF_CRC	= 1,
+	IB_GUARD_T10DIF_CSUM	= 1 << 1,
 };
 
 enum ib_atomic_cap {
@@ -166,6 +178,8 @@ struct ib_device_attr {
 	unsigned int		max_fast_reg_page_list_len;
u16 max_pkeys;
u8  local_ca_ack_delay;
+   int sig_prot_cap;
+   int sig_guard_cap;
 };
 
 enum ib_mtu {
@@ -630,6 +644,7 @@ enum ib_qp_type {
 enum ib_qp_create_flags {
IB_QP_CREATE_IPOIB_UD_LSO

[PATCH RFC v2 06/10] IB/mlx5: remove MTT access mode from umr flags helper function

2013-10-31 Thread Sagi Grimberg
get_umr_flags helper function might be used for types
of access modes other than ACCESS_MODE_MTT, such as
ACCESS_MODE_KLM. so remove it from helper and caller
will add it's own access mode flag.

This commit does not add/change functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dc8d9fc..ca78078 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1773,7 +1773,7 @@ static u8 get_umr_flags(int acc)
 		(acc & IB_ACCESS_REMOTE_WRITE  ? MLX5_PERM_REMOTE_WRITE : 0) |
 		(acc & IB_ACCESS_REMOTE_READ   ? MLX5_PERM_REMOTE_READ  : 0) |
 		(acc & IB_ACCESS_LOCAL_WRITE   ? MLX5_PERM_LOCAL_WRITE  : 0) |
-		MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+		MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }
 
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1785,7 +1785,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
 		return;
 	}
 
-	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags);
+	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) |
+		     MLX5_ACCESS_MODE_MTT;
 	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
 	seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00);
 	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v2 10/10] IB/mlx5: Publish support in signature feature

2013-10-31 Thread Sagi Grimberg
Currently we support only T10-DIF types of signature
handover operations (types 1|2|3).
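
As an illustration (not part of the patch), a ULP could gate its use of
signature offload on these published capabilities; a minimal sketch,
assuming an already-queried struct ib_device_attr dev_attr:

	if (!(dev_attr.device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER))
		return -ENOTSUPP;	/* no signature offload at all */

	if (!(dev_attr.sig_prot_cap & IB_PROT_T10DIF_TYPE_1) ||
	    !(dev_attr.sig_guard_cap & IB_GUARD_T10DIF_CRC))
		return -ENOTSUPP;	/* required DIF flavor not offered */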

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index f3c7111..3dd8219 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (flags & MLX5_DEV_CAP_FLAG_XRC)
 		props->device_cap_flags |= IB_DEVICE_XRC;
 	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+				      IB_PROT_T10DIF_TYPE_2 |
+				      IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+				       IB_GUARD_T10DIF_CSUM;
+	}
 
 	props->vendor_id	   = be32_to_cpup((__be32 *)(out_mad->data + 36)) &
 				     0xffffff;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v2 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related

2013-10-31 Thread Sagi Grimberg
If the user requested signature enable we initialize the
relevant mlx5_ib_qp members: we mark the qp as sig_enabled,
we initialize an empty sig_err_list, and we increase the qp send
queue size. The MLX5_SIGNATURE_SQ_MULT factor of 3 accounts for
each signature WR expanding into 3 WQEs (one UMR and two SET_PSV).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |3 +++
 drivers/infiniband/hw/mlx5/qp.c  |5 +
 include/linux/mlx5/qp.h  |1 +
 3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 45d7424..758f0e1 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -189,6 +189,9 @@ struct mlx5_ib_qp {
 
int create_type;
u32 pa_lkey;
+
+   /* Store signature errors */
+	bool			signature_en;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 045f8cd..c80122e 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -734,6 +734,11 @@ static int create_qp_common(struct mlx5_ib_dev *dev, 
struct ib_pd *pd,
 	spin_lock_init(&qp->sq.lock);
 	spin_lock_init(&qp->rq.lock);
 
+	if (init_attr->create_flags == IB_QP_CREATE_SIGNATURE_EN) {
+		init_attr->cap.max_send_wr *= MLX5_SIGNATURE_SQ_MULT;
+		qp->signature_en = true;
+	}
+
 	if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR)
 		qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE;
 
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index d9e3eac..174805c 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -37,6 +37,7 @@
 #include linux/mlx5/driver.h
 
 #define MLX5_INVALID_LKEY  0x100
+#define MLX5_SIGNATURE_SQ_MULT 3
 
 enum mlx5_qp_optpar {
 	MLX5_QP_OPTPAR_ALT_ADDR_PATH	= 1 << 0,
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v2 05/10] IB/mlx5: Break wqe handling to begin finish routines

2013-10-31 Thread Sagi Grimberg
As a preliminary step for the signature feature, which will
require posting multiple (3) WQEs for a single WR, we
break the post_send routine WQE indexing into begin and
finish routines.

This patch does not change any functionality.
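
For orientation only (not the patch itself), the shape this split
enables in the signature path might look like the sketch below, using
the begin_wqe()/finish_wqe() helpers added here from within
mlx5_ib_post_send() scope; MLX5_OPCODE_UMR is used purely as an
example opcode:

	err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq);
	if (err)
		goto out;
	/* ... build the UMR segments into *seg, growing size ... */
	finish_wqe(qp, ctrl, size, idx, wr->wr_id, &nreq,
		   fence, next_fence, MLX5_OPCODE_UMR);
	/* repeat begin_wqe()/finish_wqe() for each SET_PSV WQE */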

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |   95 ---
 1 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index c80122e..dc8d9fc 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1983,6 +1983,57 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr)
}
 }
 
+static int begin_wqe(struct mlx5_ib_qp *qp, void **seg,
+struct mlx5_wqe_ctrl_seg **ctrl,
+struct ib_send_wr *wr, int *idx,
+int *size, int nreq)
+{
+	int err = 0;
+	if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		err = -ENOMEM;
+		return err;
+	}
+
+	*idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
+	*seg = mlx5_get_send_wqe(qp, *idx);
+	*ctrl = *seg;
+	*(uint32_t *)(*seg + 8) = 0;
+	(*ctrl)->imm = send_ieth(wr);
+	(*ctrl)->fm_ce_se = qp->sq_signal_bits |
+		(wr->send_flags & IB_SEND_SIGNALED ?
+		 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
+		(wr->send_flags & IB_SEND_SOLICITED ?
+		 MLX5_WQE_CTRL_SOLICITED : 0);
+
+	*seg += sizeof(**ctrl);
+	*size = sizeof(**ctrl) / 16;
+
+	return err;
+}
+
+static void finish_wqe(struct mlx5_ib_qp *qp,
+		       struct mlx5_wqe_ctrl_seg *ctrl,
+		       u8 size, unsigned idx, u64 wr_id,
+		       int *nreq, u8 fence, u8 next_fence,
+		       u32 mlx5_opcode)
+{
+	u8 opmod = 0;
+	ctrl->opmod_idx_opcode = cpu_to_be32(((u32)(qp->sq.cur_post) << 8) |
+					     mlx5_opcode | ((u32)opmod << 24));
+	ctrl->qpn_ds = cpu_to_be32(size | (qp->mqp.qpn << 8));
+	ctrl->fm_ce_se |= fence;
+	qp->fm_cache = next_fence;
+	if (unlikely(qp->wq_sig))
+		ctrl->signature = wq_sig(ctrl);
+
+	qp->sq.wrid[idx] = wr_id;
+	qp->sq.w_list[idx].opcode = mlx5_opcode;
+	qp->sq.wqe_head[idx] = qp->sq.head + (*nreq)++;
+	qp->sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB);
+	qp->sq.w_list[idx].next = qp->sq.cur_post;
+}
+
+
 int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
  struct ib_send_wr **bad_wr)
 {
@@ -1996,7 +2047,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
int uninitialized_var(size);
 	void *qend = qp->sq.qend;
 	unsigned long flags;
-	u32 mlx5_opcode;
 	unsigned idx;
 	int err = 0;
 	int inl = 0;
@@ -2005,7 +2055,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	int nreq;
 	int i;
 	u8 next_fence = 0;
-	u8 opmod = 0;
 	u8 fence;
 
 	spin_lock_irqsave(&qp->sq.lock, flags);
@@ -2018,36 +2067,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			goto out;
 		}
 
-		if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		fence = qp->fm_cache;
+		num_sge = wr->num_sge;
+		if (unlikely(num_sge > qp->sq.max_gs)) {
 			mlx5_ib_warn(dev, "\n");
 			err = -ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}
 
-		fence = qp->fm_cache;
-		num_sge = wr->num_sge;
-		if (unlikely(num_sge > qp->sq.max_gs)) {
+		err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq);
+		if (err) {
 			mlx5_ib_warn(dev, "\n");
 			err = -ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}
 
-		idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
-		seg = mlx5_get_send_wqe(qp, idx);
-		ctrl = seg;
-		*(uint32_t *)(seg + 8) = 0;
-		ctrl->imm = send_ieth(wr);
-		ctrl->fm_ce_se = qp->sq_signal_bits |
-			(wr->send_flags & IB_SEND_SIGNALED ?
-			 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
-			(wr->send_flags & IB_SEND_SOLICITED ?
-			 MLX5_WQE_CTRL_SOLICITED : 0);
-
-		seg += sizeof(*ctrl);
-		size = sizeof(*ctrl) / 16;
-
 		switch (ibqp->qp_type) {
 		case IB_QPT_XRC_INI:
 			xrc = seg;
@@ -2197,22 +2233,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			}
 		}
 
-		mlx5_opcode = mlx5_ib_opcode[wr->opcode];
-		ctrl->opmod_idx_opcode

[PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device

2013-10-31 Thread Sagi Grimberg
This will be useful when processing signature errors
on a specific key. The mlx5 driver will look up the
matching mlx5 memory region structure and mark it as
dirty (contains signature errors).
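
As an illustration (not part of the patch), the error path this table
enables might look roughly like the sketch below; mlx5_process_sig_err()
and mark_mr_dirty() are hypothetical names, and the 0xffffff00 mask
mirrors the insertion key used in the patch:

	static void mlx5_process_sig_err(struct mlx5_core_dev *dev, u32 mkey)
	{
		struct mlx5_mr_table *table = &dev->priv.mr_table;
		struct mlx5_core_mr *mr;
		unsigned long flags;

		read_lock_irqsave(&table->lock, flags);
		mr = radix_tree_lookup(&table->tree, mkey & 0xffffff00);
		read_unlock_irqrestore(&table->lock, flags);
		if (mr)
			mark_mr_dirty(mr);	/* hypothetical helper */
	}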

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   20 
 include/linux/mlx5/driver.h|   12 
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index b47739b..5b7b3c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -428,6 +428,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev 
*pdev)
mlx5_init_cq_table(dev);
mlx5_init_qp_table(dev);
mlx5_init_srq_table(dev);
+   mlx5_init_mr_table(dev);
 
return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c 
b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index 2ade604..f72e0b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -36,9 +36,18 @@
 #include linux/mlx5/cmd.h
 #include mlx5_core.h
 
+void mlx5_init_mr_table(struct mlx5_core_dev *dev)
+{
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
+
+	rwlock_init(&table->lock);
+	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
+}
+
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			  struct mlx5_create_mkey_mbox_in *in, int inlen)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_create_mkey_mbox_out out;
 	int err;
 	u8 key;
@@ -63,14 +72,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 	mr->key = mlx5_idx_to_mkey(be32_to_cpu(out.mkey) & 0xffffff) | key;
 	mlx5_core_dbg(dev, "out 0x%x, key 0x%x, mkey 0x%x\n",
 		      be32_to_cpu(out.mkey), key, mr->key);
 
+	/* connect to MR tree */
+	write_lock_irq(&table->lock);
+	err = radix_tree_insert(&table->tree, mr->key & 0xffffff00, mr);
+	write_unlock_irq(&table->lock);
+
 	return err;
 }
 EXPORT_SYMBOL(mlx5_core_create_mkey);
 
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_destroy_mkey_mbox_in in;
 	struct mlx5_destroy_mkey_mbox_out out;
+	unsigned long flags;
 	int err;
 
 	memset(&in, 0, sizeof(in));
@@ -85,6 +101,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 	if (out.hdr.status)
 		return mlx5_cmd_status_to_err(&out.hdr);
 
+	write_lock_irqsave(&table->lock, flags);
+	radix_tree_delete(&table->tree, mr->key & 0xffffff00);
+	write_unlock_irqrestore(&table->lock, flags);
+
+
return err;
 }
 EXPORT_SYMBOL(mlx5_core_destroy_mkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 7c33487..5fe0690 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -488,6 +488,13 @@ struct mlx5_srq_table {
struct radix_tree_root  tree;
 };
 
+struct mlx5_mr_table {
+	/* protect radix tree
+	 */
+	rwlock_t		lock;
+   struct radix_tree_root  tree;
+};
+
 struct mlx5_priv {
 	char			name[MLX5_MAX_NAME_LEN];
struct mlx5_eq_tableeq_table;
@@ -516,6 +523,10 @@ struct mlx5_priv {
struct mlx5_cq_tablecq_table;
/* end: cq staff */
 
+   /* start: mr staff */
+	struct mlx5_mr_table	mr_table;
+   /* end: mr staff */
+
/* start: alloc staff */
struct mutexpgdir_mutex;
struct list_headpgdir_list;
@@ -691,6 +702,7 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct 
mlx5_core_srq *srq,
struct mlx5_query_srq_mbox_out *out);
 int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
  u16 lwm, int is_srq);
+void mlx5_init_mr_table(struct mlx5_core_dev *dev);
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
  struct mlx5_create_mkey_mbox_in *in, int inlen);
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr);
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v2 00/10] Introduce Signature feature

2013-10-31 Thread Sagi Grimberg
This patchset introduces verbs level support for the signature handover
feature. Signature is intended to implement end-to-end data integrity
on a transactional basis in a completely offloaded manner.

There are several end-to-end data integrity methods used today in various
applications and/or upper layer protocols such as T10-DIF defined by SCSI
specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs
support only for T10-DIF. The proposed framework allows adding more
signature methods in the future.

In T10-DIF, when a series of 512-byte data blocks are transferred, each
block is followed by an 8-byte guard. The guard consists of CRC that
protects the integrity of the data in the block, and some other tags
that protects against mis-directed IOs.

Data can be protected when transferred over the wire, but can also be
protected in the memory of the sender/receiver. This allows true end-
to-end protection against bits flipping either over the wire, through
gateways, in memory, over PCI, etc.

While T10-DIF clearly defines that over the wire protection guards are
interleaved into the data stream (each 512-Byte block followed by 8-byte
guard), when in memory, the protection guards may reside in a buffer
separated from the data. Depending on the application, it is usually
easier to handle the data when it is contiguous. In this case the data
buffer will be of size 512xN and the protection buffer will be of size
8xN (where N is the number of blocks in the transaction).
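
As a concrete illustration of the sizing above (not from the cover
letter), for 512-byte blocks:

	/* separate-buffer (DIX-style) layout, assuming 512-byte blocks */
	size_t nblocks  = xfer_len / 512;	/* N                      */
	size_t data_len = nblocks * 512;	/* 512 x N data bytes     */
	size_t prot_len = nblocks * 8;		/* 8 x N guard bytes      */
	/* e.g. a 256 KiB transfer: N = 512, data 256 KiB, guards 4 KiB */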

There are 3 kinds of signature handover operation:
1. Take unprotected data (from wire or memory) and ADD protection
   guards.
2. Take protected data (from wire or memory), validate the data
   integrity against the protection guards and STRIP the protection
   guards.
3. Take protected data (from wire or memory), validate the data
   integrity against the protection guards and PASS the data with
   the guards as-is.

This translates to defining to the HCA how/if data protection exists
in the memory domain, and how/if data protection exists in the wire
domain (see the configuration sketch below).
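
A hedged configuration sketch for case 1 (ADD guards on send: no
protection in the memory domain, DIF on the wire). The ib_sig_domain
fields are the ones proposed in this patchset; that ib_sig_attrs holds
a mem and a wire domain, and the lba seed variable, are assumptions
here:

	struct ib_sig_attrs sig_attrs;

	memset(&sig_attrs, 0, sizeof(sig_attrs));
	sig_attrs.mem.sig_type = IB_SIG_TYPE_T10_DIF;
	sig_attrs.mem.sig.dif.type = IB_T10DIF_NONE;	/* no guards in memory */
	sig_attrs.wire.sig_type = IB_SIG_TYPE_T10_DIF;
	sig_attrs.wire.sig.dif.type = IB_T10DIF_TYPE1;
	sig_attrs.wire.sig.dif.bg_type = IB_T10DIF_CRC;
	sig_attrs.wire.sig.dif.block_size = 512;
	sig_attrs.wire.sig.dif.ref_tag = (u32)lba;	/* initial ref tag */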

The way that data integrity is performed is by using a new kind of
memory region: signature-enabled MR, and a new kind of work request:
REG_SIG_MR. The REG_SIG_MR WR operates on the signature-enabled MR,
and defines all the needed information for the signature handover
(data buffer, protection buffer if needed and signature attributes).
The result is an MR that can be used for data transfer as usual,
that will also add/validate/strip/pass protection guards.

When the data transfer is successfully completed, it does not mean
that there are no integrity errors. The user must afterwards check
the signature status of the handover operation using a new light-weight
verb.

This feature shall be used in storage upper layer protocols iSER/SRP
implementing end-to-end data integrity T10-DIF. Following this patchset,
we will soon submit krping patches which will demonstrate the usage of
these signature verbs.

Patchset summary:
- Introduce verbs for create/destroy memory regions supporting signature.
- Introduce IB core signature verbs API.
- Implement mr create/destroy verbs in mlx5 driver.
- Preparation patches for signature support in mlx5 driver.
- Implement signature handover work request in mlx5 driver.
- Implement signature error collection and handling in mlx5 driver.

Changes from v1:
- IB/core: Reduced sizeof ib_send_wr by using wr->sg_list for data
   and a dedicated ib_sge for the protection guards buffer.
   Currently the sig_handover extension does not increase sizeof ib_send_wr.
- IB/core: Change enum to int for container variables.
- IB/mlx5: Validate wr->num_sge == 1 for the REG_SIG_MR work request.

Changes from v0:
- Commit messages: Added more detailed explanation for signature work request.
- IB/core: Remove indirect memory registration enablement from create_mr.
   Keep only signature enablement.
- IB/mlx5: Changed signature error processing via MR radix lookup.

Sagi Grimberg (10):
  IB/core: Introduce protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin  finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs in a radix tree under device
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Publish support in signature feature

 drivers/infiniband/core/verbs.c|   47 +++
 drivers/infiniband/hw/mlx5/cq.c|   53 +++
 drivers/infiniband/hw/mlx5/main.c  |   12 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |   14 +
 drivers/infiniband/hw/mlx5/mr.c|  138 +++
 drivers/infiniband/hw/mlx5/qp.c|  525 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   84 
 include

[PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr

2013-10-31 Thread Sagi Grimberg
Support create_mr and destroy_mr verbs.
Creating an ib_mr may be done for either an ib_mr that will
register regular page lists like the alloc_fast_reg_mr routine,
or indirect ib_mr's that can register other (pre-registered)
ib_mr's in an indirect manner.

In addition the user may request signature enable, meaning
that the created ib_mr may be attached with signature attributes
(BSF, PSVs).

Currently we only allow direct/indirect registration modes.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c|2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |4 +
 drivers/infiniband/hw/mlx5/mr.c  |  109 ++
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |   64 +++
 include/linux/mlx5/device.h  |   25 ++
 include/linux/mlx5/driver.h  |   19 +
 6 files changed, 223 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 3f831de..2e67a37 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.get_dma_mr		= mlx5_ib_get_dma_mr;
 	dev->ib_dev.reg_user_mr		= mlx5_ib_reg_user_mr;
 	dev->ib_dev.dereg_mr		= mlx5_ib_dereg_mr;
+	dev->ib_dev.destroy_mr		= mlx5_ib_destroy_mr;
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
+	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 836be91..45d7424 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -262,6 +262,7 @@ struct mlx5_ib_mr {
int npages;
struct completion   done;
enum ib_wc_status   status;
+	struct mlx5_core_sig_ctx	*sig;
 };
 
 struct mlx5_ib_fast_reg_page_list {
@@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
  u64 virt_addr, int access_flags,
  struct ib_udata *udata);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
+int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+   struct ib_mr_init_attr *mr_init_attr);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct 
ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index bd41df9..44f7e46 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -921,6 +921,115 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
return 0;
 }
 
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_create_mkey_mbox_in *in;
+	struct mlx5_ib_mr *mr;
+	int access_mode, err;
+	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	in->seg.status = 1 << 6; /* free */
+	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
+	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
+	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
+	access_mode = MLX5_ACCESS_MODE_MTT;
+
+	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+		u32 psv_index[2];
+
+		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
+					       MLX5_MKEY_BSF_EN);
+		in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
+		mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL);
+		if (!mr->sig) {
+			err = -ENOMEM;
+			goto err_free;
+		}
+
+		/* create mem & wire PSVs */
+		err = mlx5_core_create_psv(dev->mdev, to_mpd(pd)->pdn,
+					   2, psv_index);
+		if (err)
+			goto err_free_sig;
+
+		access_mode = MLX5_ACCESS_MODE_KLM;
+		mr->sig->psv_memory.psv_idx = psv_index[0

Re: [PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr

2013-10-31 Thread Sagi Grimberg

On 10/31/2013 2:52 PM, Jack Wang wrote:

On 10/31/2013 01:24 PM, Sagi Grimberg wrote:

Support create_mr and destroy_mr verbs.
Creating an ib_mr may be done for either an ib_mr that will
register regular page lists like the alloc_fast_reg_mr routine,
or indirect ib_mr's that can register other (pre-registered)
ib_mr's in an indirect manner.

In addition the user may request signature enable, meaning
that the created ib_mr may be attached with signature attributes
(BSF, PSVs).

Currently we only allow direct/indirect registration modes.
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
  drivers/infiniband/hw/mlx5/main.c|2 +
  drivers/infiniband/hw/mlx5/mlx5_ib.h |4 +
  drivers/infiniband/hw/mlx5/mr.c  |  109 ++
  drivers/net/ethernet/mellanox/mlx5/core/mr.c |   64 +++
  include/linux/mlx5/device.h  |   25 ++
  include/linux/mlx5/driver.h  |   19 +
  6 files changed, 223 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 3f831de..2e67a37 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.get_dma_mr		= mlx5_ib_get_dma_mr;
 	dev->ib_dev.reg_user_mr		= mlx5_ib_reg_user_mr;
 	dev->ib_dev.dereg_mr		= mlx5_ib_dereg_mr;
+	dev->ib_dev.destroy_mr		= mlx5_ib_destroy_mr;
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
+	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 836be91..45d7424 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -262,6 +262,7 @@ struct mlx5_ib_mr {
int npages;
struct completion   done;
enum ib_wc_status   status;
+	struct mlx5_core_sig_ctx	*sig;
  };
  
  struct mlx5_ib_fast_reg_page_list {

@@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
  u64 virt_addr, int access_flags,
  struct ib_udata *udata);
  int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
+int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+   struct ib_mr_init_attr *mr_init_attr);
  struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len);
  struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct 
ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index bd41df9..44f7e46 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -921,6 +921,115 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
return 0;
  }
  
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,

+   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_create_mkey_mbox_in *in;
+	struct mlx5_ib_mr *mr;
+	int access_mode, err;
+	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	in->seg.status = 1 << 6; /* free */
+	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
+	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
+	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
+	access_mode = MLX5_ACCESS_MODE_MTT;
+
+	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+		u32 psv_index[2];
+
+		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
+					       MLX5_MKEY_BSF_EN);
+		in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
+		mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL);
+		if (!mr->sig) {
+			err = -ENOMEM;
+			goto err_free;
+		}
+
+		/* create mem & wire PSVs */
+		err = mlx5_core_create_psv(dev->mdev, to_mpd(pd)->pdn,
+					   2, psv_index);
+		if (err)
+			goto err_free_sig

Re: [PATCH RFC v2 00/10] Introduce Signature feature

2013-10-31 Thread Sagi Grimberg

On 10/31/2013 2:55 PM, Jack Wang wrote:

Hi Sagi,

I wonder what the performance overhead with this DIF support is.
And is there a roadmap for supporting DIF in SRP/iSER and on the target side?

Regards,
Jack


Well, all DIF operations are fully offloaded by the HCA so we don't expect
any performance degradation other than the obvious 8-byte integrity
overhead.

We have yet to take benchmarks on this and we definitely plan to do so.

Regarding our roadmap, we plan to support iSER target (LIO) and 
initiator first.

Some prior support for DIF needs to be added in target core level,
then transport implementation is pretty straight-forward (iSER/SRP).

So I aim for iSER DIF support (target+initiator) to make it into v3.14.

Hope this helps,

Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 00/10] Introduce Signature feature

2013-11-03 Thread Sagi Grimberg

On 11/2/2013 12:06 AM, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

While T10-DIF clearly defines that over the wire protection guards are
interleaved into the data stream (each 512-Byte block followed by 8-byte
guard), when in memory, the protection guards may reside in a buffer
separated from the data. Depending on the application, it is usually
easier to handle the data when it is contiguous. In this case the data
buffer will be of size 512xN and the protection buffer will be of size
8xN (where N is the number of blocks in the transaction).


It might be worth mentioning here that in the Linux block layer the
approach has been chosen where actual data and protection information
are in separate buffers. See also the bi_integrity field in struct bio.


Bart.



Hey Bart, I was expecting your input on this.
Thanks for the insightful comments!

The explanation here is an attempt to introduce T10-DIF to the
mailing-list as simply as possible, so I tried not to dive into SBC-3/SPC-4.
You are correct, the 8-byte protection guards will follow the protection
interval, which won't necessarily be 512 (only for DIF types 2,3).


Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 00/10] Introduce Signature feature

2013-11-03 Thread Sagi Grimberg

On 11/2/2013 12:06 AM, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

While T10-DIF clearly defines that over the wire protection guards are
interleaved into the data stream (each 512-Byte block followed by 8-byte
guard), when in memory, the protection guards may reside in a buffer
separated from the data. Depending on the application, it is usually
easier to handle the data when it is contiguous. In this case the data
buffer will be of size 512xN and the protection buffer will be of size
8xN (where N is the number of blocks in the transaction).


It might be worth mentioning here that in the Linux block layer the
approach has been chosen where actual data and protection information
are in separate buffers. See also the bi_integrity field in struct bio.


Bart.



This is true, but the signature verbs interface also supports data and
protection interleaved in memory space.
A user who wishes to do so will pass the same ib_sge both for data and
protection. In fact this was a requirement we got from customers.


Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API

2013-11-03 Thread Sagi Grimberg

On 11/1/2013 5:13 PM, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

+/**
+ * struct ib_sig_domain - Parameters specific for T10-DIF
+ * domain.
+ * @sig_type: specific signature type
+ * @sig: union of all signature domain attributes that may
+ *	be used to set domain layout.
+ *	@dif:
+ *		@type: T10-DIF type (0|1|2|3)
+ *		@bg_type: T10-DIF block guard type (CRC|CSUM)
+ *		@block_size: block size in signature domain.
+ *		@app_tag: if app_tag is owned by the user,
+ *			HCA will take this value to be app_tag.
+ *		@ref_tag: initial ref_tag of signature handover.
+ *		@type3_inc_reftag: T10-DIF type 3 does not define
+ *			the reference tag; it is the user's
+ *			choice to increment it or not.
+ */
+struct ib_sig_domain {
+	enum ib_signature_type	sig_type;
+	union {
+		struct {
+			enum ib_t10_dif_type	type;
+			enum ib_t10_dif_bg_type	bg_type;
+			u16			block_size;
+			u16			bg;
+			u16			app_tag;
+			u32			ref_tag;
+			bool			type3_inc_reftag;
+		} dif;
+	} sig;
+};


My understanding from SPC-4 is that when using protection
information, such information is inserted after every protection
interval. A protection interval can be smaller than a logical block.
Shouldn't the name block_size be changed into something like
pi_interval to avoid confusion with the logical block size?


Bart.



True, for DIF types 2,3 protection interval is not restricted to be 
logical block length and may be smaller.

I agree with pi_interval naming.

Note that pi_intervals smaller than 512 bytes are not supported at the 
moment.


Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 01/10] IB/core: Introduce protected memory regions

2013-11-03 Thread Sagi Grimberg

On 11/1/2013 7:09 PM, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ *	ib_create_mr.
+ * @max_reg_descriptors: max number of registration units that
+ *	may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+	int	max_reg_descriptors;
+	int	flags;
+};


Is this the first patch that adds the abbreviation UMR to a header
file in include/rdma? If so, I think it's a good idea not only to
mention the abbreviation but also what UMR stands for.


Bart.



You are correct,
I prefer to remove the abbreviation UMR as it is not tightly related to
signature.


The max_reg_descriptors parameter is the equivalent of
max_page_list_len of ib_alloc_fast_reg_mr().
The difference is that this memory region can also register indirect
memory descriptors {key, addr, len} rather than u64 physical addresses.
For example a signature enabled memory region may register 2 descriptors:
data and protection.


I'll modify the explanation here in v3.

Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API

2013-11-03 Thread Sagi Grimberg

On 11/2/2013 12:23 AM, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

+ * @type3_inc_reftag: T10-DIF type 3 does not define
+ *	the reference tag; it is the user's
+ *	choice to increment it or not.


Can you explain this further ? Does this mean that the HCA can check 
whether the reference tags are increasing when receiving data for TYPE 
3 protection mode ? My understanding of SPC-4 is that the application 
is free to use the reference tag in any way when using TYPE 3 
protection and hence that the HCA must not check whether the reference 
tag is increasing for TYPE 3 protection. See e.g. 
sd_dif_type3_get_tag() in drivers/scsi/sd_dif.c.


Bart.


As I understand TYPE 3, the reference tag is free for the application to
use - it may choose to increment it each PI or not. This option allows the
application to increment the ref_tag in type 3.
The DIF check is determined via check_mask. As I see it, correct use in
the case of DIF TYPE 3 is not to validate the reference tag, i.e. set the
REF_TAG bits in check_mask to zero.


Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API

2013-11-03 Thread Sagi Grimberg

On 11/1/2013 8:46 PM, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

+/**
+ * Signature T10-DIF block-guard types
+ */
+enum ib_t10_dif_bg_type {
+	IB_T10DIF_CRC,
+	IB_T10DIF_CSUM
+};


In SPC-4 paragraph 4.22.4 I found that the T10-PI guard is the CRC 
computed from the generator polynomial x^16 + x^15 + x^11 + x^9 + x^8 
+ x^7 + x^5 + x^4 + x^2 + x + 1. Could you tell me where I can find 
which guard computation method IB_T10DIF_CSUM corresponds to ?


Bart.


The IB_T10DIF_CSUM computation method corresponds to IP checksum rules.
This is aligned with the SHOST_DIX_GUARD_IP guard type.
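
For illustration only (not from the patch or the SCSI code), the two
guard computations can be sketched as below; the T10 CRC uses the SPC-4
generator polynomial 0x8BB7 (seed 0, no reflection), and the IP-checksum
variant is the usual 16-bit one's-complement sum over even-sized blocks:

	static u16 t10dif_crc(const u8 *buf, size_t len)
	{
		u16 crc = 0;
		size_t i;
		int j;

		for (i = 0; i < len; i++) {
			crc ^= (u16)buf[i] << 8;
			for (j = 0; j < 8; j++)
				crc = (crc & 0x8000) ? (crc << 1) ^ 0x8bb7
						     : crc << 1;
		}
		return crc;
	}

	static u16 ip_csum_guard(const u8 *buf, size_t len)
	{
		u32 sum = 0;
		size_t i;

		for (i = 0; i < len; i += 2)
			sum += ((u32)buf[i] << 8) | buf[i + 1];
		while (sum >> 16)	/* fold end-around carries */
			sum = (sum & 0xffff) + (sum >> 16);
		return (u16)~sum;
	}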


Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR

2013-11-03 Thread Sagi Grimberg

On 11/1/2013 5:05 PM, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

+static u8 bs_selector(int block_size)
+{
+	switch (block_size) {
+	case 512:		return 0x1;
+	case 520:		return 0x2;
+	case 4096:		return 0x3;
+	case 4160:		return 0x4;
+	case 1073741824:	return 0x5;
+	default:		return 0;
+	}
+}


Would it be possible to provide some more information about how the 
five supported block sizes have been chosen ?


Thanks,

Bart.



These block_sizes were chosen based on requests from our customers who
were interested in signature.

This is the current HCA support for the time being.

Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR

2013-11-03 Thread Sagi Grimberg

On 11/2/2013 11:59 PM, Bart Van Assche wrote:

On 2/11/2013 12:21, Or Gerlitz wrote:
On Fri, Nov 1, 2013 at 10:37 PM, Bart Van Assche bvanass...@acm.org 
wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:


This patch implements IB_WR_REG_SIG_MR posted by the user.

Basically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post UMR WR to register the sig_mr in one of two possible ways:
  * In case the user registered a single MR for data so the UMR 
data

segment
consists of:
- single klm (data MR) passed by the user
- BSF with signature attributes requested by the user.
  * In case the user registered 2 MRs, one for data and one for
protection,
the UMR consists of:
- strided block format which includes data and protection 
MRs and

  their repetitive block format.
- BSF with signature attributes requested by the user.

2. post SET_PSV in order to set the initial signature parameters
  of the memory domain, passed by the user.

3. post SET_PSV in order to set the initial signature parameters
  of the wire domain, passed by the user.

This patch also introduces some helper functions to set the BSF 
correctly

and determining the signature format selectors.



Has it already been explained somewhere what the abbreviations KLM, 
BSF and

PSV stand for ?


Bart, these are all HW T10 related objects/concepts, we made an effort
to keep them contained within the mlx5 driver such that they don't
show up on the IB core layer. If this helps for the review, Sagi can
spare few words on each, sure.


Hello Or,

I would certainly appreciate it if these abbreviations could be 
clarified further. That would allow me to understand what has been 
explained in the above patch description :-)


Bart.




Hey Bart,

As Or said, these concepts are vendor specific and not exposed to the IB
core layer, and their naming is also pure Mellanox.

This might also change in future generation HCAs.

In general the sig_mr (signature enabled) is a memory region that can 
register other memory regions (hint: data MR and protection MR) and is 
attached to (mlx5) signature objects.


KLM: A tuple {key, addr, len} that is used for indirect registration.
BSF: this is the object that describes the wire and memory layouts. we 
call it a byte-stream format.
PSV: this is the signature variable that is computing the guards - used 
for generation and/or validation. exists for each domain.


So we constructed the REG_SIG_MR operation as a 3-way operation:
- Online registration for sig_mr: register in an indirect manner for
data and protection (if it exists).
  If no protection exists in the memory domain, the sig_mr registers the
data buffer (KLM).
  If protection exists in the memory domain (DIX), the sig_mr registers
the data and protection buffers (KLMs).
  In the DIX case, in order to transfer DIF every pi_interval, the
registration also defines the strided format of the execution
(pi_interval of data followed by 8 bytes of protection in a repetitive
manner - see the arithmetic sketch below).

- Define the signature format of the wire/memory domains (BSF object):
  tell the HW how to treat the signature layout in each domain
(signature type, pi_interval etc...)

- Set the signature variables for each domain (memory, wire):
  Here we place the seeds from which the HW starts the signature
computation (in the DIF case: initial CRC, initial ref_tag, initial
app_tag).
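
To make the strided (DIX) wire layout above concrete, a hedged
arithmetic sketch, assuming pi_interval = 512:

	size_t nblocks  = data_len / 512;	/* data registered via KLM */
	size_t wire_len = nblocks * (512 + 8);	/* 8-byte DIF follows each
						   pi_interval on the wire */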


Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API

2013-11-03 Thread Sagi Grimberg

On 11/3/2013 4:41 PM, Bart Van Assche wrote:

On 3/11/2013 4:15, Sagi Grimberg wrote:

On 11/1/2013 8:46 PM, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

+/**
+ * Signature T10-DIF block-guard types
+ */
+enum ib_t10_dif_bg_type {
+IB_T10DIF_CRC,
+IB_T10DIF_CSUM
+};


In SPC-4 paragraph 4.22.4 I found that the T10-PI guard is the CRC
computed from the generator polynomial x^16 + x^15 + x^11 + x^9 + x^8
+ x^7 + x^5 + x^4 + x^2 + x + 1. Could you tell me where I can find
which guard computation method IB_T10DIF_CSUM corresponds to ?

Bart.


The IB_T10DIF_CSUM  computation method corresponds to IP checksum rules.
this is aligned with SHOST_DIX_GUARD_IP guard type.


Since the declarations added in rdma/ib_verbs.h constitute an 
interface definition I think it would help if it would be made more 
clear what these two symbols stand for. How about mentioning the names 
of the standards these two guard computation methods come from ? An 
alternative is to add a comment like the one above 
scsi_host_guard_type in scsi/scsi_host.h which explains the two 
guard computation methods well:


/*
 * All DIX-capable initiators must support the T10-mandated CRC
 * checksum.  Controllers can optionally implement the IP checksum
 * scheme which has much lower impact on system performance.  Note
 * that the main rationale for the checksum is to match integrity
 * metadata with data.  Detecting bit errors are a job for ECC memory
 * and buses.
 */

Bart.



Agreed,

I'll comment on each type correspondence (T10-DIF CRC checksum and IP 
checksum).


Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2 00/10] Introduce Signature feature

2013-11-05 Thread Sagi Grimberg

On 11/4/2013 8:41 PM, Nicholas A. Bellinger wrote:

On Sat, 2013-11-02 at 14:57 -0700, Bart Van Assche wrote:

On 1/11/2013 18:36, Nicholas A. Bellinger wrote:

On Fri, 2013-11-01 at 08:03 -0700, Bart Van Assche wrote:

On 31/10/2013 5:24, Sagi Grimberg wrote:

In T10-DIF, when a series of 512-byte data blocks are transferred, each
block is followed by an 8-byte guard. The guard consists of CRC that
protects the integrity of the data in the block, and some other tags
that protects against mis-directed IOs.

Shouldn't that read logical block length divided by 2**(protection
interval exponent) instead of 512 ? From the SPC-4 FORMAT UNIT
section:

Why should the protection interval in FORMAT_UNIT be mentioned when it's
not supported by the hardware, nor by drivers/scsi/sd_dif.c itself..?

Hello Nick,

My understanding is that this patch series is not only intended for
initiator drivers but also for target drivers like ib_srpt and ib_isert.
As you know target drivers do not restrict the initiator operating
system to Linux. Although I do not know whether there are already
operating systems that support the protection interval exponent,

It's my understanding that Linux is still the only stack that supports
DIF, so AFAICT no one is actually supporting this.


  I think it is a good idea to stay as close as possible to the terminology
of the SPC-4 standard.


No, in this context it only adds pointless misdirection because 1) The
hardware in question doesn't support it, and 2) Linux itself doesn't
support it.


I think that Bart is suggesting renaming block_size to pi_interval in
ib_sig_domain.
I tend to agree, since even if support for that does not exist yet, it
might in the future.
I think it is not a misdirection because it does represent the
protection information interval.



--nab





[PATCH v3 02/10] IB/core: Introduce Signature Verbs API

2013-11-07 Thread Sagi Grimberg
This commit introduces the Verbs Interface for signature-related
operations. A signature handover operation shall configure the
layouts of data and protection attributes both in memory and wire
domains.

Signature operations are:
- INSERT
  Generate and insert protection information when handing over
  data from input space to output space.
- validate and STRIP:
  Validate protection information and remove it when handing over
  data from input space to output space.
- validate and PASS:
  Validate protection information and pass it when handing over
  data from input space to output space.

Once the signature handover operation is done, the HCA will
offload data integrity generation/validation while performing
the actual data transfer.

Additions:
1. HCA signature capabilities in device attributes
Verbs provider supporting Signature handover operations shall
fill relevant fields in device attributes structure returned
by ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
Creating QP that will carry signature handover operations
may require some special preparations from the verbs provider.
So we add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare
that the created QP may carry out signature handover operations.
Expose signature support to verbs layer (no support for now)

3. New send work request IB_WR_REG_SIG_MR
Signature handover work request. This WR will define the signature
handover properties of the memory/wire domains as well as the domains
layout. The purpose of this work request is to bind all the needed
information for the signature operation:
- data to be transferred:  wr->sg_list (ib_sge).
  * The raw data, pre-registered to a single MR (normally, before
signature, this MR would have been used directly for the data
transfer)
- data protection guards: sig_handover.prot (ib_sge).
  * The data protection buffer, pre-registered to a single MR, which
contains the data integrity guards of the raw data blocks.
Note that it may not always exist, only in cases where the user is
interested in storing protection guards in memory.
- signature operation attributes: sig_handover.sig_attrs.
  * Tells the HCA how to validate/generate the protection information.

Once the work request is executed, the memory region which
will describe the signature transaction will be the sig_mr. The
application can now go ahead and send the sig_mr.rkey or use the
sig_mr.lkey for data transfer.

4. New Verb ib_check_sig_status
check_sig_status Verb shall check if any signature errors
are pending for a specific signature-enabled ib_mr.
This verb is a lightweight check and may be called
from interrupt context. The application must call this verb once
it is known that the actual data transfer has finished.
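
To make the flow concrete, a rough usage sketch (ours, not part of the
patch; the sig_handover.sig_mr field name and the return convention of
ib_check_sig_status are assumptions based on the description above):

struct ib_send_wr wr, *bad_wr;
struct ib_sig_err sig_err;

/* 1. bind data, protection and signature attributes to sig_mr */
memset(&wr, 0, sizeof(wr));
wr.opcode = IB_WR_REG_SIG_MR;
wr.sg_list = &data_sge;				/* pre-registered raw data */
wr.num_sge = 1;
wr.wr.sig_handover.prot = &prot_sge;		/* guards buffer, may be NULL */
wr.wr.sig_handover.sig_attrs = &sig_attrs;	/* generate/validate rules */
wr.wr.sig_handover.sig_mr = sig_mr;		/* assumed field name */
if (ib_post_send(qp, &wr, &bad_wr))
	return -EIO;

/* 2. transfer data as usual using sig_mr->lkey / sig_mr->rkey */

/* 3. once the transfer has finished, check for pending signature
 * errors; on an error sig_err holds type/offset/expected/actual
 * (assumed convention: non-zero return when an error is pending) */
if (ib_check_sig_status(sig_mr, &sig_err))
	pr_err("signature error %d at offset %llu\n",
	       sig_err.err_type,
	       (unsigned long long)sig_err.sig_err_offset);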

issue: 333508
Change-Id: I0cce750a6b77cd1eae102c5982c8c31e46237af8
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |8 +++
 include/rdma/ib_verbs.h |  132 ++-
 2 files changed, 139 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ef47667..d3d2ce5 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1323,3 +1323,11 @@ int ib_destroy_flow(struct ib_flow *flow_id)
return err;
 }
 EXPORT_SYMBOL(ib_destroy_flow);
+
+int ib_check_sig_status(struct ib_mr *sig_mr,
+   struct ib_sig_err *sig_err)
+{
+	return sig_mr->device->check_sig_status ?
+		sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_sig_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index af1bd1a..e71dae6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -117,7 +117,19 @@ enum ib_device_cap_flags {
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
 	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
-	IB_DEVICE_MANAGED_FLOW_STEERING	= (1<<29)
+	IB_DEVICE_MANAGED_FLOW_STEERING	= (1<<29),
+	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<30)
+};
+
+enum ib_signature_prot_cap {
+   IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+   IB_GUARD_T10DIF_CRC = 1,
+	IB_GUARD_T10DIF_CSUM	= 1 << 1,
 };
 
 enum ib_atomic_cap {
@@ -167,6 +179,8 @@ struct ib_device_attr {
 	unsigned int		max_fast_reg_page_list_len;
 	u16			max_pkeys;
 	u8			local_ca_ack_delay;
+	int			sig_prot_cap;
+	int			sig_guard_cap;
 };
 
 enum ib_mtu {
@@ -471,6 +485,98 @@ struct ib_mr_init_attr {
u32 flags;
 };
 
+enum ib_signature_type

[PATCH v3 05/10] IB/mlx5: Break wqe handling to begin finish routines

2013-11-07 Thread Sagi Grimberg
As a preliminary step for the signature feature, which will
require posting multiple (3) WQEs for a single WR, we
break the post_send routine's WQE indexing into begin and
finish routines.

This patch does not change any functionality.

issue: 333508
Change-Id: If373dff9a21ead58117137409e81143f94aa3fec
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |   97 --
 1 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index f61e93c..15df91b 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1992,6 +1992,59 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr)
}
 }
 
+static int begin_wqe(struct mlx5_ib_qp *qp, void **seg,
+struct mlx5_wqe_ctrl_seg **ctrl,
+struct ib_send_wr *wr, int *idx,
+int *size, int nreq)
+{
+   int err = 0;
+
+	if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		err = -ENOMEM;
+		return err;
+	}
+
+	*idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
+	*seg = mlx5_get_send_wqe(qp, *idx);
+	*ctrl = *seg;
+	*(uint32_t *)(*seg + 8) = 0;
+	(*ctrl)->imm = send_ieth(wr);
+	(*ctrl)->fm_ce_se = qp->sq_signal_bits |
+		(wr->send_flags & IB_SEND_SIGNALED ?
+		 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
+		(wr->send_flags & IB_SEND_SOLICITED ?
+		 MLX5_WQE_CTRL_SOLICITED : 0);
+
+	*seg += sizeof(**ctrl);
+	*size = sizeof(**ctrl) / 16;
+
+   return err;
+}
+
+static void finish_wqe(struct mlx5_ib_qp *qp,
+  struct mlx5_wqe_ctrl_seg *ctrl,
+  u8 size, unsigned idx, u64 wr_id,
+  int *nreq, u8 fence, u8 next_fence,
+  u32 mlx5_opcode)
+{
+   u8 opmod = 0;
+
+	ctrl->opmod_idx_opcode = cpu_to_be32(((u32)(qp->sq.cur_post) << 8) |
+					     mlx5_opcode | ((u32)opmod << 24));
+	ctrl->qpn_ds = cpu_to_be32(size | (qp->mqp.qpn << 8));
+	ctrl->fm_ce_se |= fence;
+	qp->fm_cache = next_fence;
+	if (unlikely(qp->wq_sig))
+		ctrl->signature = wq_sig(ctrl);
+
+	qp->sq.wrid[idx] = wr_id;
+	qp->sq.w_list[idx].opcode = mlx5_opcode;
+	qp->sq.wqe_head[idx] = qp->sq.head + (*nreq)++;
+	qp->sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB);
+	qp->sq.w_list[idx].next = qp->sq.cur_post;
+}
+
+
 int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
  struct ib_send_wr **bad_wr)
 {
@@ -2005,7 +2058,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
int uninitialized_var(size);
void *qend = qp-sq.qend;
unsigned long flags;
-   u32 mlx5_opcode;
unsigned idx;
int err = 0;
int inl = 0;
@@ -2014,7 +2066,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
int nreq;
int i;
u8 next_fence = 0;
-   u8 opmod = 0;
u8 fence;
 
 	spin_lock_irqsave(&qp->sq.lock, flags);
@@ -2027,36 +2078,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
goto out;
}
 
-		if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) {
+		fence = qp->fm_cache;
+		num_sge = wr->num_sge;
+		if (unlikely(num_sge > qp->sq.max_gs)) {
 			mlx5_ib_warn(dev, "\n");
 			err = -ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}
 
-		fence = qp->fm_cache;
-		num_sge = wr->num_sge;
-		if (unlikely(num_sge > qp->sq.max_gs)) {
+		err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq);
+		if (err) {
 			mlx5_ib_warn(dev, "\n");
 			err = -ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}
 
-		idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1);
-		seg = mlx5_get_send_wqe(qp, idx);
-		ctrl = seg;
-		*(uint32_t *)(seg + 8) = 0;
-		ctrl->imm = send_ieth(wr);
-		ctrl->fm_ce_se = qp->sq_signal_bits |
-			(wr->send_flags & IB_SEND_SIGNALED ?
-			 MLX5_WQE_CTRL_CQ_UPDATE : 0) |
-			(wr->send_flags & IB_SEND_SOLICITED ?
-			 MLX5_WQE_CTRL_SOLICITED : 0);
-
-		seg += sizeof(*ctrl);
-		size = sizeof(*ctrl) / 16;
-
 		switch (ibqp->qp_type) {
 		case IB_QPT_XRC_INI:
 			xrc = seg;
@@ -2189,22 +2227,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr

[PATCH v3 10/10] IB/mlx5: Publish support in signature feature

2013-11-07 Thread Sagi Grimberg
Currently we support only the T10-DIF types of signature
handover operations (types 1|2|3).

issue: 333508
Change-Id: I3ae2cce03a97074d56a52098b15c8bf74962aeed
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 9dec71d..54736f5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (flags & MLX5_DEV_CAP_FLAG_XRC)
 		props->device_cap_flags |= IB_DEVICE_XRC;
 	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+				      IB_PROT_T10DIF_TYPE_2 |
+				      IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+				       IB_GUARD_T10DIF_CSUM;
+	}
 
 	props->vendor_id	   = be32_to_cpup((__be32 *)(out_mad->data + 36)) &
 				     0xffffff;
-- 
1.7.1



[PATCH v3 09/10] IB/mlx5: Collect signature error completion

2013-11-07 Thread Sagi Grimberg
This commit takes care of the signature error cqe
generated by the HW (if any). The
underlying mlx5 driver will handle signature error
completions and will mark the relevant memory region
as dirty.

Once the user gets the completion for the transaction
he must check for signature errors on the signature memory region
using the new lightweight verb ib_check_sig_status, and if such
an error exists he will get the signature error information.

In case the user does not check for signature errors, i.e.
does not call ib_check_sig_status, he will not be allowed to
use the memory region for another signature operation
(REG_SIG_MR work request will fail).

issue: 333508
Change-Id: I002b12c6b685615b97c6fa29902ef06c70b11103
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/cq.c  |   54 ++
 drivers/infiniband/hw/mlx5/main.c|1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |7 
 drivers/infiniband/hw/mlx5/mr.c  |   31 +++
 drivers/infiniband/hw/mlx5/qp.c  |8 -
 include/linux/mlx5/cq.h  |1 +
 include/linux/mlx5/device.h  |   18 +++
 include/linux/mlx5/driver.h  |4 ++
 include/linux/mlx5/qp.h  |5 +++
 9 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 2834477..ac12dfe 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -351,6 +351,33 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct 
mlx5_cqe64 *cqe64,
qp-sq.last_poll = tail;
 }
 
+static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe,
+			     struct ib_sig_err *item)
+{
+	u16 syndrome = be16_to_cpu(cqe->syndrome);
+
+	switch (syndrome) {
+	case 13:
+		item->err_type = IB_SIG_BAD_CRC;
+		break;
+	case 12:
+		item->err_type = IB_SIG_BAD_APPTAG;
+		break;
+	case 11:
+		item->err_type = IB_SIG_BAD_REFTAG;
+		break;
+	default:
+		break;
+	}
+
+	item->expected_guard = be32_to_cpu(cqe->expected_trans_sig) >> 16;
+	item->actual_guard = be32_to_cpu(cqe->actual_trans_sig) >> 16;
+	item->expected_logical_block = be32_to_cpu(cqe->expected_reftag);
+	item->actual_logical_block = be32_to_cpu(cqe->actual_reftag);
+	item->sig_err_offset = be64_to_cpu(cqe->err_offset);
+	item->key = be32_to_cpu(cqe->mkey);
+}
+
 static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 struct mlx5_ib_qp **cur_qp,
 struct ib_wc *wc)
@@ -360,12 +387,16 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
struct mlx5_cqe64 *cqe64;
struct mlx5_core_qp *mqp;
struct mlx5_ib_wq *wq;
+   struct mlx5_sig_err_cqe *sig_err_cqe;
+   struct mlx5_core_mr *mmr;
+   struct mlx5_ib_mr *mr;
uint8_t opcode;
uint32_t qpn;
u16 wqe_ctr;
void *cqe;
int idx;
 
+repoll:
cqe = next_cqe_sw(cq);
if (!cqe)
return -EAGAIN;
@@ -449,6 +480,29 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
}
}
break;
+	case MLX5_CQE_SIG_ERR:
+		sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64;
+
+		read_lock(&dev->mdev.priv.mr_table.lock);
+		mmr = __mlx5_mr_lookup(&dev->mdev,
+				       mlx5_base_mkey(be32_to_cpu(sig_err_cqe->mkey)));
+		if (unlikely(!mmr)) {
+			read_unlock(&dev->mdev.priv.mr_table.lock);
+			mlx5_ib_warn(dev, "CQE@CQ %06x for unknown MR %6x\n",
+				     cq->mcq.cqn, be32_to_cpu(sig_err_cqe->mkey));
+			return -EINVAL;
+		}
+
+		mr = to_mibmr(mmr);
+		get_sig_err_item(sig_err_cqe, &mr->sig->err_item);
+		mr->sig->sig_err_exists = true;
+		mr->sig->sigerr_count++;
+
+		mlx5_ib_dbg(dev, "Got SIGERR on key: 0x%x\n",
+			    mr->sig->err_item.key);
+
+		read_unlock(&dev->mdev.priv.mr_table.lock);
+		goto repoll;
}
 
return 0;
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 10263fa..9dec71d 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1414,6 +1414,7 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
+	dev->ib_dev.check_sig_status	= mlx5_ib_check_sig_status;
 
 	if (mdev->caps.flags & MLX5_DEV_CAP_FLAG_XRC) {
 		dev->ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd

[PATCH v3 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related

2013-11-07 Thread Sagi Grimberg
If the user requested signature enable we initialize the
relevant mlx5_ib_qp members. We mark the qp as sig_enabled
and increase the effective SQ size (by MLX5_SIGNATURE_SQ_MULT,
i.e. 3, since a signature WR expands to multiple WQEs), but still
limit the user max_send_wr to the originally computed size.

issue: 333508
Change-Id: I72c303f407fc8181139371d4c0a7e7e7550043e0
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |3 +++
 drivers/infiniband/hw/mlx5/qp.c  |   16 
 include/linux/mlx5/qp.h  |1 +
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 43e0497..62b9e93 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -189,6 +189,9 @@ struct mlx5_ib_qp {
 
int create_type;
u32 pa_lkey;
+
+   /* Store signature errors */
+	bool			signature_en;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 7c6b4ba..f61e93c 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -263,6 +263,7 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct 
ib_qp_init_attr *attr,
 {
int wqe_size;
int wq_size;
+   int eff_wq_size;
 
 	if (!attr->cap.max_send_wr)
 		return 0;
@@ -283,7 +284,14 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct 
ib_qp_init_attr *attr,
 	attr->cap.max_inline_data = qp->max_inline_data;
 
 	wq_size = roundup_pow_of_two(attr->cap.max_send_wr * wqe_size);
-	qp->sq.wqe_cnt = wq_size / MLX5_SEND_WQE_BB;
+	if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN) {
+		eff_wq_size = roundup_pow_of_two(attr->cap.max_send_wr * wqe_size *
+						 MLX5_SIGNATURE_SQ_MULT);
+		qp->signature_en = true;
+	} else {
+		eff_wq_size = wq_size;
+	}
+	qp->sq.wqe_cnt = eff_wq_size / MLX5_SEND_WQE_BB;
 	if (qp->sq.wqe_cnt > dev->mdev.caps.max_wqes) {
 		mlx5_ib_dbg(dev, "wqe count(%d) exceeds limits(%d)\n",
 			    qp->sq.wqe_cnt, dev->mdev.caps.max_wqes);
@@ -291,10 +299,10 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct 
ib_qp_init_attr *attr,
}
 	qp->sq.wqe_shift = ilog2(MLX5_SEND_WQE_BB);
 	qp->sq.max_gs = attr->cap.max_send_sge;
-	qp->sq.max_post = wq_size / wqe_size;
-	attr->cap.max_send_wr = qp->sq.max_post;
+	qp->sq.max_post = eff_wq_size / wqe_size;
+	attr->cap.max_send_wr = wq_size / wqe_size;
 
-	return wq_size;
+	return eff_wq_size;
 }
 
 static int set_user_buf_size(struct mlx5_ib_dev *dev,
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index d9e3eac..174805c 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -37,6 +37,7 @@
 #include linux/mlx5/driver.h
 
 #define MLX5_INVALID_LKEY  0x100
+#define MLX5_SIGNATURE_SQ_MULT 3
 
 enum mlx5_qp_optpar {
MLX5_QP_OPTPAR_ALT_ADDR_PATH= 1  0,
-- 
1.7.1



[PATCH v3 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR

2013-11-07 Thread Sagi Grimberg
This patch implements IB_WR_REG_SIG_MR posted by the user.

Basically this compound WR involves 3 WQEs in order to prepare
and properly register the signature layout:

1. post UMR WR to register the sig_mr in one of two possible ways:
* In case the user registered a single MR for data so the UMR data segment
  consists of:
  - single klm (data MR) passed by the user
  - BSF with signature attributes requested by the user.
* In case the user registered 2 MRs, one for data and one for protection,
  the UMR consists of:
  - strided block format which includes data and protection MRs and
their repetitive block format.
  - BSF with signature attributes requested by the user.

2. post SET_PSV in order to set the memory domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

3. post SET_PSV in order to set the wire domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

* After this compound WR we place a small fence for next WR to come.

This patch also introduces some helper functions to set the BSF
correctly and to determine the signature format selectors.
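
For clarity, the compound WR expands on the send queue roughly as
follows (schematic only):

  REG_SIG_MR WR
   |- WQE 1: UMR      registers sig_mr over the data (and optional
   |                  protection) MR(s) and attaches the BSF
   |- WQE 2: SET_PSV  seeds the memory-domain signature state
   '- WQE 3: SET_PSV  seeds the wire-domain signature state
  (followed by a small fence before the next WR)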

issue: 333508
Change-Id: I66843ed14cb41275071b57fbba92018fe19bf4f5
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |  417 +++
 include/linux/mlx5/device.h |4 +
 include/linux/mlx5/qp.h |   61 ++
 3 files changed, 482 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 43f120a..688c68a 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1722,6 +1722,26 @@ static __be64 frwr_mkey_mask(void)
return cpu_to_be64(result);
 }
 
+static __be64 sig_mkey_mask(void)
+{
+   u64 result;
+
+   result = MLX5_MKEY_MASK_LEN |
+   MLX5_MKEY_MASK_PAGE_SIZE|
+   MLX5_MKEY_MASK_START_ADDR   |
+   MLX5_MKEY_MASK_EN_RINVAL|
+   MLX5_MKEY_MASK_KEY  |
+   MLX5_MKEY_MASK_LR   |
+   MLX5_MKEY_MASK_LW   |
+   MLX5_MKEY_MASK_RR   |
+   MLX5_MKEY_MASK_RW   |
+   MLX5_MKEY_MASK_SMALL_FENCE  |
+   MLX5_MKEY_MASK_FREE |
+   MLX5_MKEY_MASK_BSF_EN;
+
+   return cpu_to_be64(result);
+}
+
 static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
 struct ib_send_wr *wr, int li)
 {
@@ -1906,6 +1926,334 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, 
struct ib_send_wr *wr,
return 0;
 }
 
+static u16 prot_field_size(enum ib_signature_type type)
+{
+   switch (type) {
+   case IB_SIG_TYPE_T10_DIF:
+   return MLX5_DIF_SIZE;
+   default:
+   return 0;
+   }
+}
+
+static u8 bs_selector(int block_size)
+{
+   switch (block_size) {
+   case 512:   return 0x1;
+   case 520:   return 0x2;
+   case 4096:  return 0x3;
+   case 4160:  return 0x4;
+   case 1073741824:return 0x5;
+   default:return 0;
+   }
+}
+
+static int format_selector(struct ib_sig_attrs *attr,
+  struct ib_sig_domain *domain,
+  int *selector)
+{
+
+#define FORMAT_DIF_NONE0
+#define FORMAT_DIF_CRC_INC 4
+#define FORMAT_DIF_CSUM_INC12
+#define FORMAT_DIF_CRC_NO_INC  13
+#define FORMAT_DIF_CSUM_NO_INC 14
+
+	switch (domain->sig.dif.type) {
+	case IB_T10DIF_NONE:
+		/* No DIF */
+		*selector = FORMAT_DIF_NONE;
+		break;
+	case IB_T10DIF_TYPE1: /* Fall through */
+	case IB_T10DIF_TYPE2:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = FORMAT_DIF_CRC_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = FORMAT_DIF_CSUM_INC;
+			break;
+		default:
+			return 1;
+		}
+		break;
+	case IB_T10DIF_TYPE3:
+		switch (domain->sig.dif.bg_type) {
+		case IB_T10DIF_CRC:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					   FORMAT_DIF_CRC_INC :
+					   FORMAT_DIF_CRC_NO_INC;
+			break;
+		case IB_T10DIF_CSUM:
+			*selector = domain->sig.dif.type3_inc_reftag ?
+					   FORMAT_DIF_CSUM_INC :
+					   FORMAT_DIF_CSUM_NO_INC;
+			break;
+		default:
+			return 1;

[PATCH v3 06/10] IB/mlx5: remove MTT access mode from umr flags helper function

2013-11-07 Thread Sagi Grimberg
The get_umr_flags helper function might be used for access
modes other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM,
so remove the access mode from the helper; the caller
will add its own access mode flag.

This commit does not add/change functionality.

issue: 333508
Change-Id: If4aca628d1ca88be93a2161e4a158363dcaa134b
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 15df91b..43f120a 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1777,7 +1777,7 @@ static u8 get_umr_flags(int acc)
 	       (acc & IB_ACCESS_REMOTE_WRITE  ? MLX5_PERM_REMOTE_WRITE : 0) |
 	       (acc & IB_ACCESS_REMOTE_READ   ? MLX5_PERM_REMOTE_READ  : 0) |
 	       (acc & IB_ACCESS_LOCAL_WRITE   ? MLX5_PERM_LOCAL_WRITE  : 0) |
-	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+	       MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }
 
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1789,7 +1789,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, 
struct ib_send_wr *wr,
return;
}
 
-	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags);
+	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) |
+		     MLX5_ACCESS_MODE_MTT;
 	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
 	seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00);
 	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
-- 
1.7.1



[PATCH v3 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device

2013-11-07 Thread Sagi Grimberg
This will be useful when processing signature errors
on a specific key. The mlx5 driver will look up the
matching mlx5 memory region structure and mark it as
dirty (i.e., it contains signature errors).

issue: 333508
Change-Id: I04dbb746012b050d13161d134d2d05c8c333189a
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   24 
 include/linux/mlx5/driver.h|   18 ++
 3 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 40a9f5e..6e77c8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -446,6 +446,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev 
*pdev)
mlx5_init_cq_table(dev);
mlx5_init_qp_table(dev);
mlx5_init_srq_table(dev);
+   mlx5_init_mr_table(dev);
 
return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c 
b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index bb746bb..4cc9276 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -36,11 +36,24 @@
 #include linux/mlx5/cmd.h
 #include mlx5_core.h
 
+void mlx5_init_mr_table(struct mlx5_core_dev *dev)
+{
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
+
+	rwlock_init(&table->lock);
+	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
+}
+
+void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev)
+{
+}
+
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
  struct mlx5_create_mkey_mbox_in *in, int inlen,
  mlx5_cmd_cbk_t callback, void *context,
  struct mlx5_create_mkey_mbox_out *out)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
struct mlx5_create_mkey_mbox_out lout;
int err;
u8 key;
@@ -73,14 +86,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct 
mlx5_core_mr *mr,
 	mlx5_core_dbg(dev, "out 0x%x, key 0x%x, mkey 0x%x\n",
 		      be32_to_cpu(lout.mkey), key, mr->key);
 
+	/* connect to MR tree */
+	write_lock_irq(&table->lock);
+	err = radix_tree_insert(&table->tree, mlx5_base_mkey(mr->key), mr);
+	write_unlock_irq(&table->lock);
+
 	return err;
 }
 EXPORT_SYMBOL(mlx5_core_create_mkey);
 
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_destroy_mkey_mbox_in in;
 	struct mlx5_destroy_mkey_mbox_out out;
+	unsigned long flags;
 	int err;
 
 	memset(&in, 0, sizeof(in));
@@ -95,6 +115,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, 
struct mlx5_core_mr *mr)
if (out.hdr.status)
return mlx5_cmd_status_to_err(out.hdr);
 
+	write_lock_irqsave(&table->lock, flags);
+	radix_tree_delete(&table->tree, mlx5_base_mkey(mr->key));
+	write_unlock_irqrestore(&table->lock, flags);
+
return err;
 }
 EXPORT_SYMBOL(mlx5_core_destroy_mkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 58f5b95..1d97762 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -488,6 +488,13 @@ struct mlx5_srq_table {
struct radix_tree_root  tree;
 };
 
+struct mlx5_mr_table {
+	/* protect radix tree
+	 */
+	rwlock_t		lock;
+	struct radix_tree_root	tree;
+};
+
 struct mlx5_priv {
 	char			name[MLX5_MAX_NAME_LEN];
 	struct mlx5_eq_table	eq_table;
@@ -517,6 +524,10 @@ struct mlx5_priv {
 	struct mlx5_cq_table	cq_table;
 	/* end: cq staff */
 
+	/* start: mr staff */
+	struct mlx5_mr_table	mr_table;
+	/* end: mr staff */
+
 	/* start: alloc staff */
 	struct mutex		pgdir_mutex;
 	struct list_head	pgdir_list;
@@ -664,6 +675,11 @@ static inline void mlx5_vfree(const void *addr)
kfree(addr);
 }
 
+static inline u32 mlx5_base_mkey(const u32 key)
+{
+	return key & 0xffffff00u;
+}
+
 int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev);
 void mlx5_dev_cleanup(struct mlx5_core_dev *dev);
 int mlx5_cmd_init(struct mlx5_core_dev *dev);
@@ -698,6 +714,8 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct 
mlx5_core_srq *srq,
struct mlx5_query_srq_mbox_out *out);
 int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
  u16 lwm, int is_srq);
+void mlx5_init_mr_table(struct mlx5_core_dev *dev);
+void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev);
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
  struct

[PATCH v3 00/10] Introduce Signature feature

2013-11-07 Thread Sagi Grimberg
This patchset introduces verbs-level support for the signature handover
feature. Signature is intended to implement end-to-end data integrity
on a transactional basis in a completely offloaded manner.

There are several end-to-end data integrity methods used today in various
applications and/or upper layer protocols such as T10-DIF defined by SCSI
specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs
support only for T10-DIF. The proposed framework allows adding more
signature methods in the future.

In T10-DIF, when a series of 512-byte data blocks is transferred, each
block is followed by an 8-byte guard (note that protection intervals
other than 512 bytes may be used). The guard consists of a CRC that
protects the integrity of the data in the block, a tag that protects
against mis-directed IOs, and a free tag for application use.
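
For orientation, the 8-byte guard appended to each block has a fixed
layout, roughly (this definition is illustrative; the kernel keeps its
own in the block layer):

struct t10_dif_tuple {
	__be16 guard_tag;	/* CRC (or IP checksum) over the data block */
	__be16 app_tag;		/* free tag for application use */
	__be32 ref_tag;		/* typically the low 32 bits of the LBA */
};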

Data can be protected when transferred over the wire, but can also be
protected in the memory of the sender/receiver. This allows true end-
to-end protection against bits flipping either over the wire, through
gateways, in memory, over PCI, etc.

While T10-DIF clearly defines that over the wire protection guards are
interleaved into the data stream (each 512-Byte block followed by 8-byte
guard), when in memory, the protection guards may reside in a buffer
separated from the data. Depending on the application, it is usually
easier to handle the data when it is contiguous. In this case the data
buffer will be of size 512xN and the protection buffer will be of size
8xN (where N is the number of blocks in the transaction).

There are 3 kinds of signature handover operation:
1. Take unprotected data (from wire or memory) and ADD protection
   guards.
2. Take protected data (from wire or memory), validate the data
   integrity against the protection guards and STRIP the protection
   guards.
3. Take protected data (from wire or memory), validate the data
   integrity against the protection guards and PASS the data with
   the guards as-is.

This translates to defining to the HCA how/if data protection exists
in the memory domain, and how/if data protection exists in the wire domain.

The way that data integrity is performed is by using a new kind of
memory region: signature-enabled MR, and a new kind of work request:
REG_SIG_MR. The REG_SIG_MR WR operates on the signature-enabled MR,
and defines all the needed information for the signature handover
(data buffer, protection buffer if needed and signature attributes).
The result is an MR that can be used for data transfer as usual,
that will also add/validate/strip/pass protection guards.

When the data transfer is successfully completed, it does not mean
that there are no integrity errors. The user must afterwards check
the signature status of the handover operation using a new light-weight
verb.

This feature shall be used in storage upper layer protocols iSER/SRP
implementing end-to-end data integrity T10-DIF. Following this patchset,
we will soon submit krping patches which will demonstrate the usage of
these signature verbs.
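
As a rough illustration (ours, not part of the patchset; the mem/wire
domain and field names follow the descriptions above but may differ
between revisions), an INSERT-style handover - raw data in memory,
T10-DIF type 1 with CRC guards on the wire - would be configured along
these lines:

struct ib_sig_attrs sig_attrs;

memset(&sig_attrs, 0, sizeof(sig_attrs));
sig_attrs.mem.sig_type = IB_SIG_TYPE_T10_DIF;
sig_attrs.mem.sig.dif.type = IB_T10DIF_NONE;	/* no guards in memory */
sig_attrs.wire.sig_type = IB_SIG_TYPE_T10_DIF;
sig_attrs.wire.sig.dif.type = IB_T10DIF_TYPE1;	/* guards added on the wire */
sig_attrs.wire.sig.dif.bg_type = IB_T10DIF_CRC;
sig_attrs.wire.sig.dif.pi_interval = 512;	/* 8-byte guard per 512B block */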

Patchset summary:
- Introduce verbs for create/destroy memory regions supporting signature.
- Introduce IB core signature verbs API.
- Implement mr create/destroy verbs in mlx5 driver.
- Preparation patches for signature support in mlx5 driver.
- Implement signature handover work request in mlx5 driver.
- Implement signature error collection and handling in mlx5 driver.

Changes from v2 (mostly CR comments):
- IB/core: Added comment on IB_T10DIF_CRC/CSUM declarations.
- IB/core: Renamed block_size as pi_interval in ib_sig_attrs.
- IB/core: Took t10_dif domain out of sig union (ib_sig_domain).
- IB/mlx5: Fixed memory leak in create_mr
- IB/mlx5: Remove redundant assignment in WQE initialization.
- IB/mlx5: Fixed possible NULL dereference in check_sig_status
   and set_sig_wr.
- IB/mlx5: Added helper function to convert mkey to base key.
- IB/mlx5: Reduced fencing in compound REG_SIG_MR WR.
- Resolved checkpatch warnings. 

Changes from v1:
- IB/core: Reduced sizeof ib_send_wr by using wr->sg_list for data
   and dedicated ib_sge for protection guards buffer.
   Currently sig_handover extension does not increase sizeof ib_send_wr
- IB/core: Change enum to int for container variables.
- IB/mlx5: Validate wr->num_sge=1 for REG_SIG_MR work request.

Changes from v0:
- Commit messages: Added more detailed explanation for signature work request.
- IB/core: Remove indirect memory registration enablement from create_mr.
   Keep only signature enablement.
- IB/mlx5: Changed signature error processing via MR radix lookup.

Sagi Grimberg (10):
  IB/core: Introduce protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin  finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs

[PATCH v3 01/10] IB/core: Introduce protected memory regions

2013-11-07 Thread Sagi Grimberg
This commit introduces verbs for creating/destroying memory
regions which will allow new types of memory key operations such
as protected memory registration.

Indirect memory registration is registering several (one
or more) pre-registered memory regions in a specific layout.
The indirect region may potentially describe several regions
and some repetition format between them.

Protected Memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.

In the future these routines may replace or implement current memory
regions creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr
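
To illustrate, a minimal usage sketch (ours, not part of the patch) for
allocating and releasing a signature-enabled MR:

struct ib_mr_init_attr mr_attr;
struct ib_mr *sig_mr;

memset(&mr_attr, 0, sizeof(mr_attr));
mr_attr.max_reg_descriptors = 2;	/* e.g. data MR + protection MR */
mr_attr.flags = IB_MR_SIGNATURE_EN;
sig_mr = ib_create_mr(pd, &mr_attr);
if (IS_ERR(sig_mr))
	return PTR_ERR(sig_mr);

/* ... use sig_mr with REG_SIG_MR work requests ... */

ib_destroy_mr(sig_mr);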

issue: 333508
Change-Id: Id3d221a002af9a95716a44d0163ca0de1c6dbbb8
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   39 +++
 include/rdma/ib_verbs.h |   38 ++
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index a321df2..ef47667 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1055,6 +1055,45 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);
 
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+  struct ib_mr_init_attr *mr_init_attr)
+{
+   struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e393171..af1bd1a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -455,6 +455,22 @@ int ib_rate_to_mult(enum ib_rate rate) __attribute_const__;
  */
 int ib_rate_to_mbps(enum ib_rate rate) __attribute_const__;
 
+enum ib_mr_create_flags {
+   IB_MR_SIGNATURE_EN = 1,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ * ib_create_mr.
+ * @max_reg_descriptors: max number of registration descriptors that
+ * may be used with registration work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+   int max_reg_descriptors;
+   u32 flags;
+};
+
 /**
  * mult_to_ib_rate - Convert a multiple of 2.5 Gbit/sec to an IB rate
  * enum.
@@ -1372,6 +1388,9 @@ struct ib_device {
 	int			   (*query_mr)(struct ib_mr *mr,
 					       struct ib_mr_attr *mr_attr);
 	int			   (*dereg_mr)(struct ib_mr *mr);
+	int			   (*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr *		   (*create_mr)(struct ib_pd *pd,
+						struct ib_mr_init_attr *mr_init_attr);
 	struct ib_mr *		   (*alloc_fast_reg_mr)(struct ib_pd *pd,
 							int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
@@ -2212,6 +2231,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr 
*mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);
 
+
+/**
+ * ib_create_mr - Allocates a memory region that may be used for
+ * signature handover operations.
+ * @pd: The protection domain associated with the region.
+ * @mr_init_attr: memory region init attributes.
+ */
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+  struct ib_mr_init_attr *mr_init_attr);
+
+/**
+ * ib_destroy_mr - Destroys a memory region that was created using
+ * ib_create_mr and removes it from HW translation tables.
+ * @mr: The memory region to destroy.
+ *
+ * This function can fail, if the memory region has memory windows bound to it.
+ */
+int ib_destroy_mr(struct ib_mr *mr);
+
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
  *   IB_WR_FAST_REG_MR send work request.
-- 
1.7.1


Re: [PATCH v3 00/10] Introduce Signature feature

2013-11-14 Thread Sagi Grimberg

On 11/14/2013 9:30 AM, Or Gerlitz wrote:

On 14/11/2013 02:19, Hefty, Sean wrote:
The patch series is around for couple of weeks already and went 
through the review of Sean and Bart, with all their feedback being 
applied. Also Sagi and Co enhanced krping to fully cover (and 
test...) the proposed API and driver implementation

Somewhat separate from this specific patch, this is my concern.

There are continual requests to modify the kernel verbs interfaces.  
These requests boil down to exposing proprietary capabilities to the 
latest version of some vendor's hardware. In turn, these hardware 
specific knobs bleed into the kernel clients.


At the very least, it seems that there should be some sort of 
discussion if this is a desirable property of the kernel verbs 
interface, and if this is the architecture that the kernel should 
continue to pursue.  Or, is there an alternative way of providing the 
same ability of coding ULPs to specific HW features, versus plugging 
every new feature into 'post send'?


Sean,

Being concrete + re-iterating and expanding what I wrote you earlier 
on the V1 thread @ http://marc.info/?l=linux-rdma&m=138314853203389&w=2 
when you said


Sean  Maybe we should rethink the approach of exposing low-level 
hardware constructs to every
Sean  distinct feature of every vendor's latest hardware directly to 
the kernel ULPs.


To begin with, T10 DIF **is** an industry standard, which is used in 
production storage systems. The feature here is T10 DIF acceleration 
for upstream kernel storage drivers such as iSER/SRP/FCoE 
initiators/targets that use RDMA and are included in commercial 
distributions which are used by customers. Note that this/a similar 
feature is supported by some FC cards too, so we want RDMA to be 
competitive.


This work is part of larger efforts being made nowadays in other 
parts of the kernel, such as the block layer and the upstream kernel 
target, to support T10; this is just the RDMA part.


Sagi and team made a great effort to expose an API which isn't tied to 
a specific HW/firmware interface. And in that respect, the verbs API is 
coupled with industry standards and by no means with specific HW 
features. Just as a quick example, the specific driver/card (mlx5 / 
ConnectIB) for which the new verbs are implemented uses three objects 
for its T10 support, named BSF, KLM and PSV - you can be sure, and 
please check us, that there is no sign of them in the verbs API; they 
only live within the mlx5 driver.


If you see a vendor specific feature/construct that appears in the 
proposed verbs API changes, let us know.


 [...] versus plugging every new feature into 'post send'?

It's a new feature indeed, but it's a feature which comes into play 
when submitting RDMA work requests to the HCA, and for performance 
reasons must be subject to pipelining in the form of batched posting; 
hence it fits very well as a sub-operation of post-send.

Sean  There are continual requests to modify the kernel verbs 
interfaces. These requests boil down to exposing proprietary capabilities
Sean   to the latest version of some vendor's hardware. In turn, 
these hardware specific knobs bleed into the kernel clients.


non-T10 examples (please) ?!

Or.


Hey Sean,

Just to add on Or's input,
I really don't agree this is some specific HW capability exposed to 
ULPs. This feature allows offloading data-integrity handling over RDMA 
which is a wider concept than just T10-DIF (although we currently expose 
T10-DIF alone).
The signature verbs API does not introduce something specific to Mellanox; 
we think the API is generic enough to allow each vendor to support signature 
with some degree of freedom.
Just needs to implement the 3-steps: create signature enabled MR, bind 
MR to signature attributes (work-request) and check for signature status 
at the end of the transaction.


Regarding plugging into post_send, The signature operation is a 
fast-path operation and I agree with Or regarding the value of batching 
work requests.
Moreover, I think this is a separate discussion. If we agree on another 
API for posting on the send queue, it will also require work to migrate 
the fastreg and bind_mw extensions.
So how about going with the current framework, and starting a discussion 
on your concern of taking non-SEND WR extensions out of post_send.


Sagi.


[PATCH v4 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related

2013-12-16 Thread Sagi Grimberg
If the user requested signature enable we initialize the
relevant mlx5_ib_qp members. We mark the qp as sig_enabled
and check whether wqe_size will fit a compound REG_SIG_MR work
request (UMR + 2 x SET_PSV WQEs); if the computed wqe_size is
smaller, we align wqe_size up to MLX5_SIG_WQE_SIZE (5 send WQE
basic blocks).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |3 +++
 drivers/infiniband/hw/mlx5/qp.c  |   10 --
 include/linux/mlx5/qp.h  |1 +
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 43e0497..62b9e93 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -189,6 +189,9 @@ struct mlx5_ib_qp {
 
int create_type;
u32 pa_lkey;
+
+   /* Store signature errors */
+	bool			signature_en;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 7c6b4ba..07aa3ca 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -254,8 +254,11 @@ static int calc_send_wqe(struct ib_qp_init_attr *attr)
}
 
 	size += attr->cap.max_send_sge * sizeof(struct mlx5_wqe_data_seg);
-
-	return ALIGN(max_t(int, inl_size, size), MLX5_SEND_WQE_BB);
+	if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN &&
+	    ALIGN(max_t(int, inl_size, size), MLX5_SEND_WQE_BB) < MLX5_SIG_WQE_SIZE)
+		return MLX5_SIG_WQE_SIZE;
+	else
+		return ALIGN(max_t(int, inl_size, size), MLX5_SEND_WQE_BB);
 }
 
 static int calc_sq_size(struct mlx5_ib_dev *dev, struct ib_qp_init_attr *attr,
@@ -282,6 +285,9 @@ static int calc_sq_size(struct mlx5_ib_dev *dev, struct 
ib_qp_init_attr *attr,
 			sizeof(struct mlx5_wqe_inline_seg);
 	attr->cap.max_inline_data = qp->max_inline_data;
 
+	if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN)
+		qp->signature_en = true;
+
 	wq_size = roundup_pow_of_two(attr->cap.max_send_wr * wqe_size);
 	qp->sq.wqe_cnt = wq_size / MLX5_SEND_WQE_BB;
 	if (qp->sq.wqe_cnt > dev->mdev.caps.max_wqes) {
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index d9e3eac..711094c 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -37,6 +37,7 @@
 #include linux/mlx5/driver.h
 
 #define MLX5_INVALID_LKEY  0x100
+#define MLX5_SIG_WQE_SIZE (MLX5_SEND_WQE_BB * 5)
 
 enum mlx5_qp_optpar {
MLX5_QP_OPTPAR_ALT_ADDR_PATH= 1  0,
-- 
1.7.8.2



[PATCH v4 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr

2013-12-16 Thread Sagi Grimberg
Support create_mr and destroy_mr verbs.

For now, create/destroy routines will only support user request
for signature enabled memory regions. The created memory region
will be an indirect memory key that will be able to register
pre-registered data buffer and protection guards buffer (pre-registered
as well). The corresponding mlx5_ib_mr will be attached with mlx5
specific signature entities (BSF, PSVs). For non-signature
enabled regions, the resulting ib_mr is a free region applicable
for fast registration work requests.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c|2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |4 +
 drivers/infiniband/hw/mlx5/mr.c  |  111 ++
 drivers/net/ethernet/mellanox/mlx5/core/mr.c |   61 ++
 include/linux/mlx5/device.h  |   25 ++
 include/linux/mlx5/driver.h  |   19 +
 6 files changed, 222 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 3065341..10263fa 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1406,9 +1406,11 @@ static int init_one(struct pci_dev *pdev,
 	dev->ib_dev.get_dma_mr		= mlx5_ib_get_dma_mr;
 	dev->ib_dev.reg_user_mr		= mlx5_ib_reg_user_mr;
 	dev->ib_dev.dereg_mr		= mlx5_ib_dereg_mr;
+	dev->ib_dev.destroy_mr		= mlx5_ib_destroy_mr;
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
+	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 4c134d9..43e0497 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -265,6 +265,7 @@ struct mlx5_ib_mr {
struct mlx5_ib_dev *dev;
struct mlx5_create_mkey_mbox_out out;
unsigned long   start;
+	struct mlx5_core_sig_ctx	*sig;
 };
 
 struct mlx5_ib_fast_reg_page_list {
@@ -495,6 +496,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 
start, u64 length,
  u64 virt_addr, int access_flags,
  struct ib_udata *udata);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
+int mlx5_ib_destroy_mr(struct ib_mr *ibmr);
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+   struct ib_mr_init_attr *mr_init_attr);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct 
ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 039c3e4..e65cd0c 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -993,6 +993,117 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
return 0;
 }
 
+struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
+   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_create_mkey_mbox_in *in;
+	struct mlx5_ib_mr *mr;
+	int access_mode, err;
+	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	in = kzalloc(sizeof(*in), GFP_KERNEL);
+	if (!in) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	in->seg.status = 1 << 6; /* free */
+	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
+	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
+	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
+	access_mode = MLX5_ACCESS_MODE_MTT;
+
+	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+		u32 psv_index[2];
+
+		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
+					       MLX5_MKEY_BSF_EN);
+		in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
+		mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL);
+		if (!mr->sig) {
+			err = -ENOMEM;
+			goto err_free_in;
+		}
+
+		/* create mem & wire PSVs */
+		err = mlx5_core_create_psv(&dev->mdev, to_mpd(pd)->pdn,
+					   2, psv_index);
+		if (err)
+			goto err_free_sig;
+
+		access_mode

[PATCH v4 09/10] IB/mlx5: Collect signature error completion

2013-12-16 Thread Sagi Grimberg
This commit takes care of the signature error cqe
generated by the HW (if any). The
underlying mlx5 driver will handle signature error
completions and will look up the relevant memory region
(under a read_lock) and mark it as dirty (contains a
signature error).

Once the user gets the completion for the transaction
he must check for signature errors on the signature memory region
using the new lightweight verb ib_check_mr_status, and if such
an error exists he will get signature error information such as
error type, error offset and expected/actual values.

In case the user does not check for signature errors, i.e.
does not call ib_check_mr_status with status check
IB_MR_CHECK_SIG_STATUS, he will not be allowed to use
the memory region for another signature operation
(REG_SIG_MR work request will fail).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/cq.c  |   64 ++
 drivers/infiniband/hw/mlx5/main.c|1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |7 
 drivers/infiniband/hw/mlx5/mr.c  |   47 +
 drivers/infiniband/hw/mlx5/qp.c  |8 +++-
 include/linux/mlx5/cq.h  |1 +
 include/linux/mlx5/device.h  |   18 +
 include/linux/mlx5/driver.h  |4 ++
 include/linux/mlx5/qp.h  |5 +++
 9 files changed, 153 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index b726274..0990a54 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -351,6 +351,38 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct 
mlx5_cqe64 *cqe64,
qp-sq.last_poll = tail;
 }
 
+static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe,
+			     struct ib_sig_err *item)
+{
+	u16 syndrome = be16_to_cpu(cqe->syndrome);
+
+#define GUARD_ERR   (1 << 13)
+#define APPTAG_ERR  (1 << 12)
+#define REFTAG_ERR  (1 << 11)
+
+	if (syndrome & GUARD_ERR) {
+		item->err_type = IB_SIG_BAD_GUARD;
+		item->expected = be32_to_cpu(cqe->expected_trans_sig) >> 16;
+		item->actual = be32_to_cpu(cqe->actual_trans_sig) >> 16;
+	} else
+	if (syndrome & REFTAG_ERR) {
+		item->err_type = IB_SIG_BAD_REFTAG;
+		item->expected = be32_to_cpu(cqe->expected_reftag);
+		item->actual = be32_to_cpu(cqe->actual_reftag);
+	} else
+	if (syndrome & APPTAG_ERR) {
+		item->err_type = IB_SIG_BAD_APPTAG;
+		item->expected = be32_to_cpu(cqe->expected_trans_sig) & 0xffff;
+		item->actual = be32_to_cpu(cqe->actual_trans_sig) & 0xffff;
+	} else {
+		pr_err("Got signature completion error with bad syndrome %04x\n",
+		       syndrome);
+	}
+
+	item->sig_err_offset = be64_to_cpu(cqe->err_offset);
+	item->key = be32_to_cpu(cqe->mkey);
+}
+
 static int mlx5_poll_one(struct mlx5_ib_cq *cq,
 struct mlx5_ib_qp **cur_qp,
 struct ib_wc *wc)
@@ -360,12 +392,16 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
struct mlx5_cqe64 *cqe64;
struct mlx5_core_qp *mqp;
struct mlx5_ib_wq *wq;
+   struct mlx5_sig_err_cqe *sig_err_cqe;
+   struct mlx5_core_mr *mmr;
+   struct mlx5_ib_mr *mr;
uint8_t opcode;
uint32_t qpn;
u16 wqe_ctr;
void *cqe;
int idx;
 
+repoll:
cqe = next_cqe_sw(cq);
if (!cqe)
return -EAGAIN;
@@ -449,6 +485,34 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq,
}
}
break;
+	case MLX5_CQE_SIG_ERR:
+		sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64;
+
+		read_lock(&dev->mdev.priv.mr_table.lock);
+		mmr = __mlx5_mr_lookup(&dev->mdev,
+				       mlx5_base_mkey(be32_to_cpu(sig_err_cqe->mkey)));
+		if (unlikely(!mmr)) {
+			read_unlock(&dev->mdev.priv.mr_table.lock);
+			mlx5_ib_warn(dev, "CQE@CQ %06x for unknown MR %6x\n",
+				     cq->mcq.cqn, be32_to_cpu(sig_err_cqe->mkey));
+			return -EINVAL;
+		}
+
+		mr = to_mibmr(mmr);
+		get_sig_err_item(sig_err_cqe, &mr->sig->err_item);
+		mr->sig->sig_err_exists = true;
+		mr->sig->sigerr_count++;
+
+		mlx5_ib_warn(dev, "CQN: 0x%x Got SIGERR on key: 0x%x err_type %x "
+			     "err_offset %llx expected %x actual %x\n",
+			     cq->mcq.cqn, mr->sig->err_item.key,
+			     mr->sig->err_item.err_type,
+			     mr->sig->err_item.sig_err_offset,
+			     mr->sig->err_item.expected,
+			     mr->sig->err_item.actual);
+
+		read_unlock(&dev

[PATCH v4 01/10] IB/core: Introduce protected memory regions

2013-12-16 Thread Sagi Grimberg
This commit introduces verbs for creating/destroying memory
regions which will allow new types of memory key operations such
as protected memory registration.

Indirect memory registration is registering several (one
or more) pre-registered memory regions in a specific layout.
The indirect region may potentially describe several regions
and some repetition format between them.

Protected Memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
A protected region will describe pre-registered regions for data,
protection block guards and the repetitive stride of them.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.

In the future these routines may replace or implement current memory
regions creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   39 +++
 include/rdma/ib_verbs.h |   38 ++
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d4f6ddf..f4c3bfb 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1072,6 +1072,45 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);
 
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+  struct ib_mr_init_attr *mr_init_attr)
+{
+   struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device  = pd->device;
+		mr->pd      = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 979874c..81d1406 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -457,6 +457,22 @@ int ib_rate_to_mult(enum ib_rate rate) __attribute_const__;
  */
 int ib_rate_to_mbps(enum ib_rate rate) __attribute_const__;
 
+enum ib_mr_create_flags {
+   IB_MR_SIGNATURE_EN = 1,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ * ib_create_mr.
+ * @max_reg_descriptors: max number of registration descriptors that
+ * may be used with registration work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+   int max_reg_descriptors;
+   u32 flags;
+};
+
 /**
  * mult_to_ib_rate - Convert a multiple of 2.5 Gbit/sec to an IB rate
  * enum.
@@ -1374,6 +1390,9 @@ struct ib_device {
 	int			   (*query_mr)(struct ib_mr *mr,
 					       struct ib_mr_attr *mr_attr);
 	int			   (*dereg_mr)(struct ib_mr *mr);
+	int			   (*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr *		   (*create_mr)(struct ib_pd *pd,
+						struct ib_mr_init_attr *mr_init_attr);
 	struct ib_mr *		   (*alloc_fast_reg_mr)(struct ib_pd *pd,
 							int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
@@ -2215,6 +2234,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr 
*mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);
 
+
+/**
+ * ib_create_mr - Allocates a memory region that may be used for
+ * signature handover operations.
+ * @pd: The protection domain associated with the region.
+ * @mr_init_attr: memory region init attributes.
+ */
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+  struct ib_mr_init_attr *mr_init_attr);
+
+/**
+ * ib_destroy_mr - Destroys a memory region that was created using
+ * ib_create_mr and removes it from HW translation tables.
+ * @mr: The memory region to destroy.
+ *
+ * This function can fail, if the memory region has memory windows bound to it.
+ */
+int ib_destroy_mr(struct ib_mr *mr);
+
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
  *   IB_WR_FAST_REG_MR send work request.
-- 
1.7.8.2

--
To unsubscribe from this list: send

[PATCH v4 02/10] IB/core: Introduce Signature Verbs API

2013-12-16 Thread Sagi Grimberg
This commit introduces the Verbs interface for signature related
operations. A signature handover operation shall configure the
layouts of data and protection attributes both in memory and wire
domains.

Signature operations are:
- INSERT
  Generate and insert protection information when handing over
  data from input space to output space.
- validate and STRIP:
  Validate protection information and remove it when handing over
  data from input space to output space.
- validate and PASS:
  Validate protection information and pass it when handing over
  data from input space to output space.

Once the signature handover operation is done, the HCA will
offload data integrity generation/validation while performing
the actual data transfer.

Additions:
1. HCA signature capabilities in device attributes
Verbs provider supporting Signature handover operations shall
fill relevant fields in device attributes structure returned
by ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
Creating QP that will carry signature handover operations
may require some special preparations from the verbs provider.
So we add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare
that the created QP may carry out signature handover operations.
Expose signature support to verbs layer (no support for now)

3. New send work request IB_WR_REG_SIG_MR
Signature handover work request. This WR will define the signature
handover properties of the memory/wire domains as well as the domains
layout. The purpose of this work request is to bind all the needed
information for the signature operation:
- data to be transferred:  wr-sg_list (ib_sge).
  * The raw data, pre-registered to a single MR (normally, before
signature, this MR would have been used directly for the data
transfer)
- data protection guards: sig_handover.prot (ib_sge).
  * The data protection buffer, pre-registered to a single MR, which
contains the data integrity guards of the raw data blocks.
Note that it may not always exist, only in cases where the user is
interested in storing protection guards in memory.
- signature operation attributes: sig_handover.sig_attrs.
  * Tells the HCA how to validate/generate the protection information.

Once the work request is executed, the memory region which
will describe the signature transaction will be the sig_mr. The
application can now go ahead and send the sig_mr.rkey or use the
sig_mr.lkey for data transfer.

4. New Verb ib_check_mr_status
check_mr_status Verb shall check the status of the memory region
post transaction. The first check that may be used is
IB_MR_CHECK_SIG_STATUS which will indicate if any signature errors
are pending for a specific signature-enabled ib_mr.
This Verb is a lightweight check and may be called from interrupt
context. The application must call this verb after it is known that
the actual data transfer has finished.
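
Putting items 3 and 4 together, a (hypothetical, error handling
omitted) usage sketch on a signature-enabled QP could look like:

	struct ib_send_wr wr, *bad_wr;
	struct ib_mr_status mr_status;

	memset(&wr, 0, sizeof(wr));
	wr.opcode = IB_WR_REG_SIG_MR;
	wr.sg_list = &data_sge;			/* pre-registered data */
	wr.num_sge = 1;
	wr.wr.sig_handover.sig_mr = sig_mr;	/* from ib_create_mr() */
	wr.wr.sig_handover.sig_attrs = &sig_attrs;
	wr.wr.sig_handover.prot = &prot_sge;	/* NULL if no guards in memory */
	ib_post_send(qp, &wr, &bad_wr);

	/* ... transfer using sig_mr.lkey / sig_mr.rkey ... */

	/* after the data transfer is known to have finished */
	ib_check_mr_status(sig_mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
	if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS)
		handle_sig_error(&mr_status.sig_err);	/* hypothetical handler */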

issue: 333508
Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/core/verbs.c |8 ++
 include/rdma/ib_verbs.h |  149 ++-
 2 files changed, 156 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f4c3bfb..f617cb9 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1340,3 +1340,11 @@ int ib_destroy_flow(struct ib_flow *flow_id)
return err;
 }
 EXPORT_SYMBOL(ib_destroy_flow);
+
+int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
+  struct ib_mr_status *mr_status)
+{
+	return mr->device->check_mr_status ?
+		mr->device->check_mr_status(mr, check_mask, mr_status) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_mr_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 81d1406..2c75c29 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -119,7 +119,19 @@ enum ib_device_cap_flags {
	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24),
-	IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29)
+	IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29),
+	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<30)
+};
+
+enum ib_signature_prot_cap {
+	IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+	IB_GUARD_T10DIF_CRC	= 1,
+	IB_GUARD_T10DIF_CSUM	= 1 << 1,
 };
 
 enum ib_atomic_cap {
@@ -169,6 +181,8 @@ struct ib_device_attr {
unsigned intmax_fast_reg_page_list_len;
u16 max_pkeys;
u8  local_ca_ack_delay;
+   int sig_prot_cap;
+   int sig_guard_cap;
 };
 
 enum ib_mtu {
@@ -473,6 +487,114 @@ struct

[PATCH v4 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device

2013-12-16 Thread Sagi Grimberg
This radix tree will be useful when processing signature
errors on a specific key. The mlx5 driver shall look up the
matching mlx5 memory region structure and mark it as
dirty (contains signature errors). The radix tree is protected
by a rwlock; since signature error processing is guaranteed not
to compete with other contexts for a specific key, a read_lock
is sufficient.
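
For illustration, the lookup on the signature error path described
above would look roughly like this (a sketch, not part of the patch):

	struct mlx5_mr_table *table = &dev->priv.mr_table;
	struct mlx5_core_mr *mr;

	/* error processing never races with other users of this key,
	 * so read_lock is sufficient */
	read_lock(&table->lock);
	mr = radix_tree_lookup(&table->tree, mlx5_base_mkey(mkey));
	read_unlock(&table->lock);
	if (mr)
		mark_mr_dirty(mr);	/* hypothetical "contains sig errors" */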

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   24 
 include/linux/mlx5/driver.h|   18 ++
 3 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 40a9f5e..6e77c8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -446,6 +446,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev 
*pdev)
mlx5_init_cq_table(dev);
mlx5_init_qp_table(dev);
mlx5_init_srq_table(dev);
+   mlx5_init_mr_table(dev);
 
return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c 
b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index bb746bb..4cc9276 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -36,11 +36,24 @@
#include <linux/mlx5/cmd.h>
#include "mlx5_core.h"
 
+void mlx5_init_mr_table(struct mlx5_core_dev *dev)
+{
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
+
+	rwlock_init(&table->lock);
+	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
+}
+
+void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev)
+{
+}
+
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
  struct mlx5_create_mkey_mbox_in *in, int inlen,
  mlx5_cmd_cbk_t callback, void *context,
  struct mlx5_create_mkey_mbox_out *out)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
struct mlx5_create_mkey_mbox_out lout;
int err;
u8 key;
@@ -73,14 +86,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct 
mlx5_core_mr *mr,
mlx5_core_dbg(dev, out 0x%x, key 0x%x, mkey 0x%x\n,
  be32_to_cpu(lout.mkey), key, mr-key);
 
+	/* connect to MR tree */
+	write_lock_irq(&table->lock);
+	err = radix_tree_insert(&table->tree, mlx5_base_mkey(mr->key), mr);
+	write_unlock_irq(&table->lock);
+
return err;
 }
 EXPORT_SYMBOL(mlx5_core_create_mkey);
 
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
struct mlx5_destroy_mkey_mbox_in in;
struct mlx5_destroy_mkey_mbox_out out;
+   unsigned long flags;
int err;
 
memset(in, 0, sizeof(in));
@@ -95,6 +115,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, 
struct mlx5_core_mr *mr)
if (out.hdr.status)
return mlx5_cmd_status_to_err(out.hdr);
 
+	write_lock_irqsave(&table->lock, flags);
+	radix_tree_delete(&table->tree, mlx5_base_mkey(mr->key));
+	write_unlock_irqrestore(&table->lock, flags);
+
return err;
 }
 EXPORT_SYMBOL(mlx5_core_destroy_mkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 58f5b95..1d97762 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -488,6 +488,13 @@ struct mlx5_srq_table {
struct radix_tree_root  tree;
 };
 
+struct mlx5_mr_table {
+	/* protect radix tree */
+	rwlock_t		lock;
+	struct radix_tree_root	tree;
+};
+
 struct mlx5_priv {
charname[MLX5_MAX_NAME_LEN];
struct mlx5_eq_tableeq_table;
@@ -517,6 +524,10 @@ struct mlx5_priv {
struct mlx5_cq_tablecq_table;
/* end: cq staff */
 
+   /* start: mr staff */
+   struct mlx5_mr_tablemr_table;
+   /* end: mr staff */
+
/* start: alloc staff */
struct mutexpgdir_mutex;
struct list_headpgdir_list;
@@ -664,6 +675,11 @@ static inline void mlx5_vfree(const void *addr)
kfree(addr);
 }
 
+static inline u32 mlx5_base_mkey(const u32 key)
+{
+	return key & 0xffffff00u;
+}
+
 int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev);
 void mlx5_dev_cleanup(struct mlx5_core_dev *dev);
 int mlx5_cmd_init(struct mlx5_core_dev *dev);
@@ -698,6 +714,8 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct 
mlx5_core_srq *srq,
struct mlx5_query_srq_mbox_out *out);
 int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
  u16 lwm, int is_srq);
+void mlx5_init_mr_table(struct mlx5_core_dev *dev);
+void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev);
 int

[PATCH v4 00/10] Introduce Signature feature

2013-12-16 Thread Sagi Grimberg
 detailed explanation for signature work request.
- IB/core: Remove indirect memory registration enablement from create_mr.
   Keep only signature enablement.
- IB/mlx5: Changed signature error processing via MR radix lookup.

Sagi Grimberg (10):
  IB/core: Introduce protected memory regions
  IB/core: Introduce Signature Verbs API
  IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature related
  IB/mlx5: Break wqe handling to begin  finish routines
  IB/mlx5: remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs in a radix tree under device
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Publish support in signature feature

 drivers/infiniband/core/verbs.c|   47 ++
 drivers/infiniband/hw/mlx5/cq.c|   64 +++
 drivers/infiniband/hw/mlx5/main.c  |   12 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |   14 +
 drivers/infiniband/hw/mlx5/mr.c|  158 +++
 drivers/infiniband/hw/mlx5/qp.c|  559 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c |1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   85 
 include/linux/mlx5/cq.h|1 +
 include/linux/mlx5/device.h|   47 ++
 include/linux/mlx5/driver.h|   41 ++
 include/linux/mlx5/qp.h|   67 +++
 include/rdma/ib_verbs.h|  187 -
 13 files changed, 1242 insertions(+), 41 deletions(-)

-- 
1.7.8.2

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 10/10] IB/mlx5: Publish support in signature feature

2013-12-16 Thread Sagi Grimberg
Currently only T10-DIF types of signature handover
operations are supported (types 1|2|3).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 89ae2e5..63d9044 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
	if (flags & MLX5_DEV_CAP_FLAG_XRC)
		props->device_cap_flags |= IB_DEVICE_XRC;
	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+				      IB_PROT_T10DIF_TYPE_2 |
+				      IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+				       IB_GUARD_T10DIF_CSUM;
+	}
 
props-vendor_id   = be32_to_cpup((__be32 *)(out_mad-data + 
36)) 
0xff;
-- 
1.7.8.2

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR

2013-12-16 Thread Sagi Grimberg
This patch implements IB_WR_REG_SIG_MR posted by the user.

Basically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post UMR WR to register the sig_mr in one of two possible ways:
* In case the user registered a single MR for data so the UMR data segment
  consists of:
  - single klm (data MR) passed by the user
  - BSF with signature attributes requested by the user.
* In case the user registered 2 MRs, one for data and one for protection,
  the UMR consists of:
  - strided block format which includes data and protection MRs and
their repetitive block format.
  - BSF with signature attributes requested by the user.

2. post SET_PSV in order to set the memory domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

3. post SET_PSV in order to set the wire domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

* After this compound WR we place a small fence for next WR to come.

This patch also introduces some helper functions to set the BSF correctly
and to determine the signature format selectors.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |  443 +++
 include/linux/mlx5/device.h |4 +
 include/linux/mlx5/qp.h |   61 ++
 3 files changed, 508 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 1329c10..b0f066b 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1720,6 +1720,26 @@ static __be64 frwr_mkey_mask(void)
return cpu_to_be64(result);
 }
 
+static __be64 sig_mkey_mask(void)
+{
+   u64 result;
+
+   result = MLX5_MKEY_MASK_LEN |
+   MLX5_MKEY_MASK_PAGE_SIZE|
+   MLX5_MKEY_MASK_START_ADDR   |
+   MLX5_MKEY_MASK_EN_RINVAL|
+   MLX5_MKEY_MASK_KEY  |
+   MLX5_MKEY_MASK_LR   |
+   MLX5_MKEY_MASK_LW   |
+   MLX5_MKEY_MASK_RR   |
+   MLX5_MKEY_MASK_RW   |
+   MLX5_MKEY_MASK_SMALL_FENCE  |
+   MLX5_MKEY_MASK_FREE |
+   MLX5_MKEY_MASK_BSF_EN;
+
+   return cpu_to_be64(result);
+}
+
 static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
 struct ib_send_wr *wr, int li)
 {
@@ -1904,6 +1924,360 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, 
struct ib_send_wr *wr,
return 0;
 }
 
+static u16 prot_field_size(enum ib_signature_type type)
+{
+   switch (type) {
+   case IB_SIG_TYPE_T10_DIF:
+   return MLX5_DIF_SIZE;
+   default:
+   return 0;
+   }
+}
+
+static int bs_selector(u32 block_size, u8 *selector)
+{
+   switch (block_size) {
+   case 512:
+   *selector = 0x1;
+   break;
+   case 520:
+   *selector = 0x2;
+   break;
+   case 4096:
+   *selector = 0x3;
+   break;
+   case 4160:
+   *selector = 0x4;
+   break;
+   case 1073741824:
+   *selector = 0x5;
+   break;
+   default:
+   return -EINVAL;
+   }
+   return 0;
+}
+
+static int format_selector(struct ib_sig_attrs *attr,
+  struct ib_sig_domain *domain,
+  int *selector)
+{
+
+#define FORMAT_DIF_NONE0
+#define FORMAT_DIF_CRC_INC 4
+#define FORMAT_DIF_CSUM_INC12
+#define FORMAT_DIF_CRC_NO_INC  13
+#define FORMAT_DIF_CSUM_NO_INC 14
+
+   switch (domain-sig.dif.type) {
+   case IB_T10DIF_NONE:
+   /* No DIF */
+   *selector = FORMAT_DIF_NONE;
+   break;
+   case IB_T10DIF_TYPE1: /* Fall through */
+   case IB_T10DIF_TYPE2:
+   switch (domain-sig.dif.bg_type) {
+   case IB_T10DIF_CRC:
+   *selector = FORMAT_DIF_CRC_INC;
+   break;
+   case IB_T10DIF_CSUM:
+   *selector = FORMAT_DIF_CSUM_INC;
+   break;
+   default:
+   return 1;
+   }
+   break;
+   case IB_T10DIF_TYPE3:
+   switch (domain-sig.dif.bg_type) {
+   case IB_T10DIF_CRC:
+   *selector = domain-sig.dif.type3_inc_reftag ?
+  FORMAT_DIF_CRC_INC :
+  FORMAT_DIF_CRC_NO_INC;
+   break;
+   case IB_T10DIF_CSUM:
+   *selector = domain-sig.dif.type3_inc_reftag ?
+  FORMAT_DIF_CSUM_INC

[PATCH v4 06/10] IB/mlx5: remove MTT access mode from umr flags helper function

2013-12-16 Thread Sagi Grimberg
The get_umr_flags helper function might be used for access modes
other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM, so remove the
MTT mode from the helper; each caller will add its own access
mode flag.

This commit does not add/change functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index e135d71..1329c10 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1775,7 +1775,7 @@ static u8 get_umr_flags(int acc)
   (acc  IB_ACCESS_REMOTE_WRITE  ? MLX5_PERM_REMOTE_WRITE : 0) |
   (acc  IB_ACCESS_REMOTE_READ   ? MLX5_PERM_REMOTE_READ  : 0) |
   (acc  IB_ACCESS_LOCAL_WRITE   ? MLX5_PERM_LOCAL_WRITE  : 0) |
-   MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+   MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }
 
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1787,7 +1787,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, 
struct ib_send_wr *wr,
return;
}
 
-   seg-flags = get_umr_flags(wr-wr.fast_reg.access_flags);
+   seg-flags = get_umr_flags(wr-wr.fast_reg.access_flags) |
+MLX5_ACCESS_MODE_MTT;
*writ = seg-flags  (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
seg-qpn_mkey7_0 = cpu_to_be32((wr-wr.fast_reg.rkey  0xff) | 
0xff00);
seg-flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
-- 
1.7.8.2

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Native IB connection setup.

2014-01-02 Thread Sagi Grimberg

On 1/2/2014 10:11 AM, Ilya Kalistru wrote:

Happy New Year, ladies and gentlemen!

I'm developing some sort of hardware InfiniBand server running on an
FPGA and delivering data to a PC using the RDMA_WRITE operation.
I already have Physical Link Up and Logical Link Up between my
device and a PC with a Mellanox HCA.
I see the GUID and LID of my device when I run the ibstatus or
ibnetdiscover command on the PC, and therefore I think the subnet
configuration is ok.

Now I have a problem with connection setup. Because I'm the only one
developing this device and it's a problem to add extra protocols to
the FPGA firmware, I don't want to use anything like getaddrinfo()
(it uses IPoIB)...
I'm going to use native IB CM REQ/REP/RTU MADs for connection setup,
but I don't know how.

I think that I should first request GUID to LID resolution, like
rdma_resolve_addr()/rdma_resolve_route() but from a GUID rather than an IP.
Second (I think) I should use ib_send_cm_req() and ib_send_cm_rtu()
with a well-known ServiceID (which I select) to establish the connection.

I'm not a programmer and have no experience with programming
network-based applications, so I would be very thankful if you could
help me with example program code using native IB connection setup
techniques, or with any other help.


You can have a look at SRP (SCSI RDMA Protocol, under
drivers/infiniband/ulp/srp) as a reference for native IB connection
establishment.
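
If it helps, the rough skeleton with the kernel IB CM API (names from
include/rdma/ib_cm.h; a simplified sketch, see SRP for the details) is:

	struct ib_cm_req_param req;
	struct ib_cm_id *cm_id;

	cm_id = ib_create_cm_id(device, my_cm_handler, ctx); /* your handler */

	memset(&req, 0, sizeof(req));
	req.primary_path	= &path_rec;	/* resolved via the SA, see
						 * ib_sa_path_rec_get() in SRP */
	req.service_id		= cpu_to_be64(MY_SERVICE_ID); /* your choice */
	req.qp_num		= qp->qp_num;
	req.qp_type		= qp->qp_type;
	req.responder_resources = 4;
	req.initiator_depth	= 4;
	ib_send_cm_req(cm_id, &req);

	/* in my_cm_handler, on IB_CM_REP_RECEIVED: move the QP to
	 * RTR and RTS, then ib_send_cm_rtu(cm_id, NULL, 0); */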



P.S. It's my first time using a mailing list. I'm sorry if I'm
doing something wrong.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux rdma 3.14 merge plans

2014-01-08 Thread Sagi Grimberg

On 1/8/2014 2:51 AM, Roland Dreier wrote:

On Tue, Jan 7, 2014 at 1:02 PM, Or Gerlitz or.gerl...@gmail.com wrote:


Currently there is single patch for 3.14 on your for-next branch, the
usnic driver. With 3.13 being on rc7 and likely to be released next
week, are you planning any other merges for 3.14? we have patches
waiting for weeks and months without any comment from you.

I am definitely planning on merging the new IBoE IP addressing stuff,
since we seem to have solved the ABI issues.

The UD flow steering patches seem good and I will take a closer look soon.

And there are quite a few usnic patches still to pick up.

I'm confident that will all make it.


The data integrity stuff I'm not so sure about.  Sean raised some I
think legitimate questions about whether all this should be added to
the verbs API and I want to see more discussion or at least have a
deep think about this myself before comitting.


Hey Roland,

I don't think Sean questioned whether data-integrity support should
or shouldn't be added to the Verbs API (Sean, correct me if I'm
wrong), but rather the way it should be added.
From our discussion on this, the only conflict that Sean and I had was
whether the protection setup should ride on ib_post_send.
Sean suggested a separate routine that would post on the SQ. I think
that in the current framework, where placing a fast-path operation is
done via ib_post_send, we keep the current implementation, and open a
discussion on whether it is a good idea to migrate non-send
work requests out of ib_post_send (also fast-registration and
memory-windows).


Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/11] iSER target initial support for T10-DIF offload

2014-01-09 Thread Sagi Grimberg
Hey Nic, MKP, SCSI and RDMA folks,

This patchset adds basic support for T10-DIF protection information offload
in iSER target on top of Nic's recent work and RDMA signature verbs API.

This code was tested with my own implementation of the target core T10-PI
support, which was designed mainly to activate the transport DIF offload.
In order to actually get the Linux SCSI target to work with iSER T10-DIF
offload, a couple of patches need to be added to Nic's ongoing work.

Apart from doing the actual iser implementation for T10-DIF offload, this
series would help to see the full picture by:

* Showing how the T10-DIF offload verbs are used
* Showing how fabric transport offload plugs into the target core

The T10-DIF signature offload verbs and mlx5 driver implementation patches are 
available
from the for-next branch of git://beany.openfabrics.org/~ogerlitz/linux-2.6.git
as the below commits:

2b4316b IB/mlx5: Publish support in signature feature
ef3130d IB/mlx5: Collect signature error completion
c1b37b1 IB/mlx5: Support IB_WR_REG_SIG_MR
f5d8496 IB/mlx5: Keep mlx5 MRs in a radix tree under device
72a72ee IB/mlx5: remove MTT access mode from umr flags helper function
ccb0a907 IB/mlx5: Break wqe handling to begin  finish routines
cda0569 IB/mlx5: Initialize mlx5_ib_qp signature related
33b4079 IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
8b343e6 IB/core: Introduce Signature Verbs API
c1b0358 IB/core: Introduce protected memory regions

Sagi Grimberg (11):
  Target/core: Fixes for isert compilation
  IB/isert: separate connection protection domains and dma MRs
  IB/isert: Avoid frwr notation, use fastreg
  IB/isert: Move fastreg descriptor creation to a function
  Target/iscsi: Add T10-PI indication for iscsi_portal_group
  IB/isert: Initialize T10-PI resources
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  IB/isert: pass mr and frpl to isert_fast_reg_mr routine
  IB/isert: Accept RDMA_WRITE completions
  IB/isert: Support T10-PI protected transactions
  Target/configfs: Expose iSCSI network portal group T10-PI support

 drivers/infiniband/ulp/isert/ib_isert.c  |  708 +++--
 drivers/infiniband/ulp/isert/ib_isert.h  |   29 +-
 drivers/target/iscsi/iscsi_target_configfs.c |6 +
 drivers/target/iscsi/iscsi_target_core.h |5 +-
 drivers/target/iscsi/iscsi_target_tpg.c  |   21 +
 drivers/target/iscsi/iscsi_target_tpg.h  |1 +
 include/target/target_core_base.h|   22 +-
 7 files changed, 603 insertions(+), 189 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/11] IB/isert: Avoid frwr notation, use fastreg

2014-01-09 Thread Sagi Grimberg
Use fast registration lingo; fast registration will
also incorporate signature/DIF registration.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/ulp/isert/ib_isert.c |   84 ---
 drivers/infiniband/ulp/isert/ib_isert.h |8 ++--
 2 files changed, 47 insertions(+), 45 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 3dd2427..295d2be 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -47,10 +47,10 @@ static int
 isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
   struct isert_rdma_wr *wr);
 static void
-isert_unreg_rdma_frwr(struct isert_cmd *isert_cmd, struct isert_conn 
*isert_conn);
+isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
 static int
-isert_reg_rdma_frwr(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
-   struct isert_rdma_wr *wr);
+isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
+  struct isert_rdma_wr *wr);
 
 static void
 isert_qp_event_callback(struct ib_event *e, void *context)
@@ -225,11 +225,11 @@ isert_create_device_ib_res(struct isert_device *device)
 
	/* assign function handlers */
	if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
-   device-use_frwr = 1;
-   device-reg_rdma_mem = isert_reg_rdma_frwr;
-   device-unreg_rdma_mem = isert_unreg_rdma_frwr;
+   device-use_fastreg = 1;
+   device-reg_rdma_mem = isert_reg_rdma;
+   device-unreg_rdma_mem = isert_unreg_rdma;
} else {
-   device-use_frwr = 0;
+   device-use_fastreg = 0;
device-reg_rdma_mem = isert_map_rdma;
device-unreg_rdma_mem = isert_unmap_cmd;
}
@@ -237,9 +237,10 @@ isert_create_device_ib_res(struct isert_device *device)
device-cqs_used = min_t(int, num_online_cpus(),
 device-ib_device-num_comp_vectors);
device-cqs_used = min(ISERT_MAX_CQ, device-cqs_used);
-   pr_debug(Using %d CQs, device %s supports %d vectors support FRWR 
%d\n,
+   pr_debug(Using %d CQs, device %s supports %d vectors support 
+Fast registration %d\n,
 device-cqs_used, device-ib_device-name,
-device-ib_device-num_comp_vectors, device-use_frwr);
+device-ib_device-num_comp_vectors, device-use_fastreg);
device-cq_desc = kzalloc(sizeof(struct isert_cq_desc) *
device-cqs_used, GFP_KERNEL);
if (!device-cq_desc) {
@@ -367,18 +368,18 @@ isert_device_find_by_ib_dev(struct rdma_cm_id *cma_id)
 }
 
 static void
-isert_conn_free_frwr_pool(struct isert_conn *isert_conn)
+isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
 {
struct fast_reg_descriptor *fr_desc, *tmp;
int i = 0;
 
-   if (list_empty(isert_conn-conn_frwr_pool))
+   if (list_empty(isert_conn-conn_fr_pool))
return;
 
-   pr_debug(Freeing conn %p frwr pool, isert_conn);
+   pr_debug(Freeing conn %p fastreg pool, isert_conn);
 
list_for_each_entry_safe(fr_desc, tmp,
-isert_conn-conn_frwr_pool, list) {
+isert_conn-conn_fr_pool, list) {
list_del(fr_desc-list);
ib_free_fast_reg_page_list(fr_desc-data_frpl);
ib_dereg_mr(fr_desc-data_mr);
@@ -386,20 +387,20 @@ isert_conn_free_frwr_pool(struct isert_conn *isert_conn)
++i;
}
 
-   if (i  isert_conn-conn_frwr_pool_size)
+   if (i  isert_conn-conn_fr_pool_size)
pr_warn(Pool still has %d regions registered\n,
-   isert_conn-conn_frwr_pool_size - i);
+   isert_conn-conn_fr_pool_size - i);
 }
 
 static int
-isert_conn_create_frwr_pool(struct isert_conn *isert_conn)
+isert_conn_create_fastreg_pool(struct isert_conn *isert_conn)
 {
struct fast_reg_descriptor *fr_desc;
struct isert_device *device = isert_conn-conn_device;
int i, ret;
 
-   INIT_LIST_HEAD(isert_conn-conn_frwr_pool);
-   isert_conn-conn_frwr_pool_size = 0;
+   INIT_LIST_HEAD(isert_conn-conn_fr_pool);
+   isert_conn-conn_fr_pool_size = 0;
for (i = 0; i  ISCSI_DEF_XMIT_CMDS_MAX; i++) {
fr_desc = kzalloc(sizeof(*fr_desc), GFP_KERNEL);
if (!fr_desc) {
@@ -431,17 +432,17 @@ isert_conn_create_frwr_pool(struct isert_conn *isert_conn)
 fr_desc, fr_desc-data_frpl-page_list);
 
fr_desc-valid = true;
-   list_add_tail(fr_desc-list, isert_conn-conn_frwr_pool);
-   isert_conn-conn_frwr_pool_size++;
+   list_add_tail(fr_desc-list, isert_conn-conn_fr_pool);
+   isert_conn-conn_fr_pool_size

[PATCH 07/11] IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine

2014-01-09 Thread Sagi Grimberg
This routine may also help with protection registration.
This patch does not change any functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/ulp/isert/ib_isert.c |   28 
 1 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 98f23f4..3495e73 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2247,26 +2247,22 @@ isert_map_fr_pagelist(struct ib_device *ib_dev,
 
 static int
 isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
- struct isert_cmd *isert_cmd, struct isert_conn *isert_conn,
- struct ib_sge *ib_sge, u32 offset, unsigned int data_len)
+ struct isert_conn *isert_conn, struct scatterlist *sg_start,
+ struct ib_sge *ib_sge, u32 sg_nents, u32 offset,
+ unsigned int data_len)
 {
-   struct iscsi_cmd *cmd = isert_cmd-iscsi_cmd;
struct ib_device *ib_dev = isert_conn-conn_cm_id-device;
-   struct scatterlist *sg_start;
-   u32 sg_off, page_off;
struct ib_send_wr fr_wr, inv_wr;
struct ib_send_wr *bad_wr, *wr = NULL;
+   int ret, pagelist_len;
+   u32 page_off;
u8 key;
-   int ret, sg_nents, pagelist_len;
 
-   sg_off = offset / PAGE_SIZE;
-   sg_start = cmd-se_cmd.t_data_sg[sg_off];
-   sg_nents = min_t(unsigned int, cmd-se_cmd.t_data_nents - sg_off,
-ISCSI_ISER_SG_TABLESIZE);
+   sg_nents = min_t(unsigned int, sg_nents, ISCSI_ISER_SG_TABLESIZE);
page_off = offset % PAGE_SIZE;
 
-   pr_debug(Cmd: %p use fr_desc %p sg_nents %d sg_off %d offset %u\n,
-isert_cmd, fr_desc, sg_nents, sg_off, offset);
+   pr_debug(Use fr_desc %p sg_nents %d offset %u\n,
+fr_desc, sg_nents, offset);
 
pagelist_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents,
 fr_desc-data_frpl-page_list[0]);
@@ -2335,9 +2331,9 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd 
*cmd,
if (wr-iser_ib_op == ISER_IB_RDMA_WRITE) {
data_left = se_cmd-data_length;
} else {
-   sg_off = cmd-write_data_done / PAGE_SIZE;
-   data_left = se_cmd-data_length - cmd-write_data_done;
offset = cmd-write_data_done;
+   sg_off = offset / PAGE_SIZE;
+   data_left = se_cmd-data_length - cmd-write_data_done;
isert_cmd-tx_desc.isert_cmd = isert_cmd;
}
 
@@ -2401,8 +2397,8 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd 
*cmd,
spin_unlock_irqrestore(isert_conn-conn_lock, flags);
wr-fr_desc = fr_desc;
 
-   ret = isert_fast_reg_mr(fr_desc, isert_cmd, isert_conn,
- ib_sge, offset, data_len);
+   ret = isert_fast_reg_mr(fr_desc, isert_conn, sg_start,
+   ib_sge, sg_nents, offset, data_len);
if (ret) {
list_add_tail(fr_desc-list, 
isert_conn-conn_fr_pool);
goto unmap_sg;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/11] IB/isert: separate connection protection domains and dma MRs

2014-01-09 Thread Sagi Grimberg
It is more correct to separate connection protection domains
and dma_mr handles. Protection information support requires us
to do so.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/ulp/isert/ib_isert.c |   46 ---
 drivers/infiniband/ulp/isert/ib_isert.h |2 -
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 6be57c3..3dd2427 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -248,13 +248,6 @@ isert_create_device_ib_res(struct isert_device *device)
}
cq_desc = device-cq_desc;
 
-   device-dev_pd = ib_alloc_pd(ib_dev);
-   if (IS_ERR(device-dev_pd)) {
-   ret = PTR_ERR(device-dev_pd);
-   pr_err(ib_alloc_pd failed for dev_pd: %d\n, ret);
-   goto out_cq_desc;
-   }
-
for (i = 0; i  device-cqs_used; i++) {
cq_desc[i].device = device;
cq_desc[i].cq_index = i;
@@ -282,13 +275,6 @@ isert_create_device_ib_res(struct isert_device *device)
goto out_cq;
}
 
-   device-dev_mr = ib_get_dma_mr(device-dev_pd, IB_ACCESS_LOCAL_WRITE);
-   if (IS_ERR(device-dev_mr)) {
-   ret = PTR_ERR(device-dev_mr);
-   pr_err(ib_get_dma_mr failed for dev_mr: %d\n, ret);
-   goto out_cq;
-   }
-
return 0;
 
 out_cq:
@@ -304,9 +290,6 @@ out_cq:
ib_destroy_cq(device-dev_tx_cq[j]);
}
}
-   ib_dealloc_pd(device-dev_pd);
-
-out_cq_desc:
kfree(device-cq_desc);
 
return ret;
@@ -329,8 +312,6 @@ isert_free_device_ib_res(struct isert_device *device)
device-dev_tx_cq[i] = NULL;
}
 
-   ib_dereg_mr(device-dev_mr);
-   ib_dealloc_pd(device-dev_pd);
kfree(device-cq_desc);
 }
 
@@ -437,7 +418,7 @@ isert_conn_create_frwr_pool(struct isert_conn *isert_conn)
goto err;
}
 
-   fr_desc-data_mr = ib_alloc_fast_reg_mr(device-dev_pd,
+   fr_desc-data_mr = ib_alloc_fast_reg_mr(isert_conn-conn_pd,
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(fr_desc-data_mr)) {
pr_err(Failed to allocate frmr err=%ld\n,
@@ -546,8 +527,22 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct 
rdma_cm_event *event)
}
 
isert_conn-conn_device = device;
-   isert_conn-conn_pd = device-dev_pd;
-   isert_conn-conn_mr = device-dev_mr;
+   isert_conn-conn_pd = ib_alloc_pd(isert_conn-conn_device-ib_device);
+   if (IS_ERR(isert_conn-conn_pd)) {
+   ret = PTR_ERR(isert_conn-conn_pd);
+   pr_err(ib_alloc_pd failed for conn %p: ret=%d\n,
+  isert_conn, ret);
+   goto out_pd;
+   }
+
+   isert_conn-conn_mr = ib_get_dma_mr(isert_conn-conn_pd,
+  IB_ACCESS_LOCAL_WRITE);
+   if (IS_ERR(isert_conn-conn_mr)) {
+   ret = PTR_ERR(isert_conn-conn_mr);
+   pr_err(ib_get_dma_mr failed for conn %p: ret=%d\n,
+  isert_conn, ret);
+   goto out_mr;
+   }
 
if (device-use_frwr) {
ret = isert_conn_create_frwr_pool(isert_conn);
@@ -573,6 +568,10 @@ out_conn_dev:
if (device-use_frwr)
isert_conn_free_frwr_pool(isert_conn);
 out_frwr:
+   ib_dereg_mr(isert_conn-conn_mr);
+out_mr:
+   ib_dealloc_pd(isert_conn-conn_pd);
+out_pd:
isert_device_try_release(device);
 out_rsp_dma_map:
ib_dma_unmap_single(ib_dev, isert_conn-login_rsp_dma,
@@ -611,6 +610,9 @@ isert_connect_release(struct isert_conn *isert_conn)
isert_free_rx_descriptors(isert_conn);
rdma_destroy_id(isert_conn-conn_cm_id);
 
+   ib_dereg_mr(isert_conn-conn_mr);
+   ib_dealloc_pd(isert_conn-conn_pd);
+
if (isert_conn-login_buf) {
ib_dma_unmap_single(ib_dev, isert_conn-login_rsp_dma,
ISER_RX_LOGIN_SIZE, DMA_TO_DEVICE);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h 
b/drivers/infiniband/ulp/isert/ib_isert.h
index 691f90f..dec74d4 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -144,8 +144,6 @@ struct isert_device {
int refcount;
int cq_active_qps[ISERT_MAX_CQ];
struct ib_device*ib_device;
-   struct ib_pd*dev_pd;
-   struct ib_mr*dev_mr;
struct ib_cq*dev_rx_cq[ISERT_MAX_CQ];
struct ib_cq*dev_tx_cq[ISERT_MAX_CQ];
struct isert_cq_desc*cq_desc;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message

[PATCH 01/11] Target/core: Fixes for isert compilation

2014-01-09 Thread Sagi Grimberg
Replace prot_interleaved with prot_handover in se_cmd.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 include/target/target_core_base.h |   22 ++
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/target/target_core_base.h 
b/include/target/target_core_base.h
index 13daea5..2ae304d 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -439,14 +439,20 @@ struct se_tmr_req {
struct list_headtmr_list;
 };
 
+#define TARGET_DIF_SIZE 8
 enum target_prot_op {
-   TARGET_PROT_NORMAL,
-   TARGET_PROT_READ_INSERT,
-   TARGET_PROT_WRITE_INSERT,
-   TARGET_PROT_READ_STRIP,
-   TARGET_PROT_WRITE_STRIP,
-   TARGET_PROT_READ_PASS,
-   TARGET_PROT_WRITE_PASS,
+   TARGET_PROT_NORMAL = 0,
+   TARGET_PROT_DIN_INSERT,
+   TARGET_PROT_DOUT_INSERT,
+   TARGET_PROT_DIN_STRIP,
+   TARGET_PROT_DOUT_STRIP,
+   TARGET_PROT_DIN_PASS,
+   TARGET_PROT_DOUT_PASS
+};
+
+enum target_prot_ho {
+   PROT_SEPERATED,
+   PROT_INTERLEAVED,
 };
 
 enum target_prot_type {
@@ -573,7 +579,7 @@ struct se_cmd {
u32 prot_length;
struct scatterlist  *t_prot_sg;
unsigned intt_prot_nents;
-   boolprot_interleaved;
+   enum target_prot_ho prot_handover;
enum target_pi_errorpi_err;
u32 block_num;
 };
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/11] IB/isert: pass mr and frpl to isert_fast_reg_mr routine

2014-01-09 Thread Sagi Grimberg
This commit generalizes isert_fast_reg_mr to receive the mr
and frpl instead of fr_desc to do the registration. In T10-PI
we also register a protection memory region, so we want to
reuse this routine.

This commit does not change any functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/ulp/isert/ib_isert.c |   62 +++
 1 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 3495e73..98aab21 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2246,10 +2246,10 @@ isert_map_fr_pagelist(struct ib_device *ib_dev,
 }
 
 static int
-isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
- struct isert_conn *isert_conn, struct scatterlist *sg_start,
- struct ib_sge *ib_sge, u32 sg_nents, u32 offset,
- unsigned int data_len)
+isert_fast_reg_mr(struct isert_conn *isert_conn, struct ib_mr *mr,
+ struct ib_fast_reg_page_list *frpl, bool *key_valid,
+ struct scatterlist *sg_start, u32 sg_nents, u32 offset,
+ unsigned int data_len, struct ib_sge *ib_sge)
 {
struct ib_device *ib_dev = isert_conn-conn_cm_id-device;
struct ib_send_wr fr_wr, inv_wr;
@@ -2260,33 +2260,31 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
 
sg_nents = min_t(unsigned int, sg_nents, ISCSI_ISER_SG_TABLESIZE);
page_off = offset % PAGE_SIZE;
-
-   pr_debug(Use fr_desc %p sg_nents %d offset %u\n,
-fr_desc, sg_nents, offset);
+   pr_debug(Use mr %p frpl %p sg_nents %d offset %u\n,
+mr, frpl, sg_nents, offset);
 
pagelist_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents,
-fr_desc-data_frpl-page_list[0]);
+frpl-page_list[0]);
 
-   if (!fr_desc-data_key_valid) {
+   if (!*key_valid) {
memset(inv_wr, 0, sizeof(inv_wr));
inv_wr.opcode = IB_WR_LOCAL_INV;
-   inv_wr.ex.invalidate_rkey = fr_desc-data_mr-rkey;
+   inv_wr.ex.invalidate_rkey = mr-rkey;
wr = inv_wr;
/* Bump the key */
-		key = (u8)(fr_desc->data_mr->rkey & 0x000000FF);
-		ib_update_fast_reg_key(fr_desc->data_mr, ++key);
+		key = (u8)(mr->rkey & 0x000000FF);
+		ib_update_fast_reg_key(mr, ++key);
}
 
/* Prepare FASTREG WR */
memset(fr_wr, 0, sizeof(fr_wr));
fr_wr.opcode = IB_WR_FAST_REG_MR;
-   fr_wr.wr.fast_reg.iova_start =
-   fr_desc-data_frpl-page_list[0] + page_off;
-   fr_wr.wr.fast_reg.page_list = fr_desc-data_frpl;
+   fr_wr.wr.fast_reg.iova_start = frpl-page_list[0] + page_off;
+   fr_wr.wr.fast_reg.page_list = frpl;
fr_wr.wr.fast_reg.page_list_len = pagelist_len;
fr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
fr_wr.wr.fast_reg.length = data_len;
-   fr_wr.wr.fast_reg.rkey = fr_desc-data_mr-rkey;
+   fr_wr.wr.fast_reg.rkey = mr-rkey;
fr_wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE;
 
if (!wr)
@@ -2299,14 +2297,14 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
pr_err(fast registration failed, ret:%d\n, ret);
return ret;
}
-   fr_desc-data_key_valid = false;
 
-   ib_sge-lkey = fr_desc-data_mr-lkey;
-   ib_sge-addr = fr_desc-data_frpl-page_list[0] + page_off;
+   *key_valid = false;
+   ib_sge-lkey = mr-lkey;
+   ib_sge-addr = frpl-page_list[0] + page_off;
ib_sge-length = data_len;
 
-   pr_debug(RDMA ib_sge: addr: 0x%16llx  length: %u lkey: %08x\n,
-ib_sge-addr, ib_sge-length, ib_sge-lkey);
+   pr_debug(fastreg ib_sge: addr: 0x%16llx  length: %u lkey: %08x\n,
+ib_sge-addr + page_off, ib_sge-length, ib_sge-lkey);
 
return ret;
 }
@@ -2320,7 +2318,7 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd 
*cmd,
struct isert_conn *isert_conn = (struct isert_conn *)conn-context;
struct ib_device *ib_dev = isert_conn-conn_cm_id-device;
struct ib_send_wr *send_wr;
-   struct ib_sge *ib_sge;
+   struct ib_sge data_sge;
struct scatterlist *sg_start;
struct fast_reg_descriptor *fr_desc;
u32 sg_off = 0, sg_nents;
@@ -2352,10 +2350,7 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd 
*cmd,
pr_debug(Mapped cmd: %p count: %u sg: %p sg_nents: %u rdma_len %d\n,
 isert_cmd, count, sg_start, sg_nents, data_left);
 
-   memset(wr-s_ib_sge, 0, sizeof(*ib_sge));
-   ib_sge = wr-s_ib_sge;
-   wr-ib_sge = ib_sge;
-
+   wr-ib_sge = wr-s_ib_sge;
wr-send_wr_num = 1;
memset(wr-s_send_wr, 0, sizeof(*send_wr));
wr-send_wr = wr-s_send_wr

[PATCH 06/11] IB/isert: Initialize T10-PI resources

2014-01-09 Thread Sagi Grimberg
Upon connection establishment, check whether the network portal is
T10-PI enabled; if so, allocate T10-PI resources, allocate
signature-enabled memory regions and mark the connection
queue-pair as signature enabled.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/ulp/isert/ib_isert.c |  104 +++
 drivers/infiniband/ulp/isert/ib_isert.h |   19 +-
 2 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 9ef9193..98f23f4 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -87,7 +87,8 @@ isert_query_device(struct ib_device *ib_dev, struct 
ib_device_attr *devattr)
 }
 
 static int
-isert_conn_setup_qp(struct isert_conn *isert_conn, struct rdma_cm_id *cma_id)
+isert_conn_setup_qp(struct isert_conn *isert_conn, struct rdma_cm_id *cma_id,
+   u8 protection)
 {
struct isert_device *device = isert_conn-conn_device;
struct ib_qp_init_attr attr;
@@ -119,6 +120,8 @@ isert_conn_setup_qp(struct isert_conn *isert_conn, struct 
rdma_cm_id *cma_id)
attr.cap.max_recv_sge = 1;
attr.sq_sig_type = IB_SIGNAL_REQ_WR;
attr.qp_type = IB_QPT_RC;
+   if (protection)
+   attr.create_flags |= IB_QP_CREATE_SIGNATURE_EN;
 
pr_debug(isert_conn_setup_qp cma_id-device: %p\n,
 cma_id-device);
@@ -234,13 +237,18 @@ isert_create_device_ib_res(struct isert_device *device)
device-unreg_rdma_mem = isert_unmap_cmd;
}
 
+	/* Check signature cap */
+	device->pi_capable = dev_attr->device_cap_flags &
+			     IB_DEVICE_SIGNATURE_HANDOVER ? true : false;
+
device-cqs_used = min_t(int, num_online_cpus(),
 device-ib_device-num_comp_vectors);
device-cqs_used = min(ISERT_MAX_CQ, device-cqs_used);
pr_debug(Using %d CQs, device %s supports %d vectors support 
-Fast registration %d\n,
+Fast registration %d pi_capable %d\n,
 device-cqs_used, device-ib_device-name,
-device-ib_device-num_comp_vectors, device-use_fastreg);
+device-ib_device-num_comp_vectors, device-use_fastreg,
+device-pi_capable);
device-cq_desc = kzalloc(sizeof(struct isert_cq_desc) *
device-cqs_used, GFP_KERNEL);
if (!device-cq_desc) {
@@ -383,6 +391,12 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
list_del(fr_desc-list);
ib_free_fast_reg_page_list(fr_desc-data_frpl);
ib_dereg_mr(fr_desc-data_mr);
+   if (fr_desc-pi_ctx) {
+   ib_free_fast_reg_page_list(fr_desc-pi_ctx-prot_frpl);
+   ib_dereg_mr(fr_desc-pi_ctx-prot_mr);
+   ib_destroy_mr(fr_desc-pi_ctx-sig_mr);
+   kfree(fr_desc-pi_ctx);
+   }
kfree(fr_desc);
++i;
}
@@ -394,8 +408,10 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
 
 static int
 isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
-struct fast_reg_descriptor *fr_desc)
+struct fast_reg_descriptor *fr_desc, u8 protection)
 {
+   int ret;
+
fr_desc-data_frpl = ib_alloc_fast_reg_page_list(ib_device,
 
ISCSI_ISER_SG_TABLESIZE);
if (IS_ERR(fr_desc-data_frpl)) {
@@ -408,19 +424,73 @@ isert_create_fr_desc(struct ib_device *ib_device, struct 
ib_pd *pd,
if (IS_ERR(fr_desc-data_mr)) {
pr_err(Failed to allocate data frmr err=%ld\n,
   PTR_ERR(fr_desc-data_mr));
-   ib_free_fast_reg_page_list(fr_desc-data_frpl);
-   return PTR_ERR(fr_desc-data_mr);
+   ret = PTR_ERR(fr_desc-data_mr);
+   goto err_data_frpl;
}
pr_debug(Create fr_desc %p page_list %p\n,
 fr_desc, fr_desc-data_frpl-page_list);
+   fr_desc-data_key_valid = true;
 
-   fr_desc-valid = true;
+   if (protection) {
+   struct ib_mr_init_attr mr_init_attr = {0};
+   struct pi_context *pi_ctx;
+
+   fr_desc-pi_ctx = kzalloc(sizeof(*fr_desc-pi_ctx), GFP_KERNEL);
+   if (!fr_desc-pi_ctx) {
+   pr_err(Failed to allocate pi context\n);
+   ret = -ENOMEM;
+   goto err_data_mr;
+   }
+   pi_ctx = fr_desc-pi_ctx;
+
+   pi_ctx-prot_frpl = ib_alloc_fast_reg_page_list(ib_device,
+   ISCSI_ISER_SG_TABLESIZE);
+   if (IS_ERR(pi_ctx-prot_frpl)) {
+   pr_err(Failed to allocate prot frpl err=%ld\n

[PATCH 09/11] IB/isert: Accept RDMA_WRITE completions

2014-01-09 Thread Sagi Grimberg
In case of protected transactions, we will need to check the
protection status of the transaction before sending the SCSI response,
so be ready for RDMA_WRITE completions. Currently we don't ask
for these completions, but for T10-PI we will.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/ulp/isert/ib_isert.c |   20 +---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 98aab21..9aa933e 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -51,6 +51,8 @@ isert_unreg_rdma(struct isert_cmd *isert_cmd, struct 
isert_conn *isert_conn);
 static int
 isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
   struct isert_rdma_wr *wr);
+static int
+isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd);
 
 static void
 isert_qp_event_callback(struct ib_event *e, void *context)
@@ -1602,6 +1604,18 @@ isert_completion_put(struct iser_tx_desc *tx_desc, 
struct isert_cmd *isert_cmd,
 }
 
 static void
+isert_completion_rdma_write(struct iser_tx_desc *tx_desc,
+   struct isert_cmd *isert_cmd)
+{
+   struct iscsi_cmd *cmd = isert_cmd-iscsi_cmd;
+   struct isert_conn *isert_conn = isert_cmd-conn;
+   struct isert_device *device = isert_conn-conn_device;
+
+   device-unreg_rdma_mem(isert_cmd, isert_conn);
+   isert_put_response(isert_conn-conn, cmd);
+}
+
+static void
 isert_completion_rdma_read(struct iser_tx_desc *tx_desc,
   struct isert_cmd *isert_cmd)
 {
@@ -1721,9 +1735,9 @@ __isert_send_completion(struct iser_tx_desc *tx_desc,
  isert_conn, ib_dev);
break;
case ISER_IB_RDMA_WRITE:
-   pr_err(isert_send_completion: Got ISER_IB_RDMA_WRITE\n);
-   dump_stack();
-   break;
+   pr_debug(isert_send_completion: Got ISER_IB_RDMA_WRITE\n);
+   atomic_dec(isert_conn-post_send_buf_count);
+   isert_completion_rdma_write(tx_desc, isert_cmd);
case ISER_IB_RDMA_READ:
pr_debug(isert_send_completion: Got ISER_IB_RDMA_READ:\n);
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/11] Target/iscsi: Add T10-PI indication for iscsi_portal_group

2014-01-09 Thread Sagi Grimberg
If an iSCSI portal group is defined as t10_pi enabled,
all connections on top of it will support protected transactions.

T10-PI support may require extra resource allocation and maintenance by
the transport layer, so we don't want to apply it to non-t10_pi network
portals. This is a hook for the iSCSI target layer to signal the transport
at connection establishment that this connection will carry protected
transactions.
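
For example, at connection establishment the transport can now reach
this attribute through the new np->tpg_np back-pointer, roughly
(a sketch of the intended use):

	u8 pi_support = np->tpg_np->tpg->tpg_attrib.t10_pi;

	if (pi_support) {
		/* allocate T10-PI resources for this connection */
	}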

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/target/iscsi/iscsi_target_core.h |5 -
 drivers/target/iscsi/iscsi_target_tpg.c  |2 ++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_core.h 
b/drivers/target/iscsi/iscsi_target_core.h
index 48f7b3b..886d74d 100644
--- a/drivers/target/iscsi/iscsi_target_core.h
+++ b/drivers/target/iscsi/iscsi_target_core.h
@@ -58,7 +58,8 @@
 #define TA_DEMO_MODE_DISCOVERY 1
 #define TA_DEFAULT_ERL 0
 #define TA_CACHE_CORE_NPS  0
-
+/* T10 protection information disabled by default */
+#define TA_DEFAULT_T10_PI  0
 
 #define ISCSI_IOV_DATA_BUFFER  5
 
@@ -765,6 +766,7 @@ struct iscsi_tpg_attrib {
u32 prod_mode_write_protect;
u32 demo_mode_discovery;
u32 default_erl;
+   u8  t10_pi;
struct iscsi_portal_group *tpg;
 };
 
@@ -787,6 +789,7 @@ struct iscsi_np {
void*np_context;
struct iscsit_transport *np_transport;
struct list_headnp_list;
+   struct iscsi_tpg_np *tpg_np;
 } cacheline_aligned;
 
 struct iscsi_tpg_np {
diff --git a/drivers/target/iscsi/iscsi_target_tpg.c 
b/drivers/target/iscsi/iscsi_target_tpg.c
index 3976183..80ae14c 100644
--- a/drivers/target/iscsi/iscsi_target_tpg.c
+++ b/drivers/target/iscsi/iscsi_target_tpg.c
@@ -225,6 +225,7 @@ static void iscsit_set_default_tpg_attribs(struct 
iscsi_portal_group *tpg)
a-prod_mode_write_protect = TA_PROD_MODE_WRITE_PROTECT;
a-demo_mode_discovery = TA_DEMO_MODE_DISCOVERY;
a-default_erl = TA_DEFAULT_ERL;
+   a-t10_pi = TA_DEFAULT_T10_PI;
 }
 
 int iscsit_tpg_add_portal_group(struct iscsi_tiqn *tiqn, struct 
iscsi_portal_group *tpg)
@@ -500,6 +501,7 @@ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
init_completion(tpg_np-tpg_np_comp);
kref_init(tpg_np-tpg_np_kref);
tpg_np-tpg_np  = np;
+   np-tpg_np  = tpg_np;
tpg_np-tpg = tpg;
 
spin_lock(tpg-tpg_np_lock);
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/11] Target/configfs: Expose iSCSI network portal group T10-PI support

2014-01-09 Thread Sagi Grimberg
The user may enable T10-PI support per network portal group. Any connection
established on top of it will be required to serve protected transactions.
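
With this exposed, the bit can be flipped per TPG through configfs,
e.g. (path assuming the standard LIO configfs layout):

	echo 1 > /sys/kernel/config/target/iscsi/<target_iqn>/tpgt_1/attrib/t10_pi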

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/target/iscsi/iscsi_target_configfs.c |6 ++
 drivers/target/iscsi/iscsi_target_tpg.c  |   19 +++
 drivers/target/iscsi/iscsi_target_tpg.h  |1 +
 3 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_configfs.c 
b/drivers/target/iscsi/iscsi_target_configfs.c
index e3318ed..8f3f585 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1051,6 +1051,11 @@ TPG_ATTR(demo_mode_discovery, S_IRUGO | S_IWUSR);
  */
 DEF_TPG_ATTRIB(default_erl);
 TPG_ATTR(default_erl, S_IRUGO | S_IWUSR);
+/*
+ * Define iscsi_tpg_attrib_s_t10_pi
+ */
+DEF_TPG_ATTRIB(t10_pi);
+TPG_ATTR(t10_pi, S_IRUGO | S_IWUSR);
 
 static struct configfs_attribute *lio_target_tpg_attrib_attrs[] = {
iscsi_tpg_attrib_authentication.attr,
@@ -1063,6 +1068,7 @@ static struct configfs_attribute 
*lio_target_tpg_attrib_attrs[] = {
iscsi_tpg_attrib_prod_mode_write_protect.attr,
iscsi_tpg_attrib_demo_mode_discovery.attr,
iscsi_tpg_attrib_default_erl.attr,
+   iscsi_tpg_attrib_t10_pi.attr,
NULL,
 };
 
diff --git a/drivers/target/iscsi/iscsi_target_tpg.c 
b/drivers/target/iscsi/iscsi_target_tpg.c
index 80ae14c..d95a5f2 100644
--- a/drivers/target/iscsi/iscsi_target_tpg.c
+++ b/drivers/target/iscsi/iscsi_target_tpg.c
@@ -860,3 +860,22 @@ int iscsit_ta_default_erl(
 
return 0;
 }
+
+int iscsit_ta_t10_pi(
+   struct iscsi_portal_group *tpg,
+   u32 flag)
+{
+   struct iscsi_tpg_attrib *a = tpg-tpg_attrib;
+
+	if ((flag != 0) && (flag != 1)) {
+   pr_err(Illegal value %d\n, flag);
+   return -EINVAL;
+   }
+
+   a-t10_pi = flag;
+	pr_debug("iSCSI_TPG[%hu] - T10 Protection information bit: %s\n",
+		 tpg->tpgt, (a->t10_pi) ? "ON" : "OFF");
+
+   return 0;
+}
diff --git a/drivers/target/iscsi/iscsi_target_tpg.h 
b/drivers/target/iscsi/iscsi_target_tpg.h
index 213c0fc..0a182f2 100644
--- a/drivers/target/iscsi/iscsi_target_tpg.h
+++ b/drivers/target/iscsi/iscsi_target_tpg.h
@@ -39,5 +39,6 @@ extern int iscsit_ta_demo_mode_write_protect(struct 
iscsi_portal_group *, u32);
 extern int iscsit_ta_prod_mode_write_protect(struct iscsi_portal_group *, u32);
 extern int iscsit_ta_demo_mode_discovery(struct iscsi_portal_group *, u32);
 extern int iscsit_ta_default_erl(struct iscsi_portal_group *, u32);
+extern int iscsit_ta_t10_pi(struct iscsi_portal_group *, u32);
 
 #endif /* ISCSI_TARGET_TPG_H */
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/11] IB/isert: Move fastreg descriptor creation to a function

2014-01-09 Thread Sagi Grimberg
This routine may be called to create fast registration
descriptors both for data and for integrity buffers.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/ulp/isert/ib_isert.c |   52 +++
 1 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index 295d2be..9ef9193 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -393,6 +393,33 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
 }
 
 static int
+isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
+struct fast_reg_descriptor *fr_desc)
+{
+   fr_desc-data_frpl = ib_alloc_fast_reg_page_list(ib_device,
+
ISCSI_ISER_SG_TABLESIZE);
+   if (IS_ERR(fr_desc-data_frpl)) {
+   pr_err(Failed to allocate data frpl err=%ld\n,
+  PTR_ERR(fr_desc-data_frpl));
+   return PTR_ERR(fr_desc-data_frpl);
+   }
+
+   fr_desc-data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE);
+   if (IS_ERR(fr_desc-data_mr)) {
+   pr_err(Failed to allocate data frmr err=%ld\n,
+  PTR_ERR(fr_desc-data_mr));
+   ib_free_fast_reg_page_list(fr_desc-data_frpl);
+   return PTR_ERR(fr_desc-data_mr);
+   }
+   pr_debug(Create fr_desc %p page_list %p\n,
+fr_desc, fr_desc-data_frpl-page_list);
+
+   fr_desc-valid = true;
+
+   return 0;
+}
+
+static int
 isert_conn_create_fastreg_pool(struct isert_conn *isert_conn)
 {
struct fast_reg_descriptor *fr_desc;
@@ -409,29 +436,14 @@ isert_conn_create_fastreg_pool(struct isert_conn 
*isert_conn)
goto err;
}
 
-   fr_desc-data_frpl =
-   ib_alloc_fast_reg_page_list(device-ib_device,
-   ISCSI_ISER_SG_TABLESIZE);
-   if (IS_ERR(fr_desc-data_frpl)) {
-   pr_err(Failed to allocate fr_pg_list err=%ld\n,
-  PTR_ERR(fr_desc-data_frpl));
-   ret = PTR_ERR(fr_desc-data_frpl);
-   goto err;
-   }
-
-   fr_desc-data_mr = ib_alloc_fast_reg_mr(isert_conn-conn_pd,
-   ISCSI_ISER_SG_TABLESIZE);
-   if (IS_ERR(fr_desc-data_mr)) {
-   pr_err(Failed to allocate frmr err=%ld\n,
-  PTR_ERR(fr_desc-data_mr));
-   ret = PTR_ERR(fr_desc-data_mr);
-   ib_free_fast_reg_page_list(fr_desc-data_frpl);
+   ret = isert_create_fr_desc(device-ib_device,
+  isert_conn-conn_pd, fr_desc);
+   if (ret) {
+   pr_err(Failed to create fastreg descriptor err=%d\n,
+  ret);
goto err;
}
-   pr_debug(Create fr_desc %p page_list %p\n,
-fr_desc, fr_desc-data_frpl-page_list);
 
-   fr_desc-valid = true;
list_add_tail(fr_desc-list, isert_conn-conn_fr_pool);
isert_conn-conn_fr_pool_size++;
}
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/11] IB/isert: Support T10-PI protected transactions

2014-01-09 Thread Sagi Grimberg
In case the target core passed the transport a T10 protection
operation:

1. Register data buffer (data memory region)
2. Register protection buffer if it exists (prot memory region)
3. Register signature region (signature memory region)
   - use work request IB_WR_REG_SIG_MR
4. Execute RDMA
5. Upon RDMA completion check the signature status
   - if succeeded send good SCSI response
   - if failed send SCSI bad response with appropriate sense buffer
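
At the verbs level, step 5 amounts to querying the signature MR status
once the RDMA completes; a minimal sketch of that check (sig_mr is the
signature MR from step 3, handle_pi_error() is a placeholder):

	struct ib_mr_status mr_status;
	int ret;

	ret = ib_check_mr_status(sig_mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
	if (!ret && (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS))
		/* err_type, offset, expected vs. actual tags are
		 * reported in mr_status.sig_err */
		handle_pi_error(&mr_status.sig_err);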

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/ulp/isert/ib_isert.c |  376 ++-
 1 files changed, 321 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 9aa933e..8a888f0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1499,6 +1499,7 @@ isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
 	if (wr->fr_desc) {
 		pr_debug("unreg_fastreg_cmd: %p free fr_desc %p\n",
 			 isert_cmd, wr->fr_desc);
+		wr->fr_desc->protected = false;
 		spin_lock_bh(&isert_conn->conn_lock);
 		list_add_tail(&wr->fr_desc->list, &isert_conn->conn_fr_pool);
 		spin_unlock_bh(&isert_conn->conn_lock);
@@ -1604,13 +1605,65 @@ isert_completion_put(struct iser_tx_desc *tx_desc, struct isert_cmd *isert_cmd,
 }
 
 static void
+isert_pi_err_sense_buffer(u8 *buf, u8 key, u8 asc, u8 ascq)
+{
+	buf[0] = 0x70;
+	buf[SPC_SENSE_KEY_OFFSET] = key;
+	buf[SPC_ASC_KEY_OFFSET] = asc;
+	buf[SPC_ASCQ_KEY_OFFSET] = ascq;
+}
+
+static void
 isert_completion_rdma_write(struct iser_tx_desc *tx_desc,
 			    struct isert_cmd *isert_cmd)
 {
+	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
+	struct se_cmd *se_cmd = &cmd->se_cmd;
 	struct isert_conn *isert_conn = isert_cmd->conn;
 	struct isert_device *device = isert_conn->conn_device;
+	struct ib_mr_status mr_status;
+	int ret;
 
+	if (wr->fr_desc && wr->fr_desc->protected) {
+		ret = ib_check_mr_status(wr->fr_desc->pi_ctx->sig_mr,
+					 IB_MR_CHECK_SIG_STATUS, &mr_status);
+		if (ret) {
+			pr_err("ib_check_mr_status failed, ret %d\n", ret);
+			goto fail_mr_status;
+		}
+		if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) {
+			u32 block_size = se_cmd->se_dev->dev_attrib.block_size;
+
+			pr_err("PI error found type %d at offset %llx "
+			       "expected %x vs actual %x\n",
+			       mr_status.sig_err.err_type,
+			       mr_status.sig_err.sig_err_offset,
+			       mr_status.sig_err.expected,
+			       mr_status.sig_err.actual);
+			switch (mr_status.sig_err.err_type) {
+			case IB_SIG_BAD_GUARD:
+				se_cmd->pi_err = TARGET_GUARD_CHECK_FAILED;
+				break;
+			case IB_SIG_BAD_REFTAG:
+				se_cmd->pi_err = TARGET_REFTAG_CHECK_FAILED;
+				break;
+			case IB_SIG_BAD_APPTAG:
+				se_cmd->pi_err = TARGET_APPTAG_CHECK_FAILED;
+				break;
+			}
+			se_cmd->block_num =
+				mr_status.sig_err.sig_err_offset / block_size;
+			isert_pi_err_sense_buffer(se_cmd->sense_buffer,
+						  ILLEGAL_REQUEST, 0x10,
+						  (u8)se_cmd->pi_err);
+			se_cmd->scsi_status = SAM_STAT_CHECK_CONDITION;
+			se_cmd->scsi_sense_length = TRANSPORT_SENSE_BUFFER;
+			se_cmd->se_cmd_flags |= SCF_EMULATED_TASK_SENSE;
+		}
+	}
+
+fail_mr_status:
 	device->unreg_rdma_mem(isert_cmd, isert_conn);
 	isert_put_response(isert_conn->conn, cmd);
 }
@@ -1624,7 +1677,43 @@ isert_completion_rdma_read(struct iser_tx_desc *tx_desc,
 	struct se_cmd *se_cmd = &cmd->se_cmd;
 	struct isert_conn *isert_conn = isert_cmd->conn;
 	struct isert_device *device = isert_conn->conn_device;
+	struct ib_mr_status mr_status;
+	int ret;
 
+	if (wr->fr_desc && wr->fr_desc->protected) {
+		ret = ib_check_mr_status(wr->fr_desc->pi_ctx->sig_mr,
+					 IB_MR_CHECK_SIG_STATUS, &mr_status);
+		if (ret) {
+			pr_err("ib_check_mr_status failed, ret %d\n", ret);
+			goto fail_mr_status;
+		}
+		if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS
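(The archived message breaks off above. Judging from the parallel
isert_completion_rdma_write() path earlier in the patch, the read side
presumably continues along these lines; a reconstruction sketch, not the
original patch text:)

+		if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) {
+			/* decode mr_status.sig_err into se_cmd->pi_err and
+			 * build the CHECK CONDITION sense data, as in
+			 * isert_completion_rdma_write() above */
+		}
+	}
+
+fail_mr_status: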

Re: [PATCH 09/11] IB/isert: Accept RDMA_WRITE completions

2014-01-12 Thread Sagi Grimberg

On 1/11/2014 11:14 PM, Or Gerlitz wrote:

On Thu, Jan 9, 2014 at 6:40 PM, Sagi Grimberg sa...@mellanox.com wrote:

In case of protected transactions, we will need to check the
protection status of the transaction before sending the SCSI response.
So be ready for RDMA_WRITE completions. Currently we don't ask
for these completions, but for T10-PI we will.
@@ -1721,9 +1735,9 @@ __isert_send_completion(struct iser_tx_desc *tx_desc,
 				  isert_conn, ib_dev);
 		break;
 	case ISER_IB_RDMA_WRITE:
-		pr_err("isert_send_completion: Got ISER_IB_RDMA_WRITE\n");
-		dump_stack();
-		break;
+		pr_debug("isert_send_completion: Got ISER_IB_RDMA_WRITE\n");
+		atomic_dec(&isert_conn->post_send_buf_count);
+		isert_completion_rdma_write(tx_desc, isert_cmd);

are we doing a fall-through here? why?


Oh, somehow I missed it in the squash... Will fix, thanks!


 	case ISER_IB_RDMA_READ:
 		pr_debug("isert_send_completion: Got ISER_IB_RDMA_READ:\n");
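
For reference, the corrected arm would presumably just add the missing
break (a sketch, assuming no other changes in the next revision):

	case ISER_IB_RDMA_WRITE:
		pr_debug("isert_send_completion: Got ISER_IB_RDMA_WRITE\n");
		atomic_dec(&isert_conn->post_send_buf_count);
		isert_completion_rdma_write(tx_desc, isert_cmd);
		break;	/* no fall-through into ISER_IB_RDMA_READ */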




--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/11] IB/isert: Initialize T10-PI resources

2014-01-12 Thread Sagi Grimberg

On 1/11/2014 11:09 PM, Or Gerlitz wrote:

On Thu, Jan 9, 2014 at 6:40 PM, Sagi Grimberg sa...@mellanox.com wrote:

@@ -557,8 +629,14 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
 		goto out_mr;
 	}
 
+	if (pi_support && !device->pi_capable) {
+		pr_err("Protection information requested but not supported\n");
+		ret = -EINVAL;
+		goto out_mr;
+	}
+
 	if (device->use_fastreg) {
-		ret = isert_conn_create_fastreg_pool(isert_conn);
+		ret = isert_conn_create_fastreg_pool(isert_conn, pi_support);

just a nit: the pi_support bit can be looked up from the isert_conn
struct, can't it?


 		if (ret) {
 			pr_err("Conn: %p failed to create fastreg pool\n",
 			       isert_conn);
@@ -566,7 +644,7 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
 		}
 	}
 
-	ret = isert_conn_setup_qp(isert_conn, cma_id);
+	ret = isert_conn_setup_qp(isert_conn, cma_id, pi_support);
 	if (ret)
 		goto out_conn_dev;
 
@@ -2193,7 +2271,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
 	pagelist_len = isert_map_fr_pagelist(ib_dev, sg_start, sg_nents,
 					     &fr_desc->data_frpl->page_list[0]);
 
-	if (!fr_desc->valid) {
+	if (!fr_desc->data_key_valid) {
 		memset(&inv_wr, 0, sizeof(inv_wr));
 		inv_wr.opcode = IB_WR_LOCAL_INV;
 		inv_wr.ex.invalidate_rkey = fr_desc->data_mr->rkey;
@@ -2225,7 +2303,7 @@ isert_fast_reg_mr(struct fast_reg_descriptor *fr_desc,
 		pr_err("fast registration failed, ret:%d\n", ret);
 		return ret;
 	}
-	fr_desc->valid = false;
+	fr_desc->data_key_valid = false;
 
 	ib_sge->lkey = fr_desc->data_mr->lkey;
 	ib_sge->addr = fr_desc->data_frpl->page_list[0] + page_off;
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 708a069..fab8b50 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -48,11 +48,21 @@ struct iser_tx_desc {
 	struct ib_send_wr send_wr;
 } __packed;
 
+struct pi_context {
+	struct ib_mr			*prot_mr;
+	bool				prot_key_valid;
+	struct ib_fast_reg_page_list	*prot_frpl;
+	struct ib_mr			*sig_mr;
+	bool				sig_key_valid;
+};
+
 struct fast_reg_descriptor {
-	struct list_head		list;
-	struct ib_mr			*data_mr;
-	struct ib_fast_reg_page_list	*data_frpl;
-	bool				valid;
+	struct list_head		list;
+	struct ib_mr			*data_mr;
+	bool				data_key_valid;
+	struct ib_fast_reg_page_list	*data_frpl;
+	bool				protected;

no need for so many bools in one structure... each one only needs a bit,
correct? so embed them in one flags variable


I figured it would be more explicit this way.
The protected boolean indicates whether we should check the data-integrity
status, and the other three indicate whether the relevant MR is valid (i.e.,
no need to execute a local invalidation).
Do you think I should compact them somehow? The xxx_key_valid booleans
usually align together, although not always.
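
For example, one possible compaction along those lines (just a sketch;
the flag names are placeholders, not from the patch):

	enum fr_desc_flags {
		ISERT_DATA_KEY_VALID	= 1 << 0,
		ISERT_PROT_KEY_VALID	= 1 << 1,
		ISERT_SIG_KEY_VALID	= 1 << 2,
		ISERT_PROTECTED		= 1 << 3,
	};

	struct fast_reg_descriptor {
		struct list_head		list;
		struct ib_mr			*data_mr;
		struct ib_fast_reg_page_list	*data_frpl;
		u8				ind;	/* fr_desc_flags bitmask replacing the four bools */
		struct pi_context		*pi_ctx;
	};

The validity tests then become bit tests, e.g.
!(fr_desc->ind & ISERT_DATA_KEY_VALID) instead of !fr_desc->data_key_valid.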





+	struct pi_context		*pi_ctx;
 };




 struct isert_rdma_wr {
@@ -140,6 +150,7 @@ struct isert_cq_desc {
 
 struct isert_device {
 	int			use_fastreg;
+	bool			pi_capable;

this one (and others like it) is derived from the IB device
capabilities, so I would suggest keeping a copy of the caps instead of
derived bools


Yes, I'll keep the device capabilities instead.
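
Roughly like this (a sketch; assuming the attrs are cached at device
init, with IB_DEVICE_SIGNATURE_HANDOVER as the relevant capability bit):

	struct isert_device {
		int			use_fastreg;
		struct ib_device_attr	dev_attr;	/* cached caps instead of derived bools */
		...
	};

	/* then, instead of device->pi_capable: */
	if (device->dev_attr.device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER)
		/* PI offload is available */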




 int cqs_used;
 int refcount;
 int cq_active_qps[ISERT_MAX_CQ];



[PATCH] IB/mlx5: Fix smatch warnings

2014-01-19 Thread Sagi Grimberg
Fix a possible double free of the 'in' mailbox: it was freed
unconditionally before the error check, and the error path frees it again.
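
In pattern form (a generic sketch, not the driver code; the assumption,
implied by the fix below, is that the error label frees the mailbox again):

	in = kzalloc(sizeof(*in), GFP_KERNEL);
	err = create_mkey(in);
	kfree(in);			/* first free */
	if (err)
		goto err_destroy_psv;	/* this path ends up freeing 'in' again */

Moving the kfree() after the error check leaves exactly one free of 'in'
on each path.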

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mr.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index bc27f6b..f023711 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1050,13 +1050,13 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
 	in->seg.flags = MLX5_PERM_UMR_EN | access_mode;
 	err = mlx5_core_create_mkey(&dev->mdev, &mr->mmr, in, sizeof(*in),
 				    NULL, NULL, NULL);
-	kfree(in);
 	if (err)
 		goto err_destroy_psv;
 
 	mr->ibmr.lkey = mr->mmr.key;
 	mr->ibmr.rkey = mr->mmr.key;
 	mr->umem = NULL;
+	kfree(in);
 
 	return &mr->ibmr;
 
 
-- 
1.7.8.2



[PATCH] IB/mlx5: Fix signature rule constants according to FW specifications

2014-01-19 Thread Sagi Grimberg
Use DIF CRC INC with apptag escape (0x8) and update IP-CSUM entries.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/infiniband/hw/mlx5/qp.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 7981620..58c4735 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1952,9 +1952,9 @@ static int format_selector(struct ib_sig_attrs *attr,
 {
 
 #define FORMAT_DIF_NONE		0
-#define FORMAT_DIF_CRC_INC	4
-#define FORMAT_DIF_CSUM_INC	12
-#define FORMAT_DIF_CRC_NO_INC	13
+#define FORMAT_DIF_CRC_INC	8
+#define FORMAT_DIF_CRC_NO_INC	12
+#define FORMAT_DIF_CSUM_INC	13
 #define FORMAT_DIF_CSUM_NO_INC	14
 
 	switch (domain->sig.dif.type) {
-- 
1.7.8.2
