On 5/14/2014 10:05 AM, Bart Van Assche wrote:
On 05/13/14 18:48, Sagi Grimberg wrote:
On 5/13/2014 5:44 PM, Bart Van Assche wrote:
   static int srp_create_target_ib(struct srp_target_port *target)
   {
       struct srp_device *dev = target->srp_host->srp_dev;
@@ -318,6 +449,8 @@ static int srp_create_target_ib(struct
srp_target_port *target)
       struct ib_cq *recv_cq, *send_cq;
       struct ib_qp *qp;
       struct ib_fmr_pool *fmr_pool = NULL;
+    struct srp_fr_pool *fr_pool = NULL;
+    const int m = 1 + dev->use_fast_reg;
       int ret;
         init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
@@ -332,7 +465,7 @@ static int srp_create_target_ib(struct
srp_target_port *target)
       }
         send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL,
target,
-                   target->queue_size, target->comp_vector);
+                   m * target->queue_size, target->comp_vector);
So it is correct to expand the send CQ and QP send queue, but why not x3?
For fast registration we do:
- FASTREG
- RDMA
- LOCAL_INV

I'm aware we are suppressing the completions but I think we need to
reserve room for FLUSH errors in case QP went to error state when the
send queue is packed.
The first FLUSH error triggers a call of srp_tl_err_work(). The second
and later FLUSH errors are ignored by srp_handle_qp_err(). Do you think
multiplying target->queue_size with a factor 3 instead of 2 would make a
difference in fast registration mode ?

Rethinking on this, we post LOCAL_INV only after the command is done, so this means that the FASTREG+SEND are already completed successfully thus the consumer index had been incremented. So I guess it is safe to save room for x2 WRs in the SQ (*although I'm not so conclusive on this*). WRT to the send CQ, the overrun may happen even before consuming the FLUSH errors (In case the HW will attempt to put a completion on a full queue). But this is easy, send CQ should match QP send queue size - so if we settle for x2 - the send CQ size should be x2 as well.

   static void srp_unmap_data(struct scsi_cmnd *scmnd,
                  struct srp_target_port *target,
                  struct srp_request *req)
   {
-    struct ib_device *ibdev = target->srp_host->srp_dev->dev;
-    struct ib_pool_fmr **pfmr;
+    struct srp_device *dev = target->srp_host->srp_dev;
+    struct ib_device *ibdev = dev->dev;
+    int i;
         if (!scsi_sglist(scmnd) ||
           (scmnd->sc_data_direction != DMA_TO_DEVICE &&
            scmnd->sc_data_direction != DMA_FROM_DEVICE))
           return;
   -    pfmr = req->fmr_list;
-    while (req->nmdesc--)
-        ib_fmr_pool_unmap(*pfmr++);
+    if (dev->use_fast_reg) {
+        struct srp_fr_desc **pfr;
+
+        for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++)
+            srp_inv_rkey(target, (*pfr)->mr->rkey);
No check on return code here? I assume we should react here if we
failed to post a work request right?
OK, I will add a check for the srp_inv_rkey() return code.

   +static int srp_map_finish_fr(struct srp_map_state *state,
+                 struct srp_target_port *target)
+{
+    struct srp_device *dev = target->srp_host->srp_dev;
+    struct ib_send_wr *bad_wr;
+    struct ib_send_wr wr;
+    struct srp_fr_desc *desc;
+    u32 rkey;
+
+    desc = srp_fr_pool_get(target->fr_pool);
+    if (!desc)
+        return -ENOMEM;
+
+    rkey = ib_inc_rkey(desc->mr->rkey);
+    ib_update_fast_reg_key(desc->mr, rkey);
+
+    memcpy(desc->frpl->page_list, state->pages,
+           sizeof(state->pages[0]) * state->npages);
I would really like to avoid this memcpy. Any chance we can map directly
to frpl->page_list instead ?
Avoiding this memcpy() would be relatively easy if all memory that is
holding data for a SCSI command would always be registered. However,
since if register_always == false the fast registration algorithm in
this patch only allocates an rkey if needed (npages > 1) and since how
many pages are present in a mapping descriptor is only known after
state->pages[] has been filled in, eliminating that memcpy would be
challenging.

Yes I see, But since the only constraint that forces us to do this memcpy is the current code logic, I would like to see a /* TODO */ here to remind us that we should re-think on how to avoid this.

-    state->next_fmr    = req->fmr_list;
-
-    use_fmr = target->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
+    if (dev->use_fast_reg) {
+        state->next_fmr = req->fmr_list;
is this correct?
shouldn't this be state->next_fr = req->fr_list (dev->use_fast_reg ==
true)?
or did I misunderstood?

+        use_memory_registration = !!target->fr_pool;
+    } else {
+        state->next_fr = req->fr_list;
Same here?
req->fmr_list and req->fr_list point at the same memory location since
both pointers are members of the same union. This also holds for
state->next_fmr and state->next_fr. So swapping these two statements as
proposed wouldn't change the generated code. But I agree that this would
make the code easier to read.

I didn't think you posted a patch with such an obvious bug...
Just asked about the semantics and as you said, it can improve.

Cheers,
Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to