On 5/14/2014 10:05 AM, Bart Van Assche wrote:
On 05/13/14 18:48, Sagi Grimberg wrote:
On 5/13/2014 5:44 PM, Bart Van Assche wrote:
static int srp_create_target_ib(struct srp_target_port *target)
{
struct srp_device *dev = target->srp_host->srp_dev;
@@ -318,6 +449,8 @@ static int srp_create_target_ib(struct
srp_target_port *target)
struct ib_cq *recv_cq, *send_cq;
struct ib_qp *qp;
struct ib_fmr_pool *fmr_pool = NULL;
+ struct srp_fr_pool *fr_pool = NULL;
+ const int m = 1 + dev->use_fast_reg;
int ret;
init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
@@ -332,7 +465,7 @@ static int srp_create_target_ib(struct
srp_target_port *target)
}
send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL,
target,
- target->queue_size, target->comp_vector);
+ m * target->queue_size, target->comp_vector);
So it is correct to expand the send CQ and QP send queue, but why not x3?
For fast registration we do:
- FASTREG
- RDMA
- LOCAL_INV
I'm aware we are suppressing the completions but I think we need to
reserve room for FLUSH errors in case QP went to error state when the
send queue is packed.
The first FLUSH error triggers a call of srp_tl_err_work(). The second
and later FLUSH errors are ignored by srp_handle_qp_err(). Do you think
multiplying target->queue_size with a factor 3 instead of 2 would make a
difference in fast registration mode ?
Rethinking on this, we post LOCAL_INV only after the command is done, so
this means that
the FASTREG+SEND are already completed successfully thus the consumer
index had been incremented.
So I guess it is safe to save room for x2 WRs in the SQ (*although I'm
not so conclusive on this*).
WRT to the send CQ, the overrun may happen even before consuming the
FLUSH errors (In case the HW will attempt to put a completion on a full
queue).
But this is easy, send CQ should match QP send queue size - so if we
settle for x2 - the send CQ size should be x2 as well.
static void srp_unmap_data(struct scsi_cmnd *scmnd,
struct srp_target_port *target,
struct srp_request *req)
{
- struct ib_device *ibdev = target->srp_host->srp_dev->dev;
- struct ib_pool_fmr **pfmr;
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct ib_device *ibdev = dev->dev;
+ int i;
if (!scsi_sglist(scmnd) ||
(scmnd->sc_data_direction != DMA_TO_DEVICE &&
scmnd->sc_data_direction != DMA_FROM_DEVICE))
return;
- pfmr = req->fmr_list;
- while (req->nmdesc--)
- ib_fmr_pool_unmap(*pfmr++);
+ if (dev->use_fast_reg) {
+ struct srp_fr_desc **pfr;
+
+ for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++)
+ srp_inv_rkey(target, (*pfr)->mr->rkey);
No check on return code here? I assume we should react here if we
failed to post a work request right?
OK, I will add a check for the srp_inv_rkey() return code.
+static int srp_map_finish_fr(struct srp_map_state *state,
+ struct srp_target_port *target)
+{
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct ib_send_wr *bad_wr;
+ struct ib_send_wr wr;
+ struct srp_fr_desc *desc;
+ u32 rkey;
+
+ desc = srp_fr_pool_get(target->fr_pool);
+ if (!desc)
+ return -ENOMEM;
+
+ rkey = ib_inc_rkey(desc->mr->rkey);
+ ib_update_fast_reg_key(desc->mr, rkey);
+
+ memcpy(desc->frpl->page_list, state->pages,
+ sizeof(state->pages[0]) * state->npages);
I would really like to avoid this memcpy. Any chance we can map directly
to frpl->page_list instead ?
Avoiding this memcpy() would be relatively easy if all memory that is
holding data for a SCSI command would always be registered. However,
since if register_always == false the fast registration algorithm in
this patch only allocates an rkey if needed (npages > 1) and since how
many pages are present in a mapping descriptor is only known after
state->pages[] has been filled in, eliminating that memcpy would be
challenging.
Yes I see, But since the only constraint that forces us to do this
memcpy is the current code logic,
I would like to see a /* TODO */ here to remind us that we should
re-think on how to avoid this.
- state->next_fmr = req->fmr_list;
-
- use_fmr = target->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
+ if (dev->use_fast_reg) {
+ state->next_fmr = req->fmr_list;
is this correct?
shouldn't this be state->next_fr = req->fr_list (dev->use_fast_reg ==
true)?
or did I misunderstood?
+ use_memory_registration = !!target->fr_pool;
+ } else {
+ state->next_fr = req->fr_list;
Same here?
req->fmr_list and req->fr_list point at the same memory location since
both pointers are members of the same union. This also holds for
state->next_fmr and state->next_fr. So swapping these two statements as
proposed wouldn't change the generated code. But I agree that this would
make the code easier to read.
I didn't think you posted a patch with such an obvious bug...
Just asked about the semantics and as you said, it can improve.
Cheers,
Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html