Hi Brian,
Thanks for reviewing. Comments inline below.
-matt
On Jul 10, 2015, at 2:49 PM, Brian King wrote:
> On 06/19/2015 05:37 PM, Matthew R. Ochs wrote:
>>
>> cfg->init_state = INIT_STATE_NONE;
>> cfg->dev = pdev;
>> + cfg->last_lun_index[0] = 0;
>> + cfg->last_lun_index[1] = 0;
>> cfg->dev_id = (struct pci_device_id *)dev_id;
>> cfg->mcctx = NULL;
>> cfg->err_recovery_active = 0;
>> + cfg->num_user_contexts = 0;
>
> No need to set these to zero, since they get allocated via scsi_host_alloc,
> which zeroes the memory for you.
Agreed. We've actually already fixed this; you'll see it in v2.
>> +/**
>> + * marshall_det_to_rele() - translate detach to release structure
>> + * @detach: Source structure from which to translate/copy.
>> + * @release: Destination structure for the translate/copy.
>> + */
>> +static void marshall_det_to_rele(struct dk_cxlflash_detach *detach,
>> + struct dk_cxlflash_release *release)
>> +{
>> + release->hdr = detach->hdr;
>> + release->context_id = detach->context_id;
>
> Function name could be better. Marshal should have only one 'l' in this
> usage...
Sure, will fix.
>> + if (wwid)
>> + list_for_each_entry_safe(lun_info, temp, &global.luns, list) {
>> + if (!memcmp(lun_info->wwid, wwid,
>> + DK_CXLFLASH_MANAGE_LUN_WWID_LEN))
>> + return lun_info;
>> + }
>
> You probably want to be grabbing global.slock here, right?
> list_for_each_entry_safe doesn't protect you from other threads deleting
> things off the list; it only allows you to delete entries off the list
> in your loop and continue to iterate.
Yes, will correct.
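Something along these lines (a minimal sketch; assumes global.slock is the
spinlock guarding global.luns, and with the lock held the _safe variant is
no longer needed since we don't delete inside the loop):

	struct lun_info *lun_info, *found = NULL;
	ulong flags;

	if (wwid) {
		spin_lock_irqsave(&global.slock, flags);
		list_for_each_entry(lun_info, &global.luns, list)
			if (!memcmp(lun_info->wwid, wwid,
				    DK_CXLFLASH_MANAGE_LUN_WWID_LEN)) {
				found = lun_info;
				break;
			}
		spin_unlock_irqrestore(&global.slock, flags);
	}

	return found;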
>> +
>> + /* Store off lun in unpacked, AFU-friendly format */
>> + lun_info->lun_id = lun_to_lunid(sdev->lun);
>> + lun_info->lun_index = cfg->last_lun_index[sdev->channel];
>> +
>> + writeq_be(lun_info->lun_id,
>> + &afu->afu_map->global.fc_port[sdev->channel]
>> + [cfg->last_lun_index[sdev->channel]++]);
>
> Do you need a slave_destroy that undoes this and makes the AFU forget
> about this LUN? I'm thinking you probably want to also destroy any
> user contexts to this device in slave_destroy as well. Otherwise
> if you have a superpipe open to a scsi_device and someone deletes that
> scsi_device through sysfs, you will have an open superpipe to that
> device still and no way to tear it down since you don't have a
> way to issue the detach ioctl any longer.
In v2, you'll see that we have removed any need for the slave_* routines.
>> + pid_t pid = current->tgid, ctxpid = 0;
>> + ulong flags = 0;
>> +
>> + if (unlikely(ctx_ctrl & CTX_CTRL_CLONE))
>
> I'm seeing what is a bit of an overuse of likely / unlikely. Basically, IMHO,
> the only place where you should be using these compiler hints is in the
> performance path, and even then only when the odds are pretty high that you
> will go down the likely path.
We've actually revised the likely/unlikely since this patch. You'll see it
toned down in v2.
>>
>> + if (likely(lun_info)) {
>> + list_for_each_entry(lun_access, &ctx_info->luns, list)
>> + if (lun_access->lun_info == lun_info) {
>> + found = true;
>> + break;
>> + }
>
> Do we need some locking here?
Yes, will investigate adding a per-context lock.
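Rough shape (a sketch; assumes we add a mutex to struct ctx_info that guards
its luns list):

	mutex_lock(&ctx_info->mutex);
	list_for_each_entry(lun_access, &ctx_info->luns, list)
		if (lun_access->lun_info == lun_info) {
			found = true;
			break;
		}
	mutex_unlock(&ctx_info->mutex);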
>> +static int read_cap16(struct afu *afu, struct lun_info *lun_info,
>> +		       u32 port_sel)
>> +{
>> + struct afu_cmd *cmd = NULL;
>> + int rc = 0;
>> +
>
> Suggest using scsi_execute instead. That way it goes up through the block
> layer via the normal execution path, should result in less code, and then
> you also get error recovery for free. You can then get rid of the nasty
> looping on cxlflash_check_status.
Done; you'll see the scsi_execute() version in v2 (sketch below).
>> + cmd = cxlflash_cmd_checkout(afu);
>> + if (unlikely(!cmd)) {
>> + pr_err("%s: could not get a free command\n", __func__);
>> + return -1;
>> + }
>> +
>> + cmd->rcb.req_flags = (SISL_REQ_FLAGS_PORT_LUN_ID |
>> + SISL_REQ_FLAGS_SUP_UNDERRUN |
>> + SISL_REQ_FLAGS_HOST_READ);
>> +
>> + cmd->rcb.port_sel = port_sel;
>> + cmd->rcb.lun_id = lun_info->lun_id;
>> + cmd->rcb.data_len = CMD_BUFSIZE;
>> + cmd->rcb.data_ea = (u64) cmd->buf;
>> + cmd->rcb.timeout = MC_DISCOVERY_TIMEOUT;
>> +
>> + cmd->rcb.cdb[0] = 0x9E; /* read cap(16) */
>
> Use SERVICE_ACTION_IN_16 here
Okay.
>> + cmd->rcb.cdb[1] = 0x10; /* service action */
>
> Use SAI_READ_CAPACITY_16 here
You got it.
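Putting the pieces together, read_cap16 in v2 is built on scsi_execute and
takes the scsi_device directly. Roughly (a sketch; CMD_RETRIES is a new
constant and the lun_info field names are approximate):

	static int read_cap16(struct scsi_device *sdev,
			      struct lun_info *lun_info)
	{
		u8 *cmd_buf = kzalloc(CMD_BUFSIZE, GFP_KERNEL);
		u8 *scsi_cmd = kzalloc(MAX_COMMAND_SIZE, GFP_KERNEL);
		u8 *sense_buf = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
		int result = -ENOMEM;

		if (unlikely(!cmd_buf || !scsi_cmd || !sense_buf))
			goto out;

		scsi_cmd[0] = SERVICE_ACTION_IN_16;	/* read cap(16) */
		scsi_cmd[1] = SAI_READ_CAPACITY_16;	/* service action */
		put_unaligned_be32(CMD_BUFSIZE, &scsi_cmd[10]);

		/* Block layer provides retries/error recovery for free */
		result = scsi_execute(sdev, scsi_cmd, DMA_FROM_DEVICE,
				      cmd_buf, CMD_BUFSIZE, sense_buf,
				      MC_DISCOVERY_TIMEOUT * HZ, CMD_RETRIES,
				      0, NULL);
		if (!result) {
			/* READ CAP(16) data: last LBA at 0, block len at 8 */
			lun_info->max_lba = get_unaligned_be64(&cmd_buf[0]);
			lun_info->blk_len = get_unaligned_be32(&cmd_buf[8]);
		}
	out:
		kfree(cmd_buf);
		kfree(scsi_cmd);
		kfree(sense_buf);
		return result;
	}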
>> + * rhte_checkin() - releases a resource handle table entry
>> + * @ctx_info: Context owning the resource handle.
>> + * @rht_entry: RHTE to release.
>> + */
>> +void rhte_checkin(struct ctx_info *ctx_info,
>> + struct sisl_rht_entry *rht_entry)
>> +{
>> + rht_entry->nmask = 0;
>> + rht_entry->fp = 0;
>> + ctx_info->rht_out--;
>> + ctx_info->rht_lun[rht_entry - ctx_info->rht_start] = NULL;
>
> Do you need some locking in both the checkout and checkin path?
Likely, will investigate this.
>>
>> + /*
>> + * Clear the Format 1 RHT entry for direct access
>> + * (physical LUN) using the synchronization sequence
>> + * defined in the SISLite specification.
>> + */
>> + rht_entry_f1 = (struct sisl_rht_entry_f1 *)rht_entry;
>> +
>> + rht_entry_f1->valid = 0;
>> + smp_wmb(); /* Make revocation of RHT entry visible */
>
> I think what you actually want to use here is dma_wmb() rather than smp_wmb(),
> assuming you are trying to ensure these writes occur in order with respect
> to the AFU. If you were to build the kernel as a UP kernel, rather than an SMP
> kernel, all these smp_wmb calls would turn into mere compiler barriers, which
> is
> not what you want, I think... Suggest auditing the entire patch as there are
> likely other barriers you may want to change as well.
What we're looking for here is a lightweight sync. Looking at the PPC
barrier.h, I see that you are correct: we need dma_wmb() to ensure we get an
lwsync rather than a mere compiler barrier() on a non-SMP build. Agree with
the audit idea.
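That is, the sequence becomes:

	rht_entry_f1->valid = 0;
	dma_wmb();	/* Make revocation of RHT entry visible to the AFU */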
>> + /* Free the context; note that rht_lun was allocated at same time */
>> + kfree(ctx_info);
>> + cfg->num_user_contexts--;
>
> What guarantees atomicity of this increment? I don't see any locking around
> the callers...
Per Mikey Neuling's comments, we have already made this atomic.
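i.e., roughly:

	/* num_user_contexts is now an atomic_t */
	atomic_dec(&cfg->num_user_contexts);

with the matching atomic_inc() where the context is created.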
>> + /* Take our LUN out of context, free the node */
>> + list_for_each_entry_safe(lun_access, t, &ctx_info->luns, list)
>> + if (lun_access->lun_info == lun_info) {
>> + list_del(&lun_access->list);
>> + kfree(lun_access);
>> + lun_access = NULL;
>> + break;
>> + }
>
> Do you need to do some locking around this? Can others be reading / writing
> to / from this list concurrently?
Reading/writing would be via any of the ioctl threads when determining access
to a context. We'll likely cover this via a per-context lock.
>> +
>> + /* Tear down context following last LUN cleanup */
>> + if (list_empty(&ctx_info->luns)) {
>> + spin_lock_irqsave(&cfg->ctx_tbl_slock, flags);
>> + cfg->ctx_tbl[ctxid] = NULL;
>> + spin_unlock_irqrestore(&cfg->ctx_tbl_slock, flags);
>> +
>> + while (atomic_read(&ctx_info->nrefs) > 1) {
>> + pr_debug("%s: waiting on threads... (%d)\n",
>> + __func__, atomic_read(&ctx_info->nrefs));
>> + cpu_relax();
>
> Hmm. Would an msleep with an overall timeout be more appropriate here? It
> might actually be cleaner just to use a kref in the ctx_info struct to
> manage your reference count and then use the release function to do that
> cleanup; then you could simplify all this and eliminate the wait.
In v2, you'll see that we've already moved to a sleep instead of a busy-wait
here. Will need to investigate what moving to a kref would look like.
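For reference, the rough shape of the kref approach (ctx_release is a
hypothetical name):

	static void ctx_release(struct kref *kref)
	{
		struct ctx_info *ctx_info =
			container_of(kref, struct ctx_info, kref);

		/* Last reference dropped; safe to tear down without waiting */
		kfree(ctx_info);
	}

	/* kref_init(&ctx_info->kref) at creation; then each ioctl thread: */
	kref_get(&ctx_info->kref);
	/* use the context */
	kref_put(&ctx_info->kref, ctx_release);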
>> +int cxlflash_mark_contexts_error(struct cxlflash_cfg *cfg)
>> +{
>> + int i, rc = 0;
>> + ulong lock_flags;
>> + struct ctx_info *ctx_info = NULL;
>> +
>> + spin_lock_irqsave(&cfg->ctx_tbl_slock, lock_flags);
>> +
>> + for (i = 0; i < MAX_CONTEXT; i++) {
>> + ctx_info = cfg->ctx_tbl[i];
>> +
>> + if (ctx_info) {
>> + cfg->ctx_tbl[i] = NULL;
>> + list_add(&ctx_info->list, &cfg->ctx_err_recovery);
>> + ctx_info->err_recovery_active = true;
>> + unmap_context(ctx_info);
>> + }
>> + }
>> +
>> + spin_unlock_irqrestore(&cfg->ctx_tbl_slock, lock_flags);
>> +
>> + return rc;
>> +}
>
> This function doesn't appear to be called anywhere. Not even in the
> follow-on patch...
You're correct, this wasn't called in the first patch, but it is used in v2.
>> +
>> + /* On first attach set fileops */
>> + if (cfg->num_user_contexts == 0)
>> + cfg->cxl_fops = cxlflash_cxl_fops;
>
> I'm not seeing any synchronization or locking here. How are you managing a
> hundred different users hitting your ioctl path at the same time? You are
> iterating through lists below with no locks held, but it seems to me like
> that list could be changing on you. Am I missing something here?
This count is now an atomic_t. We've also added a state check to block
ioctls when needed.
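The v2 shape is roughly (a sketch; state names are approximate):

	if (cfg->state != STATE_NORMAL) {	/* e.g. during error recovery */
		rc = -ENODEV;
		goto out;
	}

	/* num_user_contexts is an atomic_t; set fops on first attach */
	if (atomic_inc_return(&cfg->num_user_contexts) == 1)
		cfg->cxl_fops = cxlflash_cxl_fops;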
>> + ctxid = cxl_process_element(ctx);
>> + if ((ctxid > MAX_CONTEXT) || (ctxid < 0)) {
>
> Too many parentheses
Fixed.
>> + reg = readq_be(&afu->ctrl_map->mbox_r); /* Try MMIO */
>> + /* MMIO returning 0xff, need to reset */
>> + if (reg == -1) {
>> + pr_info("%s: afu=%p reason 0x%llx\n",
>> + __func__, afu, recover->reason);
>> + cxlflash_afu_reset(cfg);
>> + } else {
>> + pr_debug("%s: reason 0x%llx MMIO working, no reset performed\n",
>> + __func__, recover->reason);
>> + rc = -EINVAL;
>
> This seems a little strange to me. The user asked you to recover the AFU,
> the AFU looks fine to you, so you do nothing, yet you fail the request to
> recover the AFU. What about multiple users issuing this IOCTL at the same
> time? Seems like we might want to just return success rather than make the
> user second guess what they did wrong in building the ioctl.
Correct, we've reworked this routine for v2. You'll see behavior similar to
what you've described.
>> + case MODE_VIRTUAL:
>> + last_lba = (((rht_entry->lxt_cnt * MC_CHUNK_SIZE *
>> + lun_info->blk_len) / CXLFLASH_BLOCK_SIZE) - 1);
>> + break;
>
> Should this go in the second patch?
Yes, this should have been in the second patch.
>> +struct dk_cxlflash_uvirtual {
>> + struct dk_cxlflash_hdr hdr; /* Common fields */
>> + __u64 context_id; /* Context to own virtual resources */
>> + __u64 lun_size; /* Requested size, in 4K blocks */
>> + __u64 rsrc_handle; /* Returned resource handle */
>> + __u64 last_lba; /* Returned last LBA of LUN */
>> +};
>
> This isn't used in this patch, so should really be in the second patch.
Yes, this technically should be in the second patch.
>> +
>> +struct dk_cxlflash_release {
>> + struct dk_cxlflash_hdr hdr; /* Common fields */
>> + __u64 context_id; /* Context owning resources */
>> + __u64 rsrc_handle; /* Resource handle to release */
>> +};
>> +
>> +struct dk_cxlflash_resize {
>> + struct dk_cxlflash_hdr hdr; /* Common fields */
>> + __u64 context_id; /* Context owning resources */
>> + __u64 rsrc_handle; /* Resource handle of LUN to resize */
>> + __u64 req_size; /* New requested size, in 4K blocks */
>> + __u64 last_lba; /* Returned last LBA of LUN */
>> +};
>
> Looks like the same is true here. Ideally all the virtual LUN definitions,
> function prototypes, etc, would be in the second patch.
Agreed.
>> +#define DK_CXLFLASH_ATTACH CXL_IOW(0x80, dk_cxlflash_attach)
>
> It would probably be good to add a comment to include/uapi/misc/cxl.h to
> indicate that cxlflash is reserving this ioctl range so you don't conflict
> with other cxl users.
Sounds reasonable. Will work with the maintainers of that file to do this.
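Something along the lines of:

	/* In include/uapi/misc/cxl.h: */
	/* Ioctl numbers 0x80 and above are reserved for the cxlflash driver */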
>> +#define DK_CXLFLASH_USER_DIRECT    CXL_IOW(0x81, dk_cxlflash_udirect)
>> +#define DK_CXLFLASH_USER_VIRTUAL   CXL_IOW(0x82, dk_cxlflash_uvirtual)
>> +#define DK_CXLFLASH_VLUN_RESIZE    CXL_IOW(0x83, dk_cxlflash_resize)
>> +#define DK_CXLFLASH_RELEASE        CXL_IOW(0x84, dk_cxlflash_release)
>> +#define DK_CXLFLASH_DETACH         CXL_IOW(0x85, dk_cxlflash_detach)
>> +#define DK_CXLFLASH_VERIFY         CXL_IOW(0x86, dk_cxlflash_verify)
>> +#define DK_CXLFLASH_CLONE          CXL_IOW(0x87, dk_cxlflash_clone)
>> +#define DK_CXLFLASH_RECOVER_AFU    CXL_IOW(0x88, dk_cxlflash_recover_afu)
>> +#define DK_CXLFLASH_MANAGE_LUN     CXL_IOW(0x89, dk_cxlflash_manage_lun)
>> +
>> +#endif /* ifndef _CXLFLASH_IOCTL_H */