Hi Brian,
Thanks for reviewing. Comments inline below.
-matt
On Jul 10, 2015, at 2:49 PM, Brian King wrote:
> On 06/19/2015 05:37 PM, Matthew R. Ochs wrote:
>>
>> cfg->init_state = INIT_STATE_NONE;
>> cfg->dev = pdev;
>> + cfg->last_lun_index[0] = 0;
>> + cfg->last_lun_index[1] = 0;
>> cfg->dev_id = (struct pci_device_id *)dev_id;
>> cfg->mcctx = NULL;
>> cfg->err_recovery_active = 0;
>> + cfg->num_user_contexts = 0;
>
> No need to set these to zero, since they get allocated via scsi_host_alloc,
> which zeroes the memory for you.
Agreed. We've actually already fixed this; you'll see it in v2.
>> +/**
>> + * marshall_det_to_rele() - translate detach to release structure
>> + * @detach: Source structure from which to translate/copy.
>> + * @release: Destination structure for the translate/copy.
>> + */
>> +static void marshall_det_to_rele(struct dk_cxlflash_detach *detach,
>> + struct dk_cxlflash_release *release)
>> +{
>> + release->hdr = detach->hdr;
>> + release->context_id = detach->context_id;
>
> Function name could be better. Marshal should have only one 'l' in this
> usage...
Sure, will fix.
>> + if (wwid)
>> + list_for_each_entry_safe(lun_info, temp, &global.luns, list) {
>> + if (!memcmp(lun_info->wwid, wwid,
>> + DK_CXLFLASH_MANAGE_LUN_WWID_LEN))
>> + return lun_info;
>> + }
>
> You probably want to be grabbing global.slock here, right?
> list_for_each_entry_safe doesn't protect you from other threads deleting
> things off the list; it only allows you to delete entries off the list
> in your loop and continue to iterate.
Yes, will correct.
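Something along these lines (a minimal sketch; assumes global.slock is the
spinlock guarding global.luns, and with the lock held the _safe variant is
no longer needed since we don't delete inside the loop):

	struct lun_info *lun_info, *found = NULL;
	ulong flags;

	if (wwid) {
		spin_lock_irqsave(&global.slock, flags);
		list_for_each_entry(lun_info, &global.luns, list)
			if (!memcmp(lun_info->wwid, wwid,
				    DK_CXLFLASH_MANAGE_LUN_WWID_LEN)) {
				found = lun_info;
				break;
			}
		spin_unlock_irqrestore(&global.slock, flags);
	}

	return found;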
>> +
>> + /* Store off lun in unpacked, AFU-friendly format */
>> + lun_info->lun_id = lun_to_lunid(sdev->lun);
>> + lun_info->lun_index = cfg->last_lun_index[sdev->channel];
>> +
>> + writeq_be(lun_info->lun_id,
>> + &afu->afu_map->global.fc_port[sdev->channel]
>> + [cfg->last_lun_index[sdev->channel]++]);
>
> Do you need a slave_destroy that undoes this and makes the AFU forget
> about this LUN? I'm thinking you probably want to also destroy any
> user contexts to this device in slave_destroy as well. Otherwise
> if you have a superpipe open to a scsi_device and someone deletes that
> scsi_device through sysfs, you will have an open superpipe to that
> device still and no way to tear it down since you don't have a
> way to issue the detach ioctl any longer.
In v2, you'll see that we have removed any need for the slave_* routines.
>> + pid_t pid = current->tgid, ctxpid = 0;
>> + ulong flags = 0;
>> +
>> + if (unlikely(ctx_ctrl & CTX_CTRL_CLONE))
>
> I'm seeing what is a bit of an overuse of likely / unlikely. Basically, IMHO,
> the only place where you should be using these compiler hints is in the
> performance path, and even then only when the odds are pretty high that you
> will go down the likely path.
We've actually revised the likely/unlikely since this patch. You'll see it
toned down in v2.
>>
>> + if (likely(lun_info)) {
>> + list_for_each_entry(lun_access, &ctx_info->luns, list)
>> + if (lun_access->lun_info == lun_info) {
>> + found = true;
>> + break;
>> + }
>
> Do we need some locking here?
Yes, will investigate adding a per-context lock.
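Rough shape (a sketch; assumes we add a mutex to struct ctx_info that guards
its luns list):

	mutex_lock(&ctx_info->mutex);
	list_for_each_entry(lun_access, &ctx_info->luns, list)
		if (lun_access->lun_info == lun_info) {
			found = true;
			break;
		}
	mutex_unlock(&ctx_info->mutex);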
>> +static int read_cap16(struct afu *afu, struct lun_info *lun_info,
>> +		       u32 port_sel)
>> +{
>> + struct afu_cmd *cmd = NULL;
>> + int rc = 0;
>> +
>
> Suggest using scsi_execute instead. That way it goes up through the block
> layer via the normal execution path, should result in less code, and then
> you also get error recovery for free. You can then get rid of the nasty
> looping on cxlflash_check_status.
Done; you'll see the scsi_execute() version in v2 (sketch below).
>> + cmd = cxlflash_cmd_checkout(afu);
>> + if (unlikely(!cmd)) {
>> + pr_err("%s: could not get a free command\n", __func__);
>> + return -1;
>> + }
>> +
>> + cmd->rcb.req_flags = (SISL_REQ_FLAGS_PORT_LUN_ID |
>> + SISL_REQ_FLAGS_SUP_UNDERRUN |
>> + SISL_REQ_FLAGS_HOST_READ);
>> +
>> + cmd->rcb.port_sel = port_sel;
>> + cmd->rcb.lun_id = lun_info->lun_id;
>> + cmd->rcb.data_len = CMD_BUFSIZE;
>> + cmd->rcb.data_ea = (u64) cmd->buf;
>> + cmd->rcb.timeout = MC_DISCOVERY_TIMEOUT;
>> +
>> + cmd->rcb.cdb[0] = 0x9E; /* read cap(16) */
>
> Use SERVICE_ACTION_IN_16 here
Okay.
>> + cmd->rcb.cdb[1] = 0x10; /* service action */
>
> Use SAI_READ_CAPACITY_16 here
You got it.
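Putting the pieces together, read_cap16 in v2 is built on scsi_execute and
takes the scsi_device directly. Roughly (a sketch; CMD_RETRIES is a new
constant and the lun_info field names are approximate):

	static int read_cap16(struct scsi_device *sdev,
			      struct lun_info *lun_info)
	{
		u8 *cmd_buf = kzalloc(CMD_BUFSIZE, GFP_KERNEL);
		u8 *scsi_cmd = kzalloc(MAX_COMMAND_SIZE, GFP_KERNEL);
		u8 *sense_buf = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
		int result = -ENOMEM;

		if (unlikely(!cmd_buf || !scsi_cmd || !sense_buf))
			goto out;

		scsi_cmd[0] = SERVICE_ACTION_IN_16;	/* read cap(16) */
		scsi_cmd[1] = SAI_READ_CAPACITY_16;	/* service action */
		put_unaligned_be32(CMD_BUFSIZE, &scsi_cmd[10]);

		/* Block layer provides retries/error recovery for free */
		result = scsi_execute(sdev, scsi_cmd, DMA_FROM_DEVICE,
				      cmd_buf, CMD_BUFSIZE, sense_buf,
				      MC_DISCOVERY_TIMEOUT * HZ, CMD_RETRIES,
				      0, NULL);
		if (!result) {
			/* READ CAP(16) data: last LBA at 0, block len at 8 */
			lun_info->max_lba = get_unaligned_be64(&cmd_buf[0]);
			lun_info->blk_len = get_unaligned_be32(&cmd_buf[8]);
		}
	out:
		kfree(cmd_buf);
		kfree(scsi_cmd);
		kfree(sense_buf);
		return result;
	}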
>> + * rhte_checkin() - releases a resource handle table entry
>> + * @ctx_info: Context owning the resource handle.
>> + * @rht_entry: RHTE to release.
>> + */
>> +void rhte_checkin(struct ctx_info *ctx_info,
>> + struct sisl_rht_entry *rht_entry)
>> +{
>> + rht_entry->nmask = 0;
>> + rht_entry->fp = 0;
>> + ctx_info->rht_out--;
>> + ctx_info->rht_lun[rht_entry - ctx_info->rht_start] = NULL;
>
> Do you need some locking in both the checkout and checkin path?
Likely, will investigate this.
>>
>> + /*
>> + * Clear the Format 1 RHT entry for direct access
>> + * (physical LUN) using the synchronization sequence
>> + * defined in the SISLite specification.
>> + */
>> + rht_entry_f1 = (struct sisl_rht_entry_f1 *)rht_entry;
>> +
>> + rht_entry_f1->valid = 0;
>> + smp_wmb(); /* Make revocation of RHT entry visible */
>
> I think what you actually want to use here is dma_wmb() rather than smp_wmb(),
> assuming you are trying to ensure these writes occur in order with respect
> to the AFU. If you were to build the kernel as a UP kernel, rather than an SMP
> kernel, all these smp_wmb calls would turn into mere compiler barriers, which
> is
> not what you want, I think... Suggest auditing the entire patch as there are
> likely other barriers you may want to change as well.
What we're looking for here is a lightweight sync. Looking at the PPC
barrier.h, I see that you are correct: we need dma_wmb() to ensure we get an
lwsync rather than a mere compiler barrier() on a non-SMP build. Agree with
the audit idea.
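That is, the sequence becomes:

	rht_entry_f1->valid = 0;
	dma_wmb();	/* Make revocation of RHT entry visible to the AFU */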
>> + /* Free the context; note that rht_lun was allocated at same time */
>> + kfree(ctx_info);
>> + cfg->num_user_contexts--;
>
> What guarantees atomicity of this increment? I don't see any locking around
> the callers...
Per Mikey Neuling's comments, we have already made this atomic.
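i.e., roughly:

	/* num_user_contexts is now an atomic_t */
	atomic_dec(&cfg->num_user_contexts);

with the matching atomic_inc() where the context is created.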
>> + /* Take our LUN out of context, free the node */
>> + list_for_each_entry_safe(lun_access, t, &ctx_info->luns, list)
>> + if (lun_access->lun_info == lun_info) {
>> + list_del(&lun_access->list);
>> + kfree(lun_access);
>> + lun_access = NULL;
>> + break;
>> + }
>
> Do you need to do some locking around this? Can others be reading / writing
> to / from this list concurrently?
Reading/writing would be via any of the ioctl threads when determining access
to a context. We'll likely cover this via a per-context lock.
>> +
>> + /* Tear down context following last LUN cleanup */
>> + if (list_empty(&ctx_info->luns)) {
>> + spin_lock_irqsave(&cfg->ctx_tbl_slock, flags);
>> + cfg->ctx_tbl[ctxid] = NULL;
>> + spin_unlock_irqrestore(&cfg->ctx_tbl_slock, flags);
>> +
>> + while (atomic_read(&ctx_info->nrefs) > 1) {
>> + pr_debug("%s: waiting on threads... (%d)\n",
>> + __func__, atomic_read(&ctx_info->nrefs));
>> + cpu_relax();
>
> Hmm. Would an msleep with an overall timeout be more appropriate here? It
> might actually be cleaner just to use a kref in the ctx_info struct to
> manage your reference count and then use the release function to do that
> cleanup; then you could simplify all this and eliminate the wait.
In v2, you'll see that we've already moved to a sleep instead of a busy-wait
here. Will need to investigate what moving to a kref would look like.
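For reference, the rough shape of the kref approach (ctx_release is a
hypothetical name):

	static void ctx_release(struct kref *kref)
	{
		struct ctx_info *ctx_info =
			container_of(kref, struct ctx_info, kref);

		/* Last reference dropped; safe to tear down without waiting */
		kfree(ctx_info);
	}

	/* kref_init(&ctx_info->kref) at creation; then each ioctl thread: */
	kref_get(&ctx_info->kref);
	/* use the context */
	kref_put(&ctx_info->kref, ctx_release);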
>> +int cxlflash_mark_contexts_error(struct cxlflash_cfg *cfg)
>> +{
>> + int i, rc = 0;
>> + ulong lock_flags;
>> + struct ctx_info *ctx_info = NULL;
>> +
>> + spin_lock_irqsave(&cfg->ctx_tbl_slock, lock_flags);
>> +
>> + for (i = 0; i < MAX_CONTEXT; i++) {
>> + ctx_info = cfg->ctx_tbl[i];
>> +
>> + if (ctx_info) {
>> + cfg->ctx_tbl[i] = NULL;
>> + list_add(&ctx_info->list, &cfg->ctx_err_recovery);
>> + ctx_info->err_recovery_active = true;
>> + unmap_context(ctx_info);
>> + }
>> + }
>> +
>> + spin_unlock_irqrestore(&cfg->ctx_tbl_slock, lock_flags);
>> +
>> + return rc;
>> +}
>
> This function doesn't appear to be called anywhere. Not even in the
> follow-on patch...
You're correct, this wasn't called in the first patch, but it is used in v2.
>> +
>> + /* On first attach set fileops */
>> + if (cfg->num_user_contexts == 0)
>> + cfg->cxl_fops = cxlflash_cxl_fops;
>
> I'm not seeing any synchronization or locking here. How are you managing a
> hundred different users hitting your ioctl path at the same time? You are
> iterating through lists below with no locks held, but it seems to me like
> that list could be changing on you. Am I missing something here?
This count is now an atomic_t. We've also added a state check to block
ioctls when needed.
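The v2 shape is roughly (a sketch; state names are approximate):

	if (cfg->state != STATE_NORMAL) {	/* e.g. during error recovery */
		rc = -ENODEV;
		goto out;
	}

	/* num_user_contexts is an atomic_t; set fops on first attach */
	if (atomic_inc_return(&cfg->num_user_contexts) == 1)
		cfg->cxl_fops = cxlflash_cxl_fops;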
>> + ctxid = cxl_process_element(ctx);
>> + if ((ctxid > MAX_CONTEXT) || (ctxid < 0)) {
>
> Too many parentheses
Fixed.
>> + reg = readq_be(&afu->ctrl_map->mbox_r); /* Try MMIO */
>> + /* MMIO returning 0xff, need to reset */
>> + if (reg == -1) {
>> + pr_info("%s: afu=%p reason 0x%llx\n",
>> + __func__, afu, recover->reason);
>> + cxlflash_afu_reset(cfg);
>> + } else {
>> + pr_debug("%s: reason 0x%llx MMIO working, no reset performed\n",
>> + __func__, recover->reason);
>> + rc = -EINVAL;
>
> This seems a little strange to me. The user asked you to recover the AFU,
> the AFU looks fine to you, so you do nothing, yet you fail the request to
> recover the AFU. What about multiple users issuing this IOCTL at the same
> time? Seems like we might want to just return success rather than make the
> user second guess what they did wrong in building the ioctl.
Correct, we've reworked this routine for v2. You'll see behavior similar to
what you've described.
>> + case MODE_VIRTUAL:
>> + last_lba = (((rht_entry->lxt_cnt * MC_CHUNK_SIZE *
>> + lun_info->blk_len) / CXLFLASH_BLOCK_SIZE) - 1);
>> + break;
>
> Should this go in the second patch?
Yes, this should have been in the second patch.
>> +struct dk_cxlflash_uvirtual {
>> + struct dk_cxlflash_hdr hdr; /* Common fields */
>> + __u64 context_id; /* Context to own virtual resources */
>> + __u64 lun_size; /* Requested size, in 4K blocks */
>> + __u64 rsrc_handle; /* Returned resource handle */
>> + __u64 last_lba; /* Returned last LBA of LUN */
>> +};
>
> This isn't used in this patch, so should really be in the second patch.
Yes, this technically should be in the second patch.
>> +
>> +struct dk_cxlflash_release {
>> + struct dk_cxlflash_hdr hdr; /* Common fields */
>> + __u64 context_id; /* Context owning resources */
>> + __u64 rsrc_handle; /* Resource handle to release */
>> +};
>> +
>> +struct dk_cxlflash_resize {
>> + struct dk_cxlflash_hdr hdr; /* Common fields */
>> + __u64 context_id; /* Context owning resources */
>> + __u64 rsrc_handle; /* Resource handle of LUN to resize */
>> + __u64 req_size; /* New requested size, in 4K blocks */
>> + __u64 last_lba; /* Returned last LBA of LUN */
>> +};
>
> Looks like the same is true here. Ideally all the virtual LUN definitions,
> function prototypes, etc, would be in the second patch.
Agreed.
>> +#define DK_CXLFLASH_ATTACH CXL_IOW(0x80, dk_cxlflash_attach)
>
> It would probably be good to add a comment to include/uapi/misc/cxl.h to
> indicate that cxlflash is reserving this ioctl range so you don't conflict
> with other cxl users.
Sounds reasonable. Will work with the maintainers of that file to do this.
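Something along the lines of:

	/* In include/uapi/misc/cxl.h: */
	/* Ioctl numbers 0x80 and above are reserved for the cxlflash driver */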
>> +#define DK_CXLFLASH_USER_DIRECT    CXL_IOW(0x81, dk_cxlflash_udirect)
>> +#define DK_CXLFLASH_USER_VIRTUAL   CXL_IOW(0x82, dk_cxlflash_uvirtual)
>> +#define DK_CXLFLASH_VLUN_RESIZE    CXL_IOW(0x83, dk_cxlflash_resize)
>> +#define DK_CXLFLASH_RELEASE        CXL_IOW(0x84, dk_cxlflash_release)
>> +#define DK_CXLFLASH_DETACH         CXL_IOW(0x85, dk_cxlflash_detach)
>> +#define DK_CXLFLASH_VERIFY         CXL_IOW(0x86, dk_cxlflash_verify)
>> +#define DK_CXLFLASH_CLONE          CXL_IOW(0x87, dk_cxlflash_clone)
>> +#define DK_CXLFLASH_RECOVER_AFU    CXL_IOW(0x88, dk_cxlflash_recover_afu)
>> +#define DK_CXLFLASH_MANAGE_LUN     CXL_IOW(0x89, dk_cxlflash_manage_lun)
>> +
>> +#endif /* ifndef _CXLFLASH_IOCTL_H */