On 6/17/19 5:01 PM, Quinn Tran wrote:
Attached is the clean-up patch that we held back from the series. > We felt it 
wasn't ready for wider audience because it needed additional
soak time with our test group.

We want to ahead and share it with you to let you know that we intent
to cleanup the duplicate atomic [ref_count|kref].  Once it has some
soak time in our test group, we'll submit it in the next RC window.

Hi Quinn,

Thank you for having shared that patch early. My comments about that patch are as follows: - The patch description is not correct. Today freeing of an SRB does not happen when ref_count reaches zero but it happens when the firmware reports a completion. That is why today the abort code can trigger a use-after-free. ref_count is only useful today for the abort code to detect atomically whether or not the firmware already reported that a request completed. - Only calling cmd->scsi-done(cmd) when the reference count reaches zero involves a behavior change. If a command completion and a request to abort a command race, this patch will report the command as aborted to the SCSI mid-layer instead of as completed. This change has not been mentioned in the patch description. Is this change perhaps unintentional? - I think that this patch does not address the memory leak that can be triggered by aborting a command. If a command is aborted it will be freed by qla_release_fcp_cmd_kref() calling qla2xxx_rel_qpair_sp() and by qla2xxx_rel_qpair_sp() calling sp->free(sp). However, the implementation of the free function for multiple SRB types is incomplete. - The approach for avoiding that qla2xxx_eh_abort() triggers a use-after-free (the new srb_private structure) is interesting. However, that approach does not look correct to me. The patch attached to the previous e-mail inserts the following code in qla2xxx_eh_abort():
  'sp = CMD_SP(cmd); if (!sp || !sp->qpair || ...) return SUCCESS'
As one can see the sp pointer is dereferenced although the memory it points at could already have been freed and could have been reallocated by another driver or another process. So I don't think the new code for avoiding a use-after-free is correct.

Thanks,

Bart.

Reply via email to