On Fri, 25 Jul 2014 11:23:02 -0400
Chad Dupuis <[email protected]> wrote:
>
>
> On Wed, 18 Jun 2014, Joe Lawrence wrote:
>
> > Introduce mutual exclusion between the qla2xxx_remove_one PCI driver
> > callback and qla2x00_disable_board_on_pci_error, which is scheduled as
> > board_disable work by qla2x00_check_reg{32,16}_for_disconnect:
> >
> > * Leave the driver-specific data attached to the underlying PCI device
> > intact in qla2x00_disable_board_on_pci_error, so that qla2x00_remove_one
> > has enough breadcrumbs to determine that any board_disable work has been
> > completed.
> >
> > * In qla2xxx_remove_one, set a bit to prevent any subsequent
> > board_disable work from scheduling, then cancel and wait until pending
> > work has completed.
> >
> > * Reuse the PCI device enable count check in qla2x00_remove_one to
> > determine if board_disable has occured. The original purpose of this
> > check was unnecessary since the driver remove function wasn't called
> > when the probe fails.
> >
> > Signed-off-by: Joe Lawrence <[email protected]>
> > ---
> > drivers/scsi/qla2xxx/qla_def.h | 1 +
> > drivers/scsi/qla2xxx/qla_isr.c | 3 ++-
> > drivers/scsi/qla2xxx/qla_os.c | 31 +++++++++++++++++++------------
> > 3 files changed, 22 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
> > index 1267b11..7c441c9 100644
> > --- a/drivers/scsi/qla2xxx/qla_def.h
> > +++ b/drivers/scsi/qla2xxx/qla_def.h
> > @@ -3404,6 +3404,7 @@ typedef struct scsi_qla_host {
> >
> > unsigned long pci_flags;
> > #define PFLG_DISCONNECTED 0 /* PCI device removed */
> > +#define PFLG_DRIVER_REMOVING 1 /* PCI driver .remove */
> >
> > uint32_t device_flags;
> > #define SWITCH_FOUND BIT_0
> > diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> > index 2741ad8..ee5eef4 100644
> > --- a/drivers/scsi/qla2xxx/qla_isr.c
> > +++ b/drivers/scsi/qla2xxx/qla_isr.c
> > @@ -117,7 +117,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t
> > *vha, uint32_t reg)
> > {
> > /* Check for PCI disconnection */
> > if (reg == 0xffffffff) {
> > - if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags)) {
> > + if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) &&
> > + !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags)) {
>
> In our remove function we set a bit that we are unloading:
>
> set_bit (UNLOADING, &base_vha->dpc_flags);
>
> could this be used instead?
Hi Chad,
Thanks for the comments.
I think with a little bit of code shuffling this could be accomplished.
My goal with the patchset was to try and present the problem/fix as
plain as possible. It was easiest to collect all the atomic bits I
needed inside a single variable. Should I be tacking on such flags to
'dpc_flags' ?
>
> > /*
> > * Schedule this (only once) on the default system
> > * workqueue so that all the adapter workqueues and the
> > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> > index 39c9953..51cba37 100644
> > --- a/drivers/scsi/qla2xxx/qla_os.c
> > +++ b/drivers/scsi/qla2xxx/qla_os.c
> > @@ -3123,15 +3123,25 @@ qla2x00_remove_one(struct pci_dev *pdev)
> > scsi_qla_host_t *base_vha;
> > struct qla_hw_data *ha;
> >
> > + base_vha = pci_get_drvdata(pdev);
> > + ha = base_vha->hw;
> > +
> > + /* Indicate device removal to prevent future board_disable and wait
> > + * until any pending board_disable has completed. */
> > + set_bit(PFLG_DRIVER_REMOVING, &base_vha->pci_flags);
> > + cancel_work_sync(&ha->board_disable);
> > +
> > /*
> > - * If the PCI device is disabled that means that probe failed and any
> > - * resources should be have cleaned up on probe exit.
> > + * If the PCI device is disabled then there was a PCI-disconnect and
> > + * qla2x00_disable_board_on_pci_error has taken care of most of the
> > + * resources.
> > */
> > - if (!atomic_read(&pdev->enable_cnt))
> > + if (!atomic_read(&pdev->enable_cnt)) {
>
> Should we also add a check here that this is a disconnection before
> freeing these structs? The original intent of the check for
> pdev->enable_cnt is to make sure we don't try to dereference an already
> freed struct if probe failed.
I'm not exactly sure what you're asking here. In my tests, when .probe
return -ERRNO, .remove was not called. Is there another call path into
qla2x00_remove_one?
The reason I didn't completely cleanup qla_hw_data in
qla2x00_disable_board_on_pci_error and re-purposed the PCI enable count
check, was that I needed some way of determining that any
board_disable work was out of the way before proceeding with
qla2x00_remove_one.
The patch's set_bit / cancel_work_sync above (along with the test_bit
before board_disable schedule) should be ensuring that the
board_disable won't be running or rescheduling in the future.
If qla2x00_disable_board_on_pci_error got as far as actually disabling
the PCI device, then qla2x00_remove_one's check on its enable count
would be verifying that. If the device is still enabled, then
qla2x00_remove_one knows that board_disable didn't clean up the device.
Would it be clearer if I used an explicit scsi_qla_host flag to
indicate that state?
>
> > + scsi_host_put(base_vha->host);
> > + kfree(ha);
> > + pci_set_drvdata(pdev, NULL);
> > return;
> > -
> > - base_vha = pci_get_drvdata(pdev);
> > - ha = base_vha->hw;
> > + }
> >
> > qla2x00_wait_for_hba_ready(base_vha);
> >
> > @@ -4791,18 +4801,15 @@ qla2x00_disable_board_on_pci_error(struct
> > work_struct *work)
> > qla82xx_md_free(base_vha);
> > qla2x00_free_queues(ha);
> >
> > - scsi_host_put(base_vha->host);
> > -
> > qla2x00_unmap_iobases(ha);
> >
> > pci_release_selected_regions(ha->pdev, ha->bars);
> > - kfree(ha);
> > - ha = NULL;
> > -
> > pci_disable_pcie_error_reporting(pdev);
> > pci_disable_device(pdev);
> > - pci_set_drvdata(pdev, NULL);
> >
> > + /*
> > + * Let qla2x00_remove_one cleanup qla_hw_data on device removal.
> > + */
> > }
> >
> > /**************************************************************************
> >
Regards,
-- Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html