commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
binds the CQs and WQs ring pointer (sets it to same address on both).
lpfc_create_wq_cq():
...
rc = lpfc_cq_create(phba, cq, eq, <...>)
...
rc = lpfc_wq_create(phba, wq, cq, qtype);
...
/* Bind this CQ/WQ to the NVME ring */
pring = wq->pring;
...
cq->pring = pring;
...
The commit frees both CQ & WQ for FCP/NVME on lpfc_sli4_queue_destroy(),
which causes a double free (potential corruption or panic) on freeing
the ring pointer of the second entity (CQ is first, WQ is second):
lpfc_pci_remove_one() # that is, .remove / .shutdown
-> lpfc_pci_remove_one_s4()
-> lpfc_sli4_hba_unset()
-> lpfc_sli4_queue_destroy()
-> lpfc_sli4_release_queues() # Release FCP/NVME cqs
-> __lpfc_sli4_release_queue()
-> lpfc_sli4_queue_free()
-> kfree(queue->pring) # first free
-> lpfc_sli4_release_queues() # Release FCP/NVME wqs
-> __lpfc_sli4_release_queue()
-> lpfc_sli4_queue_free()
-> kfree(queue->pring) # second free
So, check for WQs in lpfc_sli4_queue_free() and do not free the pring,
as it is freed before in the bound CQ. [the WQs are created only via
lpfc_wq_create(), which sets struct lpfc_queue::type == LPFC_WQ. And
that happens in 2 sites (lpfc_create_wq_cq() & lpfc_fof_queue_setup()),
both of which bind the CQs & WQs. Thus, checking for the LPFC_WQ type
correlates to whether the WQ is bound to a CQ, which is freed first.]
Additional details:
For reference, that binding also occurs on one other function:
lpfc_fof_queue_setup():
...
rc = lpfc_cq_create(phba, phba->sli4_hba.oas_cq, <...>)
...
rc = lpfc_wq_create(phba, phba->sli4_hba.oas_wq, <...>)
...
/* Bind this CQ/WQ to the NVME ring */
pring = phba->sli4_hba.oas_wq->pring;
...
phba->sli4_hba.oas_cq->pring = pring;
And used to occur similarly on lpfc_sli4_queue_setup(), but was changed
by that commit; although the problem is more related to the new freeing
pattern introduced in lpfc_sli4_queue_destroy() plus the bound CQs/WQs.
- /* Bind this WQ to the next FCP ring */
- pring = &psli->ring[MAX_SLI3_CONFIGURED_RINGS + fcp_wqidx];
...
- phba->sli4_hba.fcp_cq[fcp_wqidx]->pring = pring;
commit 85e8a23936ab ("scsi: lpfc: Add shutdown method for kexec") made
this more likely as lpfc_pci_remove_one() is called on driver shutdown
(e.g., modprobe -r / rmmod).
(this patch is partially based on a different patch suggested by Johannes,
thus adding a Suggested-by tag for due credit.)
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Reported-by: Junichi Nomura <[email protected]>
Suggested-by: Johannes Thumshirn <[email protected]>
---
drivers/scsi/lpfc/lpfc_sli.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1c9fa45df7eb..8befe841adaa 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -13758,7 +13758,14 @@ void lpfc_sli4_els_xri_abort_event_proc(struct
lpfc_hba *phba)
lpfc_free_rq_buffer(queue->phba, queue);
kfree(queue->rqbp);
}
- kfree(queue->pring);
+
+ /*
+ * The WQs/CQs' pring is bound (same pointer).
+ * So free it only once, and not again on WQ.
+ */
+ if (queue->type != LPFC_WQ)
+ kfree(queue->pring);
+
kfree(queue);
return;
}
--
1.8.3.1