On Mon, Jul 18, 2016 at 04:28:57PM -0400, Laurence Oberman wrote:
>
>
> ----- Original Message -----
> > From: "Johannes Thumshirn" <[email protected]>
> > To: "Martin K . Petersen" <[email protected]>, "James Bottomley"
> > <[email protected]>, "James Smart"
> > <[email protected]>, "Dick Kennedy" <[email protected]>
> > Cc: "Linux SCSI Mailinglist" <[email protected]>, "Johannes
> > Thumshirn" <[email protected]>
> > Sent: Monday, July 18, 2016 10:06:03 AM
> > Subject: [PATCH 1/2] lpfc: call lpfc_sli_validate_fcp_iocb() with the
> > hbalock held
> >
> > Call lpfc_sli_validate_fcp_iocb() with the hbalock held, as the pointer to
> > iocbq is not guaranteed to still be valid after looking it up.
> >
> > Signed-off-by: Johannes Thumshirn <[email protected]>
> > ---
> > drivers/scsi/lpfc/lpfc_sli.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> > index 1a248a2..d58ec4c 100644
> > --- a/drivers/scsi/lpfc/lpfc_sli.c
> > +++ b/drivers/scsi/lpfc/lpfc_sli.c
> > @@ -9978,6 +9978,7 @@ lpfc_sli_sum_iocb(struct lpfc_vport *vport, uint16_t
> > tgt_id, uint64_t lun_id,
> > struct lpfc_iocbq *iocbq;
> > int sum, i;
> >
> > + spin_lock_irq(&phba->hbalock);
> > for (i = 1, sum = 0; i <= phba->sli.last_iotag; i++) {
> > iocbq = phba->sli.iocbq_lookup[i];
> >
> > @@ -9985,6 +9986,7 @@ lpfc_sli_sum_iocb(struct lpfc_vport *vport, uint16_t
> > tgt_id, uint64_t lun_id,
> > ctx_cmd) == 0)
> > sum++;
> > }
> > + spin_unlock_irq(&phba->hbalock);
> >
> > return sum;
> > }
> > --
> > 1.8.5.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> Hi Johannes
>
> The code looks correct, but I would get a review from James Smart or Dick
> Kennedy at Emulex (now Broadcom :))
Sure, that's why they're on Cc ;-).
> Have you experienced an issue already due to this, can you share the stack
> trace of the issue if you have it.
We have had an issue at a customer's side. It's actually pretty hard to
trigger (it involves two separated SANs and several FC switches in between and
link bounces) but the core dumps show that in lpfc_sli_validate_fcp_iocb() the
iocbq address in invalid. It actually _is_ a valid kernel pointer and _was_ a
valid 'struct lpfc_iocbq' but isn't anymore. So we suggest a use-after-free.
This patch is running OK withing one of our partner's QA sites for 3 weeks now
(we've been chasing it since October).
c.f this part of our partner's analysis:
crash> bt
PID: 62174 TASK: ffff880285fe40c0 CPU: 0 COMMAND: "kworker/0:1"
#0 [ffff88027dfb7bc0] crash_kexec at ffffffff8008e4ca
#1 [ffff88027dfb7c90] oops_end at ffffffff804129a8
#2 [ffff88027dfb7cb0] general_protection at ffffffff80411d28
[exception RIP: lpfc_sli_validate_fcp_iocb+148]
RIP: ffffffffa0683114 RSP: ffff88027dfb7d60 RFLAGS: 00010246
RAX: 2e362e322f6e6f68 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8803c2417678 RDI: ffff88027d5a1a70
RBP: ffff8803d8c28000 R8: 0000000000000001 R9: 0000000000000000
R10: ffffffffa06f7898 R11: 0000000000000000 R12: 0000000000001340
R13: ffff8803c2ac2000 R14: ffff8803c2417678 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: e030 SS: e02b
#3 [ffff88027dfb7d68] lpfc_sli_abort_iocb at ffffffffa06831bb [lpfc]
#4 [ffff88027dfb7dc8] fc_terminate_rport_io at ffffffffa0502aea
[scsi_transport_fc]
#5 [ffff88027dfb7de8] fc_timeout_deleted_rport at ffffffffa050338c
[scsi_transport_fc]
#6 [ffff88027dfb7e28] process_one_work at ffffffff80062ea8
#7 [ffff88027dfb7e78] worker_thread at ffffffff800668aa
#8 [ffff88027dfb7ee8] kthread at ffffffff8006a506
#9 [ffff88027dfb7f48] kernel_thread_helper at ffffffff8041a244
crash> dis lpfc_sli_validate_fcp_iocb
...
0xffffffffa068310f <lpfc_sli_validate_fcp_iocb+143>: nop
0xffffffffa0683110 <lpfc_sli_validate_fcp_iocb+144>: mov -0x58(%rdi),%rax
0xffffffffa0683114 <lpfc_sli_validate_fcp_iocb+148>: mov (%rax),%rax
0xffffffffa0683117 <lpfc_sli_validate_fcp_iocb+151>: test %rax,%rax
...
%RDI - 0x58 == 0xFFFF88027D5A1A18
crash> rd 0xFFFF88027D5A1A18 20
ffff88027d5a1a18: 2e362e322f6e6f68 32356266650a0d39 hon/2.6.9..efb52
ffff88027d5a1a28: 2f6563697665642f 2f302f7375623278 /device/x2bus/0/
ffff88027d5a1a38: 00646e656b636162 ffff88027aaa0000 backend....z....
ffff88027d5a1a48: 0000002143ee2000 ffff88027aaa1f40 . [email protected]....
>
> Reviewed-by Laurence Oberman <[email protected]>
--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html