On Mon, Jul 18, 2016 at 04:28:57PM -0400, Laurence Oberman wrote:
> 
> 
> ----- Original Message -----
> > From: "Johannes Thumshirn" <[email protected]>
> > To: "Martin K . Petersen" <[email protected]>, "James Bottomley" 
> > <[email protected]>, "James Smart"
> > <[email protected]>, "Dick Kennedy" <[email protected]>
> > Cc: "Linux SCSI Mailinglist" <[email protected]>, "Johannes 
> > Thumshirn" <[email protected]>
> > Sent: Monday, July 18, 2016 10:06:03 AM
> > Subject: [PATCH 1/2] lpfc: call lpfc_sli_validate_fcp_iocb() with the 
> > hbalock held
> > 
> > Call lpfc_sli_validate_fcp_iocb() with the hbalock held, as the pointer to
> > iocbq is not guaranteed to still be valid after looking it up.
> > 
> > Signed-off-by: Johannes Thumshirn <[email protected]>
> > ---
> >  drivers/scsi/lpfc/lpfc_sli.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> > index 1a248a2..d58ec4c 100644
> > --- a/drivers/scsi/lpfc/lpfc_sli.c
> > +++ b/drivers/scsi/lpfc/lpfc_sli.c
> > @@ -9978,6 +9978,7 @@ lpfc_sli_sum_iocb(struct lpfc_vport *vport, uint16_t
> > tgt_id, uint64_t lun_id,
> >     struct lpfc_iocbq *iocbq;
> >     int sum, i;
> >  
> > +   spin_lock_irq(&phba->hbalock);
> >     for (i = 1, sum = 0; i <= phba->sli.last_iotag; i++) {
> >             iocbq = phba->sli.iocbq_lookup[i];
> >  
> > @@ -9985,6 +9986,7 @@ lpfc_sli_sum_iocb(struct lpfc_vport *vport, uint16_t
> > tgt_id, uint64_t lun_id,
> >                                             ctx_cmd) == 0)
> >                     sum++;
> >     }
> > +   spin_unlock_irq(&phba->hbalock);
> >  
> >     return sum;
> >  }
> > --
> > 1.8.5.6
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to [email protected]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> Hi Johannes
> 
> The code looks correct, but I would get a review from James Smart or Dick 
> Kennedy at Emulex (now Broadcom :))

Sure, that's why they're on Cc ;-).

> Have you experienced an issue already due to this, can you share the stack 
> trace of the issue if you have it.

We have had an issue at a customer's side. It's actually pretty hard to
trigger (it involves two separated SANs and several FC switches in between and
link bounces) but the core dumps show that in lpfc_sli_validate_fcp_iocb() the
iocbq address in invalid. It actually _is_ a valid kernel pointer and _was_ a
valid 'struct lpfc_iocbq' but isn't anymore. So we suggest a use-after-free.
This patch is running OK withing one of our partner's QA sites for 3 weeks now
(we've been chasing it since October).

c.f this part of our partner's analysis:
crash> bt
PID: 62174  TASK: ffff880285fe40c0  CPU: 0   COMMAND: "kworker/0:1"
 #0 [ffff88027dfb7bc0] crash_kexec at ffffffff8008e4ca
 #1 [ffff88027dfb7c90] oops_end at ffffffff804129a8
 #2 [ffff88027dfb7cb0] general_protection at ffffffff80411d28
    [exception RIP: lpfc_sli_validate_fcp_iocb+148]
    RIP: ffffffffa0683114  RSP: ffff88027dfb7d60  RFLAGS: 00010246
    RAX: 2e362e322f6e6f68  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8803c2417678  RDI: ffff88027d5a1a70
    RBP: ffff8803d8c28000   R8: 0000000000000001   R9: 0000000000000000
    R10: ffffffffa06f7898  R11: 0000000000000000  R12: 0000000000001340
    R13: ffff8803c2ac2000  R14: ffff8803c2417678  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: e030  SS: e02b
 #3 [ffff88027dfb7d68] lpfc_sli_abort_iocb at ffffffffa06831bb [lpfc]
 #4 [ffff88027dfb7dc8] fc_terminate_rport_io at ffffffffa0502aea 
[scsi_transport_fc]
 #5 [ffff88027dfb7de8] fc_timeout_deleted_rport at ffffffffa050338c 
[scsi_transport_fc]
 #6 [ffff88027dfb7e28] process_one_work at ffffffff80062ea8
 #7 [ffff88027dfb7e78] worker_thread at ffffffff800668aa
 #8 [ffff88027dfb7ee8] kthread at ffffffff8006a506
 #9 [ffff88027dfb7f48] kernel_thread_helper at ffffffff8041a244


crash> dis lpfc_sli_validate_fcp_iocb
...
0xffffffffa068310f <lpfc_sli_validate_fcp_iocb+143>:    nop
0xffffffffa0683110 <lpfc_sli_validate_fcp_iocb+144>:    mov    -0x58(%rdi),%rax
0xffffffffa0683114 <lpfc_sli_validate_fcp_iocb+148>:    mov    (%rax),%rax
0xffffffffa0683117 <lpfc_sli_validate_fcp_iocb+151>:    test   %rax,%rax
...

%RDI - 0x58  == 0xFFFF88027D5A1A18

crash>  rd 0xFFFF88027D5A1A18 20
ffff88027d5a1a18:  2e362e322f6e6f68 32356266650a0d39   hon/2.6.9..efb52
ffff88027d5a1a28:  2f6563697665642f 2f302f7375623278   /device/x2bus/0/
ffff88027d5a1a38:  00646e656b636162 ffff88027aaa0000   backend....z....
ffff88027d5a1a48:  0000002143ee2000 ffff88027aaa1f40   . [email protected]....

> 
> Reviewed-by Laurence Oberman <[email protected]>

-- 
Johannes Thumshirn                                          Storage
[email protected]                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to