On Thu, 2019-01-17 at 16:05 -0500, Douglas Gilbert wrote:
> On 2019-01-16 8:06 p.m., Bart Van Assche wrote:
> > On Wed, 2019-01-16 at 19:54 -0500, Douglas Gilbert wrote:
> > > On 2019-01-16 6:56 p.m., Bart Van Assche wrote:
> > > > On Wed, 2019-01-16 at 10:57 -0500, Douglas Gilbert wrote:
> > > > > The block layer assumes scsi_request:sense is always a valid
> > > > > pointer. This is set up once in scsi_mq_init_request() and the
> > > > > containing scsi_cmnd object is used often, being re-initialized
> > > > > by scsi_init_command(). That works unless some code re-purposes
> > > > > part of the scsi_cmnd object for something else. And that is
> > > > > what bidi handling does in scsi_mq_prep_fn(). The result is an
> > > > > oops at some later time when the partly overwritten object is
> > > > > re-used. The overwrite is from d285203cf647d but 'git blame'
> > > > > does not show removed code, so that commit may not be the
> > > > > culprit.
> > > > > 
> > > > > Signed-off-by: Douglas Gilbert <dgilb...@interlog.com>
> > > > > ---
> > > > > 
> > > > > This was found while injecting errors (thus generating sense data)
> > > > > into a sequence of bidi commands. At some later time the block
> > > > > layer blew up with a scsi_request::sense NULL dereference in
> > > > > sg_rq_end_io(). Without testing I'm confident the bsg driver,
> > > > > the osd ULD and exofs are exposed to this bug.
> > > > > 
> > > > >    drivers/scsi/scsi_lib.c | 1 +
> > > > >    1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > > > index b13cc9288ba0..71259bd4040a 100644
> > > > > --- a/drivers/scsi/scsi_lib.c
> > > > > +++ b/drivers/scsi/scsi_lib.c
> > > > > @@ -1175,6 +1175,7 @@ void scsi_init_command(struct scsi_device *dev, 
> > > > > struct scsi_cmnd *cmd)
> > > > >    
> > > > >       cmd->device = dev;
> > > > >       cmd->sense_buffer = buf;
> > > > > +     cmd->req.sense = buf;
> > > > >       cmd->prot_sdb = prot;
> > > > >       cmd->flags = flags;
> > > > >       INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
> > > > 
> > > > Hi Doug,
> > > > 
> > > > The description of this patch does not look correct to me. 
> > > > scsi_init_command()
> > > > does not overwrite the sense pointer. From the body of that function:
> > > > 
> > > >         /* zero out the cmd, except for the embedded scsi_request */
> > > >         memset((char *)cmd + sizeof(cmd->req), 0,
> > > >                 sizeof(*cmd) - sizeof(cmd->req) + 
> > > > dev->host->hostt->cmd_size);
> > > > 
> > > > It is not clear to me which code overwrites the sense pointer. I think 
> > > > that
> > > > needs to be figured out before discussion of this patch can continue.
> > > 
> > > Bart,
> > > Please re-read the explanation. The two sentences in the middle should 
> > > tell
> > > you that you are looking at the wrong memset().
> > > 
> > > And I'm waiting for the person who wrote the questionable code to comment.
> > > 
> > > 
> > > If you don't believe me, try sending a device reset to a scsi_debug 
> > > device.
> > > Then use sg_raw *** to send a XDWRITREAD(10) bidi command to the same 
> > > scsi_debug
> > > device, you should get a Unit Attention concerning that device reset. If
> > > you do, keep sending that bidi command. Make sure you have no important
> > > files open because that machine will lock solid. Basically what happens
> > > when a rq_end_io() callback de-references a NULL pointer. It looks like
> > > it has been there since 2014 and took me 2 days to find. Sorry that I 
> > > can't
> > > explain it in one simple sentence.
> > 
> > Hi Doug,
> > 
> > Is this perhaps the memset you are referring to?
> > 
> >             memset(bidi_sdb, 0, sizeof(struct scsi_data_buffer));
> 
> Yes.

Hi Doug,

I think it's worse than what you explained in your patch description: some code
interprets blk_mq_rq_to_pdu(cmd->request->next_rq) as a struct scsi_data_buffer,
e.g. scsi_mq_prep_fn(). Other code in the SCSI core interprets the same data
structure as a struct scsi_request, e.g. scsi_io_completion(). So I don't think
that your patch is sufficient. I have already started working on a patch series
that clears up this confusion.

Bart.

Reply via email to