On Wed, Nov 20 2013, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 07:02:30PM -0700, Jens Axboe wrote:
> > On Tue, Nov 19 2013, Jens Axboe wrote:
> > > > Looks like a race condition, below works for me, please try.
> > > > 
> > > > 
> > > > Subject: virtio_blk: fix race condition
> > > > 
> > > > virtqueue_kick() isn't multi-thread safe.
> > > > 
> > > > Signed-off-by: Shaohua Li <s...@fusionio.com>
> > > > 
> > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > index 588479d..f353959 100644
> > > > --- a/drivers/block/virtio_blk.c
> > > > +++ b/drivers/block/virtio_blk.c
> > > > @@ -204,10 +204,11 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
> > > >                 virtqueue_kick(vblk->vq);
> > > >                 return BLK_MQ_RQ_QUEUE_BUSY;
> > > >         }
> > > > -       spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > > >  
> > > >         if (last)
> > > >                 virtqueue_kick(vblk->vq);
> > > > +       spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > > > +
> > > >         return BLK_MQ_RQ_QUEUE_OK;
> > > >  }
> > > 
> > > Just stumbled on that too. You need one more, btw, for the sg failure
> > > case:
> > > 
> > > 
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 588479d58f52..6a680d4de7f1 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -199,15 +199,16 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
> > >  
> > >   spin_lock_irqsave(&vblk->vq_lock, flags);
> > >   if (__virtblk_add_req(vblk->vq, vbr, vbr->sg, num) < 0) {
> > > +         virtqueue_kick(vblk->vq);
> > >           spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > >           blk_mq_stop_hw_queue(hctx);
> > > -         virtqueue_kick(vblk->vq);
> > >           return BLK_MQ_RQ_QUEUE_BUSY;
> > >   }
> > > - spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > >  
> > >   if (last)
> > >           virtqueue_kick(vblk->vq);
> > > +
> > > + spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > >   return BLK_MQ_RQ_QUEUE_OK;
> > >  }
> > 
> > Tested successfully here too.
> 
> Ah, so it is exactly the problem I suggested it might be. ;)

It isn't, actually. It's not a race between the queue conditions, the
stopping/starting, or state being checked inside vs. outside the lock.
It's a "simple" race between the virtqueue operations: virtqueue_kick()
was being called without vq_lock held. It is a race, though, but I think
that much was a given :-)
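
For anyone following along, the locking rule can be shown in isolation
with a small userspace sketch. This is only a toy model under stated
assumptions, not the driver code: toy_vq, toy_add, toy_kick and run_demo
are hypothetical names, and a pthread mutex stands in for vq_lock. The
point is just that the add and the kick both touch shared ring state, so
the kick has to happen before the lock is dropped.

```c
/* Userspace toy model, NOT kernel code. All names here (toy_vq, toy_add,
 * toy_kick, run_demo) are hypothetical and exist only for illustration. */
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

struct toy_vq {
	pthread_mutex_t lock;	/* stands in for vblk->vq_lock */
	unsigned long added;	/* requests placed on the ring */
	unsigned long kicked;	/* requests the "host" has been told about */
};

/* Both helpers modify shared ring state; the caller must hold vq->lock. */
static void toy_add(struct toy_vq *vq)
{
	vq->added++;
}

static void toy_kick(struct toy_vq *vq)
{
	vq->kicked = vq->added;
}

struct worker_arg {
	struct toy_vq *vq;
	unsigned long nreq;
};

static void *submitter(void *p)
{
	struct worker_arg *arg = p;
	unsigned long i;

	for (i = 0; i < arg->nreq; i++) {
		pthread_mutex_lock(&arg->vq->lock);
		toy_add(arg->vq);
		toy_kick(arg->vq);	/* kick before dropping the lock */
		pthread_mutex_unlock(&arg->vq->lock);
	}
	return NULL;
}

/* Returns the number of requests added but never kicked (0 when every
 * request was kicked), or -1 if the arguments are out of range. */
long run_demo(int nthreads, unsigned long nreq)
{
	struct toy_vq vq = { .added = 0, .kicked = 0 };
	struct worker_arg arg = { &vq, nreq };
	pthread_t tid[MAX_THREADS];
	int i, rc;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return -1;

	pthread_mutex_init(&vq.lock, NULL);
	for (i = 0; i < nthreads; i++) {
		rc = pthread_create(&tid[i], NULL, submitter, &arg);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	pthread_mutex_destroy(&vq.lock);

	/* Every submitter added exactly nreq requests under the lock. */
	assert(vq.added == (unsigned long)nthreads * nreq);
	return (long)(vq.added - vq.kicked);
}
```

Build with -pthread and call run_demo(4, 100000) from a main() of your
own. Moving toy_kick() below the unlock makes it a data race on the
shared counters, which is the same shape of bug as the one fixed above.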

> > Dave, please give it a go, looks like this
> > should fix it up for you. Committed here:
> > 
> > http://git.kernel.dk/?p=linux-block.git;a=commit;h=f02b9ac35a47dff745c7637fbc095f01cc03646e
> 
> Testing it now. Might take a little while to confirm, given it had
> taken a few iterations of xfstests before I tripped over it...

I feel pretty confident in it, fwiw. My test case was boiled down to
trigger it in seconds, and it survived a lengthy run afterwards.

-- 
Jens Axboe
