On Fri, Sep 11, 2020 at 11:20:17PM +0800, Hillf Danton wrote:
> 
> On Fri, 11 Sep 2020 08:57:50 -0300 Jason Gunthorpe wrote:
> > On Fri, Sep 11, 2020 at 12:16:40PM +0800, Hillf Danton wrote:
> > > Detect race destroying ctx in order to avoid UAF.
> > > 
> > > +++ b/drivers/infiniband/core/ucma.c
> > > @@ -625,6 +625,10 @@ static ssize_t ucma_destroy_id(struct uc
> > >           return PTR_ERR(ctx);
> > >  
> > >   mutex_lock(&ctx->file->mut);
> > > + if (ctx->destroying == 1) {
> > > +         mutex_unlock(&ctx->file->mut);
> > > +         return -ENXIO;
> > > + }
> > >   ctx->destroying = 1;
> > >   mutex_unlock(&ctx->file->mut);
> > >  
> > > @@ -1826,6 +1830,8 @@ static int ucma_close(struct inode *inod
> > >  
> > >   mutex_lock(&file->mut);
> > >   list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
> > > +         if (ctx->destroying == 1)
> > > +                 continue;
> > >           ctx->destroying = 1;
> > >           mutex_unlock(&file->mut);
> > >  
> > 
> > ucma_destroy_id() is called from write() and ucma_close is release(),
> > so there is no way these can race?
> 
> Sound good but what's reported is uaf in the close path, which is
> impossible without another thread releasing the ctx a step ahead
> the closer.
> Can we call it a race if that's true?

Migrate is the cause, very tricky:

                CPU0                      CPU1
        ucma_destroy_id()
                                  ucma_migrate_id()
                                       ucma_get_ctx()
        xa_lock()
         _ucma_find_context()
         xa_erase()
                                       xa_lock()
                                        ctx->file = new_file
                                        list_move()
                                       xa_unlock()
                                      ucma_put_ctx
                                   ucma_close()
                                      _destroy_id()

        _destroy_id()
          wait_for_completion()
          // boom


ie the destrory_id() on the initial FD captures the ctx right before
migrate moves it, then the new FD closes calling destroy while the
other destroy is still running.

Sigh, I will rewrite migrate too..

Jason

Reply via email to