Hello Sergey,

On Wed, Apr 29, 2015 at 09:16:24AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> Minchan, a quick question. just to avoid resends and to make sure that I'm
> not missing any better solution.
> 
> 
> lockdep is unhappy here:
> 
> > -static void zram_remove(struct zram *zram)
> > +static int zram_remove(struct zram *zram)
> >  {
> > -   pr_info("Removed device: %s\n", zram->disk->disk_name);
> > +   struct block_device *bdev;
> > +
> > +   bdev = bdget_disk(zram->disk, 0);
> > +   if (!bdev)
> > +           return -ENOMEM;
> > +
> > +   mutex_lock(&bdev->bd_mutex);
> > +   if (bdev->bd_openers) {
> > +           mutex_unlock(&bdev->bd_mutex);
> > +           return -EBUSY;
> > +   }
> > +
> >     /*
> >      * Remove sysfs first, so no one will perform a disksize
> > -    * store while we destroy the devices
> > +    * store while we destroy the devices. This also helps during
> > +    * zram_remove() -- device_reset() is the last holder of
> > +    * ->init_lock.
> >      */
> >     sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
> >                     &zram_disk_attr_group);
> >  
> > +   /* Make sure all pending I/O is finished */
> > +   fsync_bdev(bdev);
> >     zram_reset_device(zram);
> > +   mutex_unlock(&bdev->bd_mutex);
> > +
> > +   pr_info("Removed device: %s\n", zram->disk->disk_name);
> > +
> >     idr_remove(&zram_index_idr, zram->disk->first_minor);
> >     blk_cleanup_queue(zram->disk->queue);
> >     del_gendisk(zram->disk);
> >     put_disk(zram->disk);
> >     kfree(zram);
> > +
> > +   return 0;
> >  }
> 
> 
> lock ->bd_mutex
>       if ->bd_openers {
>               unlock ->bd_mutex
>               return -EBUSY;
>       }
> 
>       sysfs_remove_group()
>               ^^^^^^^^^ lockdep splat
>       zram_reset_device()
>               lock ->init_lock
>               reset device
>               unlock ->init_lock
> unlock ->bd_mutex
> ...
> kfree zram
> 
> 
> why did I do this:
> 
> sysfs_remove_group() turns zram_reset_device() into the last possible 
> ->init_lock
> owner: there are no sysfs nodes before final zram_reset_device(), so no
> concurrent nor later store()/show() sysfs handler can be called. it closes a 
> number
> of race conditions, like:
> 
>       CPU0                    CPU1
>       umount
>       zram_remove()
>       zram_reset_device()     disksize_store()
>                               mount
>       kfree zram
> 
> or
> 
>       CPU0                            CPU1
>       umount
>       zram_remove()
>       zram_reset_device()
>                                       cat /sys/block/zram0/_any_sysfs_node_
>       sysfs_remove_group()
>       kfree zram                      _any_sysfs_node_read()
> 
> 
> and so on. so removing sysfs group before zram_reset_device() makes sense.
> 
> at the same time we need to prevent `umount-zram_remove vs. mount' race and 
> forbid
> zram_remove() on active device. so we check ->bd_openers and perform device 
> reset
> under ->bd_mutex.
> 

Could you explain in detail about unmount-zram_remove vs. mount race?
I guess it should be done in upper layer(e,g. VFS). Anyway, I want to be
more clear about that.

> 
> a quick solution I can think of is to do something like this:
> 
> sysfs_remove_group();
> lock ->bd_mutex
>       if ->bd_openers {
>               unlock ->bd_mutex
>               create_sysfs_group()
>                       ^^^^^^^^   return attrs back
>               return -EBUSY
>       }
> 
>       zram_reset_device()
>               lock ->init_lock
>               reset device
>               unlock ->init_lock
> unlock ->bd_mutex
> ...
> kfree zram
> 
> 
> iow, move sysfs_remove_group() out of ->bd_mutex lock, but keep it
> before ->bd_openers check & zram_reset_device().
> 
> I don't think that we can handle it with only ->init_lock. we need to unlock
> it at some point and kfree zram that contains that lock. and we have no idea
> are there any lock owners or waiters when we kfree zram. removing sysfs group
> and acquiring ->init_lock for write in zram_reset_device() guarantee that
> there will no further ->init_lock owners and we can safely kfree after
> `unlock ->init_lock`. hence, sysfs_remove_group() must happen before
> zram_reset_device().
> 
> 
> create_sysfs_group() potentially can fail. which is a bit ugly. but user
> will be able to umount device and remove it anyway.
> 
> 
> what do you think?
> 
>       -ss

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to