Hello,

Minchan, a quick question, just to avoid resends and to make sure that I'm
not missing a better solution.


lockdep is unhappy here:

> -static void zram_remove(struct zram *zram)
> +static int zram_remove(struct zram *zram)
>  {
> -     pr_info("Removed device: %s\n", zram->disk->disk_name);
> +     struct block_device *bdev;
> +
> +     bdev = bdget_disk(zram->disk, 0);
> +     if (!bdev)
> +             return -ENOMEM;
> +
> +     mutex_lock(&bdev->bd_mutex);
> +     if (bdev->bd_openers) {
> +             mutex_unlock(&bdev->bd_mutex);
> +             return -EBUSY;
> +     }
> +
>       /*
>        * Remove sysfs first, so no one will perform a disksize
> -      * store while we destroy the devices
> +      * store while we destroy the devices. This also helps during
> +      * zram_remove() -- device_reset() is the last holder of
> +      * ->init_lock.
>        */
>       sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
>                       &zram_disk_attr_group);
>  
> +     /* Make sure all pending I/O is finished */
> +     fsync_bdev(bdev);
>       zram_reset_device(zram);
> +     mutex_unlock(&bdev->bd_mutex);
> +
> +     pr_info("Removed device: %s\n", zram->disk->disk_name);
> +
>       idr_remove(&zram_index_idr, zram->disk->first_minor);
>       blk_cleanup_queue(zram->disk->queue);
>       del_gendisk(zram->disk);
>       put_disk(zram->disk);
>       kfree(zram);
> +
> +     return 0;
>  }


lock ->bd_mutex
        if ->bd_openers {
                unlock ->bd_mutex
                return -EBUSY;
        }

        sysfs_remove_group()
                ^^^^^^^^^ lockdep splat
        zram_reset_device()
                lock ->init_lock
                reset device
                unlock ->init_lock
unlock ->bd_mutex
...
kfree zram
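
(a side note on the splat itself: if I read it correctly, the other side of
the inversion is a sysfs handler that takes ->bd_mutex while its kernfs node
is held active, and reset_store() does exactly that. very much simplified,
from memory:

	static ssize_t reset_store(struct device *dev,
			struct device_attribute *attr, const char *buf, size_t len)
	{
		struct zram *zram = dev_to_zram(dev);
		struct block_device *bdev;

		bdev = bdget_disk(zram->disk, 0);
		if (!bdev)
			return -ENOMEM;

		/* the sysfs node stays active for the duration of this handler */
		mutex_lock(&bdev->bd_mutex);
		if (bdev->bd_openers) {
			mutex_unlock(&bdev->bd_mutex);
			bdput(bdev);
			return -EBUSY;
		}

		/* Make sure all pending I/O is finished */
		fsync_bdev(bdev);
		zram_reset_device(zram);
		mutex_unlock(&bdev->bd_mutex);
		bdput(bdev);

		return len;
	}

so zram_remove() takes ->bd_mutex and then waits in sysfs_remove_group() for
active nodes to drain, while reset_store() holds its node active and waits
for ->bd_mutex.)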


why did I do this:

sysfs_remove_group() turns zram_reset_device() into the last possible
->init_lock owner: there are no sysfs nodes left before the final
zram_reset_device(), so no concurrent or later store()/show() handler can
be called. it closes a number of race conditions, like:

        CPU0                    CPU1
        umount
        zram_remove()
        zram_reset_device()     disksize_store()
                                mount
        kfree zram

or

        CPU0                            CPU1
        umount
        zram_remove()
        zram_reset_device()
                                        cat /sys/block/zram0/_any_sysfs_node_
        sysfs_remove_group()
        kfree zram                      _any_sysfs_node_read()


and so on. so removing the sysfs group before zram_reset_device() makes sense.
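
to make the second race concrete: pretty much every attr handler dereferences
zram and most of them take ->init_lock, so a late call is a use-after-free.
something along these lines (illustrative handler, not a particular one from
the driver):

	static ssize_t some_attr_show(struct device *dev,
			struct device_attribute *attr, char *buf)
	{
		struct zram *zram = dev_to_zram(dev);
		u64 val;

		/* use-after-free if zram_remove() has already freed zram */
		down_read(&zram->init_lock);
		val = zram->disksize;
		up_read(&zram->init_lock);

		return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
	}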

at the same time we need to prevent the `umount -> zram_remove() vs. mount'
race and forbid zram_remove() on an active device. so we check ->bd_openers
and perform the device reset under ->bd_mutex.


a quick solution I can think of is to do something like this:

sysfs_remove_group();
lock ->bd_mutex
        if ->bd_openers {
                unlock ->bd_mutex
                sysfs_create_group()
                        ^^^^^^^^   return attrs back
                return -EBUSY
        }

        zram_reset_device()
                lock ->init_lock
                reset device
                unlock ->init_lock
unlock ->bd_mutex
...
kfree zram


iow, move sysfs_remove_group() out from under ->bd_mutex, but keep it
before the ->bd_openers check & zram_reset_device().
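
in zram_remove() that would look roughly like this (just a sketch, assuming
sysfs_create_group() is enough to put the attrs back):

	static int zram_remove(struct zram *zram)
	{
		struct block_device *bdev;

		bdev = bdget_disk(zram->disk, 0);
		if (!bdev)
			return -ENOMEM;

		/*
		 * no new store()/show() calls and no new ->init_lock
		 * owners after this point
		 */
		sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
				&zram_disk_attr_group);

		mutex_lock(&bdev->bd_mutex);
		if (bdev->bd_openers) {
			mutex_unlock(&bdev->bd_mutex);
			bdput(bdev);
			/* device is busy, return the attrs back */
			if (sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
						&zram_disk_attr_group))
				pr_warn("Error restoring sysfs group for %s\n",
						zram->disk->disk_name);
			return -EBUSY;
		}

		/* Make sure all pending I/O is finished */
		fsync_bdev(bdev);
		zram_reset_device(zram);
		mutex_unlock(&bdev->bd_mutex);
		bdput(bdev);

		pr_info("Removed device: %s\n", zram->disk->disk_name);

		idr_remove(&zram_index_idr, zram->disk->first_minor);
		blk_cleanup_queue(zram->disk->queue);
		del_gendisk(zram->disk);
		put_disk(zram->disk);
		kfree(zram);

		return 0;
	}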

I don't think that we can handle it with only ->init_lock. we need to unlock
it at some point and then kfree zram, which contains that lock, and we have
no idea whether there are any lock owners or waiters left at that point.
removing the sysfs group and then acquiring ->init_lock for write in
zram_reset_device() guarantees that there will be no further ->init_lock
owners, so we can safely kfree after `unlock ->init_lock'. hence,
sysfs_remove_group() must happen before zram_reset_device().
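
to illustrate the pattern we have to avoid (hypothetical snippet, not
actual code):

	/* in zram_remove(), with the sysfs group still present */
	down_write(&zram->init_lock);
	/* reset the device */
	up_write(&zram->init_lock);
	/*
	 * a store()/show() handler can still be sleeping on ->init_lock
	 * at this point; it will be woken up only to operate on memory
	 * we are about to free
	 */
	kfree(zram);

removing the sysfs group first rules this out.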


sysfs_create_group() can potentially fail, which is a bit ugly, but the user
will still be able to umount the device and remove it anyway.


what do you think?

        -ss