On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
<james.bottom...@hansenpartnership.com> wrote:
> On Fri, 2016-08-12 at 21:57 -0700, Dan Williams wrote:
>> On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams <
>> dan.j.willi...@intel.com> wrote:
>> > On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley
>> > <james.bottom...@hansenpartnership.com> wrote:
>> > > On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:
>> > > > Before spending effort trying to flush the destruction of old
>> > > > bdi
>> > > > instances before new ones are registered, is it rather time to
>> > > > complete the conversion of sd to only use dynamically allocated
>> > > > devt?
>> > >
>> > > Do we have to go that far?  Surely your fix is extensible: the
>> > > only
>> > > reason it doesn't work for us is that the gendisk holds the
>> > > parent
>> > > without a reference, so we can free the SCSI device before its
>> > > child
>> > > gendisk (good job no-one actually uses gendisk->parent after
>> > > we've
>> > > released it ...).  If we fix that it would mean SCSI can't
>> > > release the
>> > > sdev until after the queue is dead and the bdi namespace
>> > > released, so
>> > > isn't something like this the easy fix?
>> > >
>> > > James
>> > >
>> > > ---
>> > >
>> > > diff --git a/block/genhd.c b/block/genhd.c
>> > > index fcd6d4f..54ae4ae 100644
>> > > --- a/block/genhd.c
>> > > +++ b/block/genhd.c
>> > > @@ -514,7 +514,7 @@ static void register_disk(struct device
>> > > *parent, struct gendisk *disk)
>> > >         struct hd_struct *part;
>> > >         int err;
>> > >
>> > > -       ddev->parent = parent;
>> > > +       ddev->parent = get_device(parent);
>> > >
>> > >         dev_set_name(ddev, "%s", disk->disk_name);
>> > >
>> > > @@ -1144,6 +1144,7 @@ static void disk_release(struct device
>> > > *dev)
>> > >         hd_free_part(&disk->part0);
>> > >         if (disk->queue)
>> > >                 blk_put_queue(disk->queue);
>> > > +       put_device(dev->parent);
>> > >         kfree(disk);
>> > >  }
>> > >  struct class block_class = {
>> >
>> > Looks ok at first glance to me.
>> >
>> > We do hold a reference on the parent device, but it gets dropped at
>> > device_unregister() time and this moves it out to the final put.
>
> We do?  Where?

Yes, register_disk() does "ddev->parent = parent" and then
"device_add(ddev)".  device_add() takes the parent reference.

>
>> > However, this does leave static devt block-device-drivers that
>> > register a disk without a parent device susceptible to the race...
>> > I think those exist given all the drivers still using add_disk()
>> > after commit 52c44d93c26f "block: remove ->driverfs_dev".
>
> It does?  The race is the fact that the parent can be removed before
> the child meaning if the parent name is re-registered before the child
> dies we get a duplicate name in bdi space.

No, the race is that the *name* of the parent isn't released until the
child is both unregistered and put.  The device core is already
ensuring that the parent is not released until all descendants have
been removed.

>
>> So I tried the attached and it makes the libnvdimm unit tests start
>> crashing.
>
> Well, the attached is clearly buggy, isn't it?  You're trying to do a
> get on the parent before the parent is actually set.

Ah, yes, thank you.  Fixed up v2 attached that passes my tests.

> Why don't you
> just try the incremental patch I sent instead of trying to rework it?

I reworked it because it is the bdi that holds this extra dependency
on the disk's parent, not the disk itself.

>
>>   A couple crash logs attached.  Not yet sure what assumption
>> is getting violated, but how about that conversion of scsi to use
>> dynamic devt? ;-)
>
> It's completely orthogonal.  The problem is in hierarchy lifetimes:
> switching from static to dynamic allocation won't change that at all.
>  You don't see this problem in nvme because the parent control device's
> lifetime belongs to the controller not the disk.  In SCSI the parent is
> our representation of the SCSI device whose lifetime is governed at the
> SCSI level and effectively represents the disk.
>

No, it's only the name.  We could achieve the same by teaching the
block core to manage the "sd_index_ida" instead of the sd driver
itself, but the v2-patch attached works and does not introduce that
layering violation.

Attachment: patch-v2
Description: Binary data

Reply via email to