On Wed, Jul 08, 2015 at 06:00:41PM -0700, Dan Williams wrote:
> On Wed, Jul 8, 2015 at 5:49 PM, Dmitry Torokhov
> <dmitry.torok...@gmail.com> wrote:
> > On Wed, Jul 08, 2015 at 05:36:04PM -0700, Dan Williams wrote:
> >> On Mon, Jul 6, 2015 at 4:40 PM, Dmitry Torokhov
> >> <dmitry.torok...@gmail.com> wrote:
> >> > On Tue, Jul 07, 2015 at 01:23:15AM +0200, Luis R. Rodriguez wrote:
> >> >> On Sat, Jul 04, 2015 at 07:09:19AM -0700, Dan Williams wrote:
> >> >> > On Fri, Jul 3, 2015 at 11:30 AM, Luis R. Rodriguez <mcg...@suse.com> 
> >> >> > wrote:
> >> >> > > On Sat, Jun 27, 2015 at 04:45:25PM -0700, Dan Williams wrote:
> >> >> > >> On Mon, Mar 30, 2015 at 4:20 PM, Dmitry Torokhov
> >> >> > >> <dmitry.torok...@gmail.com> wrote:
> >> >> > >> > Some devices take a long time when initializing, and not all 
> >> >> > >> > drivers are
> >> >> > >> > suited to initialize their devices when they are open. For 
> >> >> > >> > example,
> >> >> > >> > input drivers need to interrogate their devices in order to 
> >> >> > >> > publish
> >> >> > >> > device's capabilities before userspace will open them. When such 
> >> >> > >> > drivers
> >> >> > >> > are compiled into kernel they may stall entire kernel 
> >> >> > >> > initialization.
> >> >> > >> >
> >> >> > >> > This change allows drivers request for their probe functions to 
> >> >> > >> > be
> >> >> > >> > called asynchronously during driver and device registration 
> >> >> > >> > (manual
> >> >> > >> > binding is still synchronous). Because async_schedule is used to 
> >> >> > >> > perform
> >> >> > >> > asynchronous calls module loading will still wait for the 
> >> >> > >> > probing to
> >> >> > >> > complete.
> >> >> > >> >
> >> >> > >> > Note that the end goal is to make the probing asynchronous by 
> >> >> > >> > default,
> >> >> > >> > so annotating drivers with PROBE_PREFER_ASYNCHRONOUS is a 
> >> >> > >> > temporary
> >> >> > >> > measure that allows us to speed up boot process while we 
> >> >> > >> > validating and
> >> >> > >> > fixing the rest of the drivers and preparing userspace.
> >> >> > >> >
> >> >> > >> > This change is based on earlier patch by "Luis R. Rodriguez"
> >> >> > >> > <mcg...@suse.com>
> >> >> > >> >
> >> >> > >> > Signed-off-by: Dmitry Torokhov <dmitry.torok...@gmail.com>
> >> >> > >> > ---
> >> >> > >> >  drivers/base/base.h    |   1 +
> >> >> > >> >  drivers/base/bus.c     |  31 +++++++---
> >> >> > >> >  drivers/base/dd.c      | 149 
> >> >> > >> > ++++++++++++++++++++++++++++++++++++++++++-------
> >> >> > >> >  include/linux/device.h |  28 ++++++++++
> >> >> > >> >  4 files changed, 182 insertions(+), 27 deletions(-)
> >> >> > >>
> >> >> > >> Just noticed this patch.  It caught my eye because I had a hard 
> >> >> > >> time
> >> >> > >> getting an open coded implementation of asynchronous probing to 
> >> >> > >> work
> >> >> > >> in the new libnvdimm subsystem.  Especially the messy races of 
> >> >> > >> tearing
> >> >> > >> things down while probing is still in flight.  I ended up 
> >> >> > >> implementing
> >> >> > >> asynchronous device registration which eliminated a lot of 
> >> >> > >> complexity
> >> >> > >> and of course the bugs.  In general I tend to think that async
> >> >> > >> registration is less risky than async probe since it keeps wider
> >> >> > >> portions of the traditional device model synchronous
> >> >> > >
> >> >> > > but its not see -DEFER_PROBE even before async probe.
> >> >> >
> >> >> > Except in that case you know probe has been seen by the driver at
> >> >> > least once.  So I see that as less of a surprise, but point taken.
> >> >> >
> >> >> > >> and leverages the
> >> >> > >> fact that the device model is already well prepared for 
> >> >> > >> asynchronous
> >> >> > >> arrival of devices due to hotplug.
> >> >> > >
> >> >> > > I think this sounds reasonable, do you have your code upstream or 
> >> >> > > posted?
> >> >> >
> >> >> > Yes, see nd_device_register() in drivers/nvdimm/bus.c
> >> >>
> >> >> It should be I think rather easy for Dmitry to see if he can convert 
> >> >> this input
> >> >> driver (not yet upstream) to this API and see if the same issues are 
> >> >> fixed.
> >> >
> >> > No, I would rather not as it means we lose error handling on device
> >> > registration.
> >> >
> >>
> >> I think this is a red herring as I don't see how async probing is any
> >> better at handling device registration errors.  The error is logged
> >> and "handled" by the fact that a device fails to appear, what other
> >> action would you take?  In fact libnvdimm does detect registration
> >> failures and reports that in a parent device attribute (at least for a
> >> region device and their namespace child devices).
> >
> > What is libnvdimm behavior if you try to unload a module that tries to
> > register a device but it failed? Memory leak or crash, right?
> 
> No, in the case of the "region" driver it is part of the core
> libnvdimm and it is pinned while any region device is active.

No, not quite. Let's take a look for example at nd_btt_probe(). It calls
__nd_btt_create() which in turn calls __nd_device_register() which
returns void and asynchronously schedules device registration. Now
consider the device registration fails. The async code will drop 2
references to the device, effectively freeing it. In the mean time
nd_btt_probe() stores the device pointer which may or may no longer be
valid and goes on it's merry way using it.

The similar thing in nvdimm_create which returns a pointer that may no
longer be valid. I have not traced enough through the code to make sure
if it can blow up, but this kind of situation is not desirable,
especially if the async registration pattern is applied generally
throughout the kernel.

Thanks.

-- 
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to