Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread Alan Stern
On Wed, 27 Feb 2008, David Newall wrote:

> David Brownell wrote:
> > On Tuesday 26 February 2008, David Newall wrote:
> >   
> >> Hardware can be inserted and removed while we're in a suspend state; and
> >> there's nothing that we can do about it until we resume.  Is it fair to
> >> say, then, that having started suspend, we could reasonably ignore any
> >> device insertion and removal, and handle it on resume?
> >> 
> >
> > "Ignore" seems a bit strong; those events may be wakeup triggers,
> > which would cause the hardware to make it a very short suspend state.
> >
> > "Defer handling" is more to the point, be it by hardware or software.
> >
> >   
> 
> Of course, "defer".  The insertion has to be handled eventually.  What
> I'm wondering is if we can ignore it, and catch it on the resume.

Certainly.  If hardware-change events can get lost because of the
system sleep, the resume method should make every effort to verify that 
what it remembers of the hardware state matches the current reality.

> >> Presumably we need to scan for hardware changes on resume.
> >> 
> >
> > Not on most busses I work with; the hardware issues notifications
> > whenever the devices are removable.
> >   
> 
> There's no notification while we're suspended.  Isn't it necessary to
> scan all busses on resume, just to know what's on them?

It depends on the bus.  If the bus doesn't support hotplugging then 
scanning isn't necessary.  If the bus does support hotplugging then 
scanning after suspend may or may not be necessary, depending on 
whether or not the bus controller remained powered during the suspend.  
For hotpluggable buses, scanning after hibernation is always necessary.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread Alan Stern
On Tue, 26 Feb 2008, Rafael J. Wysocki wrote:

> > > IMO the device driver should assure that no new children will be 
> > > registered
> > > concurrently with the ->suspend() method (IOW, ->suspend() should wait for
> > > all such registrations to complete and should prevent any new ones from
> > > being started) and it should make it impossible to register any new 
> > > children
> > > after ->suspend() has run.  It's the driver's problem how to achieve that.
> > 
> > Exactly; this has to be added to the PM documentation.
> 
> Into Documentation/power/devices.txt, I gather?

Yes.

> > > > The PM core could help detect errors here.  If it tries to suspend a 
> > > > device and sees that the device's parent is already suspended, then the 
> > > > parent's driver has a bug.
> > > 
> > > Yes, I think we ought to fail the suspend in such cases.  Still, that's 
> > > not
> > > sufficient to prevent a child from being registered after we've run
> > > dpm_suspend().  For this reason, we could also leave dpm_suspend() with
> > > dpm_list_mtx held and not release it until the next dpm_resume() is run.
> > 
> > The pm_sleep_rwsem will do a better job of catching such errors.
> 
> But we should not leave a window between releasing dpm_list_mtx and taking
> pm_sleep_rwsem.  Either that, or we should make sure that dpm_active is
> empty after acquiring pm_sleep_rwsem.

I've got some ideas on how to implement this.

We can add a new field "suspend_called" to dev->power.  It would be
owned by the PM core (protect by dpm_list_mtx) and read-only to
drivers.  Normally it will contain 0, but when the suspend method is
running we set it to SUSPEND_RUNNING and when the method returns
successfully we set it to SUSPEND_DONE.  Before calling the resume
method we set it back to 0.  Drivers can use this field as an easy way
of checking that all the child devices have been suspended.

When a new device is registered we check its parent's suspend_called
value.  If it is SUSPEND_DONE then the caller has a bug and we have to
fail the registration.  If it is SUSPEND_RUNNING then the registration
is legal, but we remember what happened.  Then when the
currently-running suspend method returns and we reacquire the
dpm_list_mtx, we will realize that a race was lost.  If the method
completed successfully (which it shouldn't) we can resume that device
immediately without ever taking it off the dpm_active list; but either
way we should continue the suspend loop.  Now the new child will be at
the end of the dpm_active_list, so it will be suspended before the
parent is reached again.

This way we can recover from drivers that are willing to suspend their 
device even though there are unsuspended children.  The only drawback 
will be that for a short time the child will be active while its parent 
is suspended.

We should not abort the entire sleep transition simply because we lost 
a race.  With this scheme we won't even need the pm_sleep_rwsem; the 
dpm_list_mtx will provide all the necessary protection.

This is more intricate than it should be.  It would have been better to
have had "disable_new_children" and "enable_new_children" methods from
the beginning; then there wouldn't be any races at all.  That's life...

The one tricky thing to watch out for is when a suspend or resume 
method wants to unregister the device being suspended or resumed.  Even 
that should be doable (set suspend_called to UNREGISTERED or something 
like that).

> > > That will potentially cause some trouble to CPU hotplug cotifiers, but we 
> > > can
> > > handle that, for example, by using the in_suspend_context() test.
> > 
> > Do they need to register new CPUs at some point?  There ought to be a 
> > way to handle that.
> 
> No, they don't, but there are some CPU-related device objects that get
> uregistered/registered.  Still, all of this work is really redundant if the 
> CPU
> in question comes back up during the resume, so it should be avoided in
> general.  The CPU hotplug notifiers should only unregister those objects if
> the CPU hasn't gone on line during the resume and they have all information
> necessary for discovering that.

Unregistration should always be allowed, and registration should be 
allowed whenever the parent isn't suspended.  For devices with no 
parent, we can imagine there is a fictitious parent at the root of the 
device tree.  Conceptually it gets suspended after every real device 
and resumed before.  Maybe even before dpm_power_up(), meaning that 
devices with no parent could be registered by a resume_early method.

When your lock-removal stuff gets into Greg's tree, I'll write all 
this.  Sound good?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread David Newall
David Brownell wrote:
> On Tuesday 26 February 2008, David Newall wrote:
>   
>> Hardware can be inserted and removed while we're in a suspend state; and
>> there's nothing that we can do about it until we resume.  Is it fair to
>> say, then, that having started suspend, we could reasonably ignore any
>> device insertion and removal, and handle it on resume?
>> 
>
> "Ignore" seems a bit strong; those events may be wakeup triggers,
> which would cause the hardware to make it a very short suspend state.
>
> "Defer handling" is more to the point, be it by hardware or software.
>
>   

Of course, "defer".  The insertion has to be handled eventually.  What
I'm wondering is if we can ignore it, and catch it on the resume.


>> Presumably we need to scan for hardware changes on resume.
>> 
>
> Not on most busses I work with; the hardware issues notifications
> whenever the devices are removable.
>   

There's no notification while we're suspended.  Isn't it necessary to
scan all busses on resume, just to know what's on them?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread David Brownell
On Tuesday 26 February 2008, David Newall wrote:
> 
> Hardware can be inserted and removed while we're in a suspend state; and
> there's nothing that we can do about it until we resume.  Is it fair to
> say, then, that having started suspend, we could reasonably ignore any
> device insertion and removal, and handle it on resume?

"Ignore" seems a bit strong; those events may be wakeup triggers,
which would cause the hardware to make it a very short suspend state.

"Defer handling" is more to the point, be it by hardware or software.


> Presumably we need to scan for hardware changes on resume.

Not on most busses I work with; the hardware issues notifications
whenever the devices are removable.

Which is a Good Thing ... scanning, as you suggest, is inherently
not reliable.  If a mouse is swapped, it likely doesn't matter a
lot whether it's the "same" or not.

But ... how about if some removable storage media was taken out,
updated on a different system, then restored before the resume?
Not many filesystems handle that sanely.  You *really* want to
have hardware which reports disconnect/reconnect events, or which
physically prevents them (locked case etc).

- Dave



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread David Newall
David Brownell wrote:
> This "flaw" isn't a new thing, of course.  I remember pointing out the rather
> annoying proclivity of the PM framework to deadlock when suspend() tried to
> remove USB devices ... back around 2.6.10 or so.  Things have shuffled around
> a bit, and gotten better in some cases, but not fundamentally changed.

Hardware can be inserted and removed while we're in a suspend state; and
there's nothing that we can do about it until we resume.  Is it fair to
say, then, that having started suspend, we could reasonably ignore any
device insertion and removal, and handle it on resume?  Presumably we
need to scan for hardware changes on resume.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread David Newall
David Brownell wrote:
 This flaw isn't a new thing, of course.  I remember pointing out the rather
 annoying proclivity of the PM framework to deadlock when suspend() tried to
 remove USB devices ... back around 2.6.10 or so.  Things have shuffled around
 a bit, and gotten better in some cases, but not fundamentally changed.

Hardware can be inserted and removed while we're in a suspend state; and
there's nothing that we can do about it until we resume.  Is it fair to
say, then, that having started suspend, we could reasonably ignore any
device insertion and removal, and handle it on resume?  Presumably we
need to scan for hardware changes on resume.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread David Brownell
On Tuesday 26 February 2008, David Newall wrote:
 
 Hardware can be inserted and removed while we're in a suspend state; and
 there's nothing that we can do about it until we resume.  Is it fair to
 say, then, that having started suspend, we could reasonably ignore any
 device insertion and removal, and handle it on resume?

Ignore seems a bit strong; those events may be wakeup triggers,
which would cause the hardware to make it a very short suspend state.

Defer handling is more to the point, be it by hardware or software.


 Presumably we need to scan for hardware changes on resume.

Not on most busses I work with; the hardware issues notifications
whenever the devices are removable.

Which is a Good Thing ... scanning, as you suggest, is inherently
not reliable.  If a mouse is swapped, it likely doesn't matter a
lot whether it's the same or not.

But ... how about if some removable storage media was taken out,
updated on a different system, then restored before the resume?
Not many filesystems handle that sanely.  You *really* want to
have hardware which reports disconnect/reconnect events, or which
physically prevents them (locked case etc).

- Dave



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread David Newall
David Brownell wrote:
 On Tuesday 26 February 2008, David Newall wrote:
   
 Hardware can be inserted and removed while we're in a suspend state; and
 there's nothing that we can do about it until we resume.  Is it fair to
 say, then, that having started suspend, we could reasonably ignore any
 device insertion and removal, and handle it on resume?
 

 Ignore seems a bit strong; those events may be wakeup triggers,
 which would cause the hardware to make it a very short suspend state.

 Defer handling is more to the point, be it by hardware or software.

   

Of course, defer.  The insertion has to be handled eventually.  What
I'm wondering is if we can ignore it, and catch it on the resume.


 Presumably we need to scan for hardware changes on resume.
 

 Not on most busses I work with; the hardware issues notifications
 whenever the devices are removable.
   

There's no notification while we're suspended.  Isn't it necessary to
scan all busses on resume, just to know what's on them?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread Alan Stern
On Tue, 26 Feb 2008, Rafael J. Wysocki wrote:

   IMO the device driver should assure that no new children will be 
   registered
   concurrently with the -suspend() method (IOW, -suspend() should wait for
   all such registrations to complete and should prevent any new ones from
   being started) and it should make it impossible to register any new 
   children
   after -suspend() has run.  It's the driver's problem how to achieve that.
  
  Exactly; this has to be added to the PM documentation.
 
 Into Documentation/power/devices.txt, I gather?

Yes.

The PM core could help detect errors here.  If it tries to suspend a 
device and sees that the device's parent is already suspended, then the 
parent's driver has a bug.
   
   Yes, I think we ought to fail the suspend in such cases.  Still, that's 
   not
   sufficient to prevent a child from being registered after we've run
   dpm_suspend().  For this reason, we could also leave dpm_suspend() with
   dpm_list_mtx held and not release it until the next dpm_resume() is run.
  
  The pm_sleep_rwsem will do a better job of catching such errors.
 
 But we should not leave a window between releasing dpm_list_mtx and taking
 pm_sleep_rwsem.  Either that, or we should make sure that dpm_active is
 empty after acquiring pm_sleep_rwsem.

I've got some ideas on how to implement this.

We can add a new field suspend_called to dev-power.  It would be
owned by the PM core (protect by dpm_list_mtx) and read-only to
drivers.  Normally it will contain 0, but when the suspend method is
running we set it to SUSPEND_RUNNING and when the method returns
successfully we set it to SUSPEND_DONE.  Before calling the resume
method we set it back to 0.  Drivers can use this field as an easy way
of checking that all the child devices have been suspended.

When a new device is registered we check its parent's suspend_called
value.  If it is SUSPEND_DONE then the caller has a bug and we have to
fail the registration.  If it is SUSPEND_RUNNING then the registration
is legal, but we remember what happened.  Then when the
currently-running suspend method returns and we reacquire the
dpm_list_mtx, we will realize that a race was lost.  If the method
completed successfully (which it shouldn't) we can resume that device
immediately without ever taking it off the dpm_active list; but either
way we should continue the suspend loop.  Now the new child will be at
the end of the dpm_active_list, so it will be suspended before the
parent is reached again.

This way we can recover from drivers that are willing to suspend their 
device even though there are unsuspended children.  The only drawback 
will be that for a short time the child will be active while its parent 
is suspended.

We should not abort the entire sleep transition simply because we lost 
a race.  With this scheme we won't even need the pm_sleep_rwsem; the 
dpm_list_mtx will provide all the necessary protection.

This is more intricate than it should be.  It would have been better to
have had disable_new_children and enable_new_children methods from
the beginning; then there wouldn't be any races at all.  That's life...

The one tricky thing to watch out for is when a suspend or resume 
method wants to unregister the device being suspended or resumed.  Even 
that should be doable (set suspend_called to UNREGISTERED or something 
like that).

   That will potentially cause some trouble to CPU hotplug cotifiers, but we 
   can
   handle that, for example, by using the in_suspend_context() test.
  
  Do they need to register new CPUs at some point?  There ought to be a 
  way to handle that.
 
 No, they don't, but there are some CPU-related device objects that get
 uregistered/registered.  Still, all of this work is really redundant if the 
 CPU
 in question comes back up during the resume, so it should be avoided in
 general.  The CPU hotplug notifiers should only unregister those objects if
 the CPU hasn't gone on line during the resume and they have all information
 necessary for discovering that.

Unregistration should always be allowed, and registration should be 
allowed whenever the parent isn't suspended.  For devices with no 
parent, we can imagine there is a fictitious parent at the root of the 
device tree.  Conceptually it gets suspended after every real device 
and resumed before.  Maybe even before dpm_power_up(), meaning that 
devices with no parent could be registered by a resume_early method.

When your lock-removal stuff gets into Greg's tree, I'll write all 
this.  Sound good?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-26 Thread Alan Stern
On Wed, 27 Feb 2008, David Newall wrote:

 David Brownell wrote:
  On Tuesday 26 February 2008, David Newall wrote:

  Hardware can be inserted and removed while we're in a suspend state; and
  there's nothing that we can do about it until we resume.  Is it fair to
  say, then, that having started suspend, we could reasonably ignore any
  device insertion and removal, and handle it on resume?
  
 
  Ignore seems a bit strong; those events may be wakeup triggers,
  which would cause the hardware to make it a very short suspend state.
 
  Defer handling is more to the point, be it by hardware or software.
 

 
 Of course, defer.  The insertion has to be handled eventually.  What
 I'm wondering is if we can ignore it, and catch it on the resume.

Certainly.  If hardware-change events can get lost because of the
system sleep, the resume method should make every effort to verify that 
what it remembers of the hardware state matches the current reality.

  Presumably we need to scan for hardware changes on resume.
  
 
  Not on most busses I work with; the hardware issues notifications
  whenever the devices are removable.

 
 There's no notification while we're suspended.  Isn't it necessary to
 scan all busses on resume, just to know what's on them?

It depends on the bus.  If the bus doesn't support hotplugging then 
scanning isn't necessary.  If the bus does support hotplugging then 
scanning after suspend may or may not be necessary, depending on 
whether or not the bus controller remained powered during the suspend.  
For hotpluggable buses, scanning after hibernation is always necessary.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread David Brownell
This "flaw" isn't a new thing, of course.  I remember pointing out the rather
annoying proclivity of the PM framework to deadlock when suspend() tried to
remove USB devices ... back around 2.6.10 or so.  Things have shuffled around
a bit, and gotten better in some cases, but not fundamentally changed.

It may be more accurate to say that now we understand some constraints on
device tree management policies ... ones we had previously assumed should
not be issues.  (But AFAICT, without actually considering the question.
Now we know the right question to ask!)


On Monday 25 February 2008, Rafael J. Wysocki wrote:
> IMO the device driver should assure that no new children will be registered
> concurrently with the ->suspend() method (IOW, ->suspend() should wait for
> all such registrations to complete and should prevent any new ones from
> being started) and it should make it impossible to register any new children
> after ->suspend() has run.  It's the driver's problem how to achieve that.

There's also the case where it's framework code that handles the additions
rather than the parent device.  That would be typical for many bridge, hub,
or adapter type drivers ... you may be thinking mostly about drivers acting
as "leaf" nodes in the device tree, at least in terms of real hardware nodes.

Yes, "require that policy from such framework code too".  Just trying to be
sure the description doesn't have gaping holes in the middle.  :)

I can think of a bunch of serial busses where framework code has that sort
of responsiblity.  USB, SPI, I2C ... "legacy" I2C drivers would all need
to be taught not to create/remove children during those intervals, lacking
"new style" conversion (which makes them work more like normal drivers).

- Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Rafael J. Wysocki
On Tuesday, 26 of February 2008, Alan Stern wrote:
> On Mon, 25 Feb 2008, Rafael J. Wysocki wrote:
> 
> > > But when a system sleep begins, the PM core is expected to suspend
> > > all the children of a device before calling the device driver's suspend
> > > method.  If there are other threads trying to add new children at the
> > > same time, it's the PM core's responsibility to synchronize with
> > > them -- an impossible job, since only the device's driver knows what
> > > those other threads are and how to stop them safely.
> > 
> > It's not a problem if new children are registered before the parent's
> > ->suspend() is called, the PM core can handle that.  The problem is the
> > potential race between the suspending task and the threads registering new
> > children concurrently to the executing ->suspend(), because if those threads
> > lose the race, the resume ordering will be broken.
> 
> Not only that, the new children will exist temporarily in a state where 
> their parent is suspended and they are not.  Clearly we cannot allow 
> children to be registered below suspended parents.
> 
> > Since the PM core knows nothing about the drivers internals, the drivers'
> > ->suspend() methods must be responsible for synchronizing with the other
> > threads used by the driver.
> 
> This is a new requirement.  We didn't have to worry about it with the 
> freezer.  Driver maintainers need to know about it.

Certainly.  That's why I've been saying that it's not really a simple task to
get rid of the freezer. ;-)

> > I think we just attempted to take device semaphores too early.  We probably
> > can take the device semaphores _after_ suspending all devices without
> > much hassle.
> 
> At that stage there isn't any real need to hold the semaphores.  And it 
> complicates unregistration, which should always be allowed.  At the 
> moment I don't see any benefit to locking all the semaphores.

Agreed.

> >  However, it's not actually a problem if a suspended device
> > gets unregistered - it's removed from the list on which it is at the moment
> > and won't be resumed.  It also is not a problem if the device is registered
> > after it's master's ->resume() has run.
> 
> Correct.
> 
> > Besides, taking the semaphores for all _existing_ devices doesn't prevent
> > new devices from being added and that's we needed to take
> > pm_sleep_rwsem in device_add().
> 
> Yes.  We could keep it, but not acquire it until after all devices have 
> been suspended.  This might catch some errors.

Agreed.

> > IMO the device driver should assure that no new children will be registered
> > concurrently with the ->suspend() method (IOW, ->suspend() should wait for
> > all such registrations to complete and should prevent any new ones from
> > being started) and it should make it impossible to register any new children
> > after ->suspend() has run.  It's the driver's problem how to achieve that.
> 
> Exactly; this has to be added to the PM documentation.

Into Documentation/power/devices.txt, I gather?

> The difficulty arises only for drivers that support hotplugging, that 
> is, detecting and registering children from somewhere other than their 
> probe method.

Yes.

> > > The PM core could help detect errors here.  If it tries to suspend a 
> > > device and sees that the device's parent is already suspended, then the 
> > > parent's driver has a bug.
> > 
> > Yes, I think we ought to fail the suspend in such cases.  Still, that's not
> > sufficient to prevent a child from being registered after we've run
> > dpm_suspend().  For this reason, we could also leave dpm_suspend() with
> > dpm_list_mtx held and not release it until the next dpm_resume() is run.
> 
> The pm_sleep_rwsem will do a better job of catching such errors.

But we should not leave a window between releasing dpm_list_mtx and taking
pm_sleep_rwsem.  Either that, or we should make sure that dpm_active is
empty after acquiring pm_sleep_rwsem.
 
> > That will potentially cause some trouble to CPU hotplug cotifiers, but we 
> > can
> > handle that, for example, by using the in_suspend_context() test.
> 
> Do they need to register new CPUs at some point?  There ought to be a 
> way to handle that.

No, they don't, but there are some CPU-related device objects that get
uregistered/registered.  Still, all of this work is really redundant if the CPU
in question comes back up during the resume, so it should be avoided in
general.  The CPU hotplug notifiers should only unregister those objects if
the CPU hasn't gone on line during the resume and they have all information
necessary for discovering that.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Alan Stern
On Mon, 25 Feb 2008, Rafael J. Wysocki wrote:

> > But when a system sleep begins, the PM core is expected to suspend
> > all the children of a device before calling the device driver's suspend
> > method.  If there are other threads trying to add new children at the
> > same time, it's the PM core's responsibility to synchronize with
> > them -- an impossible job, since only the device's driver knows what
> > those other threads are and how to stop them safely.
> 
> It's not a problem if new children are registered before the parent's
> ->suspend() is called, the PM core can handle that.  The problem is the
> potential race between the suspending task and the threads registering new
> children concurrently to the executing ->suspend(), because if those threads
> lose the race, the resume ordering will be broken.

Not only that, the new children will exist temporarily in a state where 
their parent is suspended and they are not.  Clearly we cannot allow 
children to be registered below suspended parents.

> Since the PM core knows nothing about the drivers internals, the drivers'
> ->suspend() methods must be responsible for synchronizing with the other
> threads used by the driver.

This is a new requirement.  We didn't have to worry about it with the 
freezer.  Driver maintainers need to know about it.

> I think we just attempted to take device semaphores too early.  We probably
> can take the device semaphores _after_ suspending all devices without
> much hassle.

At that stage there isn't any real need to hold the semaphores.  And it 
complicates unregistration, which should always be allowed.  At the 
moment I don't see any benefit to locking all the semaphores.

>  However, it's not actually a problem if a suspended device
> gets unregistered - it's removed from the list on which it is at the moment
> and won't be resumed.  It also is not a problem if the device is registered
> after it's master's ->resume() has run.

Correct.

> Besides, taking the semaphores for all _existing_ devices doesn't prevent
> new devices from being added and that's we needed to take
> pm_sleep_rwsem in device_add().

Yes.  We could keep it, but not acquire it until after all devices have 
been suspended.  This might catch some errors.

> IMO the device driver should assure that no new children will be registered
> concurrently with the ->suspend() method (IOW, ->suspend() should wait for
> all such registrations to complete and should prevent any new ones from
> being started) and it should make it impossible to register any new children
> after ->suspend() has run.  It's the driver's problem how to achieve that.

Exactly; this has to be added to the PM documentation.

The difficulty arises only for drivers that support hotplugging, that 
is, detecting and registering children from somewhere other than their 
probe method.

> > The PM core could help detect errors here.  If it tries to suspend a 
> > device and sees that the device's parent is already suspended, then the 
> > parent's driver has a bug.
> 
> Yes, I think we ought to fail the suspend in such cases.  Still, that's not
> sufficient to prevent a child from being registered after we've run
> dpm_suspend().  For this reason, we could also leave dpm_suspend() with
> dpm_list_mtx held and not release it until the next dpm_resume() is run.

The pm_sleep_rwsem will do a better job of catching such errors.

> That will potentially cause some trouble to CPU hotplug cotifiers, but we can
> handle that, for example, by using the in_suspend_context() test.

Do they need to register new CPUs at some point?  There ought to be a 
way to handle that.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Alan Stern wrote:
> On Mon, 25 Feb 2008, Alan Stern wrote:
> 
> > The only possible solution is to have the drivers themselves be
> > responsible for preventing calls to device_add() or device_register()  
> > during a system sleep.  (It's also necessary to prevent driver binding,
> > but this isn't a major issue.)  The most straightforward approach is to
> > add a new pair of driver methods: one to disable adding children and
> > one to re-enable it.  Of course this would represent a significant
> > addition to the Power Management driver interface.
> > 
> > (Note that the existing suspend and resume methods cannot be used for 
> > this purpose.  Drivers assume that when the suspend method is called, 
> > it has already been called for all the child devices.  This wouldn't be 
> > true if one of the purposes of the method was to prevent addition of 
> > new children.)
> 
> On further thought maybe the existing methods can be used, with care.  
> Drivers would have to assume the responsibility of synchronizing with
> their helper threads and stopping addition of new children (something
> they should already be doing), and they would also have to check that
> all the existing children are already suspended.  They should not make
> the assumption that the PM core has already suspended all the children.

IMO the device driver should assure that no new children will be registered
concurrently with the ->suspend() method (IOW, ->suspend() should wait for
all such registrations to complete and should prevent any new ones from
being started) and it should make it impossible to register any new children
after ->suspend() has run.  It's the driver's problem how to achieve that.

> The PM core could help detect errors here.  If it tries to suspend a 
> device and sees that the device's parent is already suspended, then the 
> parent's driver has a bug.

Yes, I think we ought to fail the suspend in such cases.  Still, that's not
sufficient to prevent a child from being registered after we've run
dpm_suspend().  For this reason, we could also leave dpm_suspend() with
dpm_list_mtx held and not release it until the next dpm_resume() is run.

That will potentially cause some trouble to CPU hotplug cotifiers, but we can
handle that, for example, by using the in_suspend_context() test.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Alan Stern wrote:
> Ongoing efforts to remove the freezer from the system suspend and
> hibernation code ("system sleep" is the proper catch-all term) have
> turned up a fundamental flaw in the Power Management subsystem's
> design.  In brief, we cannot handle the race between hotplug addition
> of new devices and suspending all existing devices.
> 
> It's not a simple problem (and I'm going to leave out a lot of details
> here).  For a comparison, think about what happens when a device is
> hot-unplugged.  When device_del() calls the driver's remove method, the
> driver is expected to manage all the details of synchronizing with
> other threads that may be trying to add new child devices as well as
> removing all existing children.
> 
> But when a system sleep begins, the PM core is expected to suspend
> all the children of a device before calling the device driver's suspend
> method.  If there are other threads trying to add new children at the
> same time, it's the PM core's responsibility to synchronize with
> them -- an impossible job, since only the device's driver knows what
> those other threads are and how to stop them safely.

It's not a problem if new children are registered before the parent's
->suspend() is called, the PM core can handle that.  The problem is the
potential race between the suspending task and the threads registering new
children concurrently to the executing ->suspend(), because if those threads
lose the race, the resume ordering will be broken.

Since the PM core knows nothing about the drivers internals, the drivers'
->suspend() methods must be responsible for synchronizing with the other
threads used by the driver.

> In the past this deficiency has been hidden by the freezer.  Other 
> tasks couldn't register new children because they were frozen.  But now 
> we are phasing out the freezer (already most kernel threads are not 
> freezable) and the problem is starting to show up.
> 
> A change to the PM core present in 2.6.25-rc2 (but which is about to be
> reverted!) has the core try to prevent these additions by acquiring the
> device semaphores for every registered device.  This has turned out to
> be too heavy-handed; for example, it prevents drivers from
> unregistering devices during a system sleep.  There are more subtle
> synchronization problems as well.

I think we just attempted to take device semaphores too early.  We probably
can take the device semaphores _after_ suspending all devices without
much hassle.  However, it's not actually a problem if a suspended device
gets unregistered - it's removed from the list on which it is at the moment
and won't be resumed.  It also is not a problem if the device is registered
after it's master's ->resume() has run.

Besides, taking the semaphores for all _existing_ devices doesn't prevent
new devices from being added and that's we needed to take
pm_sleep_rwsem in device_add().

> The only possible solution is to have the drivers themselves be
> responsible for preventing calls to device_add() or device_register()  
> during a system sleep.  (It's also necessary to prevent driver binding,
> but this isn't a major issue.)  The most straightforward approach is to
> add a new pair of driver methods: one to disable adding children and
> one to re-enable it.  Of course this would represent a significant
> addition to the Power Management driver interface.

I'd rather not do that.

> (Note that the existing suspend and resume methods cannot be used for 
> this purpose.  Drivers assume that when the suspend method is called, 
> it has already been called for all the child devices.  This wouldn't be 
> true if one of the purposes of the method was to prevent addition of 
> new children.)
> 
> Another way of accomplishing this is to require drivers to pay
> attention to pm_notifier chain and stop registering children when any
> of the PM_xxx_PREPARE messages is sent.  This approach feels a lot more
> awkward to me.

As I said above, I don't see a problem with registering new devices before
the parent's ->suspend() is run and after it's ->resume() has run.  That
doesn't break any ordering rules and we can handle it.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Alan Stern
On Mon, 25 Feb 2008, Alan Stern wrote:

> The only possible solution is to have the drivers themselves be
> responsible for preventing calls to device_add() or device_register()  
> during a system sleep.  (It's also necessary to prevent driver binding,
> but this isn't a major issue.)  The most straightforward approach is to
> add a new pair of driver methods: one to disable adding children and
> one to re-enable it.  Of course this would represent a significant
> addition to the Power Management driver interface.
> 
> (Note that the existing suspend and resume methods cannot be used for 
> this purpose.  Drivers assume that when the suspend method is called, 
> it has already been called for all the child devices.  This wouldn't be 
> true if one of the purposes of the method was to prevent addition of 
> new children.)

On further thought maybe the existing methods can be used, with care.  
Drivers would have to assume the responsibility of synchronizing with
their helper threads and stopping addition of new children (something
they should already be doing), and they would also have to check that
all the existing children are already suspended.  They should not make
the assumption that the PM core has already suspended all the children.

The PM core could help detect errors here.  If it tries to suspend a 
device and sees that the device's parent is already suspended, then the 
parent's driver has a bug.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Alan Stern
Ongoing efforts to remove the freezer from the system suspend and
hibernation code ("system sleep" is the proper catch-all term) have
turned up a fundamental flaw in the Power Management subsystem's
design.  In brief, we cannot handle the race between hotplug addition
of new devices and suspending all existing devices.

It's not a simple problem (and I'm going to leave out a lot of details
here).  For a comparison, think about what happens when a device is
hot-unplugged.  When device_del() calls the driver's remove method, the
driver is expected to manage all the details of synchronizing with
other threads that may be trying to add new child devices as well as
removing all existing children.

But when a system sleep begins, the PM core is expected to suspend
all the children of a device before calling the device driver's suspend
method.  If there are other threads trying to add new children at the
same time, it's the PM core's responsibility to synchronize with
them -- an impossible job, since only the device's driver knows what
those other threads are and how to stop them safely.

In the past this deficiency has been hidden by the freezer.  Other 
tasks couldn't register new children because they were frozen.  But now 
we are phasing out the freezer (already most kernel threads are not 
freezable) and the problem is starting to show up.

A change to the PM core present in 2.6.25-rc2 (but which is about to be
reverted!) has the core try to prevent these additions by acquiring the
device semaphores for every registered device.  This has turned out to
be too heavy-handed; for example, it prevents drivers from
unregistering devices during a system sleep.  There are more subtle
synchronization problems as well.

The only possible solution is to have the drivers themselves be
responsible for preventing calls to device_add() or device_register()  
during a system sleep.  (It's also necessary to prevent driver binding,
but this isn't a major issue.)  The most straightforward approach is to
add a new pair of driver methods: one to disable adding children and
one to re-enable it.  Of course this would represent a significant
addition to the Power Management driver interface.

(Note that the existing suspend and resume methods cannot be used for 
this purpose.  Drivers assume that when the suspend method is called, 
it has already been called for all the child devices.  This wouldn't be 
true if one of the purposes of the method was to prevent addition of 
new children.)

Another way of accomplishing this is to require drivers to pay
attention to pm_notifier chain and stop registering children when any
of the PM_xxx_PREPARE messages is sent.  This approach feels a lot more
awkward to me.

Comments and discussion?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Alan Stern
Ongoing efforts to remove the freezer from the system suspend and
hibernation code (system sleep is the proper catch-all term) have
turned up a fundamental flaw in the Power Management subsystem's
design.  In brief, we cannot handle the race between hotplug addition
of new devices and suspending all existing devices.

It's not a simple problem (and I'm going to leave out a lot of details
here).  For a comparison, think about what happens when a device is
hot-unplugged.  When device_del() calls the driver's remove method, the
driver is expected to manage all the details of synchronizing with
other threads that may be trying to add new child devices as well as
removing all existing children.

But when a system sleep begins, the PM core is expected to suspend
all the children of a device before calling the device driver's suspend
method.  If there are other threads trying to add new children at the
same time, it's the PM core's responsibility to synchronize with
them -- an impossible job, since only the device's driver knows what
those other threads are and how to stop them safely.

In the past this deficiency has been hidden by the freezer.  Other 
tasks couldn't register new children because they were frozen.  But now 
we are phasing out the freezer (already most kernel threads are not 
freezable) and the problem is starting to show up.

A change to the PM core present in 2.6.25-rc2 (but which is about to be
reverted!) has the core try to prevent these additions by acquiring the
device semaphores for every registered device.  This has turned out to
be too heavy-handed; for example, it prevents drivers from
unregistering devices during a system sleep.  There are more subtle
synchronization problems as well.

The only possible solution is to have the drivers themselves be
responsible for preventing calls to device_add() or device_register()  
during a system sleep.  (It's also necessary to prevent driver binding,
but this isn't a major issue.)  The most straightforward approach is to
add a new pair of driver methods: one to disable adding children and
one to re-enable it.  Of course this would represent a significant
addition to the Power Management driver interface.

(Note that the existing suspend and resume methods cannot be used for 
this purpose.  Drivers assume that when the suspend method is called, 
it has already been called for all the child devices.  This wouldn't be 
true if one of the purposes of the method was to prevent addition of 
new children.)

Another way of accomplishing this is to require drivers to pay
attention to pm_notifier chain and stop registering children when any
of the PM_xxx_PREPARE messages is sent.  This approach feels a lot more
awkward to me.

Comments and discussion?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Alan Stern
On Mon, 25 Feb 2008, Alan Stern wrote:

 The only possible solution is to have the drivers themselves be
 responsible for preventing calls to device_add() or device_register()  
 during a system sleep.  (It's also necessary to prevent driver binding,
 but this isn't a major issue.)  The most straightforward approach is to
 add a new pair of driver methods: one to disable adding children and
 one to re-enable it.  Of course this would represent a significant
 addition to the Power Management driver interface.
 
 (Note that the existing suspend and resume methods cannot be used for 
 this purpose.  Drivers assume that when the suspend method is called, 
 it has already been called for all the child devices.  This wouldn't be 
 true if one of the purposes of the method was to prevent addition of 
 new children.)

On further thought maybe the existing methods can be used, with care.  
Drivers would have to assume the responsibility of synchronizing with
their helper threads and stopping addition of new children (something
they should already be doing), and they would also have to check that
all the existing children are already suspended.  They should not make
the assumption that the PM core has already suspended all the children.

The PM core could help detect errors here.  If it tries to suspend a 
device and sees that the device's parent is already suspended, then the 
parent's driver has a bug.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Alan Stern wrote:
 On Mon, 25 Feb 2008, Alan Stern wrote:
 
  The only possible solution is to have the drivers themselves be
  responsible for preventing calls to device_add() or device_register()  
  during a system sleep.  (It's also necessary to prevent driver binding,
  but this isn't a major issue.)  The most straightforward approach is to
  add a new pair of driver methods: one to disable adding children and
  one to re-enable it.  Of course this would represent a significant
  addition to the Power Management driver interface.
  
  (Note that the existing suspend and resume methods cannot be used for 
  this purpose.  Drivers assume that when the suspend method is called, 
  it has already been called for all the child devices.  This wouldn't be 
  true if one of the purposes of the method was to prevent addition of 
  new children.)
 
 On further thought maybe the existing methods can be used, with care.  
 Drivers would have to assume the responsibility of synchronizing with
 their helper threads and stopping addition of new children (something
 they should already be doing), and they would also have to check that
 all the existing children are already suspended.  They should not make
 the assumption that the PM core has already suspended all the children.

IMO the device driver should assure that no new children will be registered
concurrently with the -suspend() method (IOW, -suspend() should wait for
all such registrations to complete and should prevent any new ones from
being started) and it should make it impossible to register any new children
after -suspend() has run.  It's the driver's problem how to achieve that.

 The PM core could help detect errors here.  If it tries to suspend a 
 device and sees that the device's parent is already suspended, then the 
 parent's driver has a bug.

Yes, I think we ought to fail the suspend in such cases.  Still, that's not
sufficient to prevent a child from being registered after we've run
dpm_suspend().  For this reason, we could also leave dpm_suspend() with
dpm_list_mtx held and not release it until the next dpm_resume() is run.

That will potentially cause some trouble to CPU hotplug cotifiers, but we can
handle that, for example, by using the in_suspend_context() test.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Alan Stern wrote:
 Ongoing efforts to remove the freezer from the system suspend and
 hibernation code (system sleep is the proper catch-all term) have
 turned up a fundamental flaw in the Power Management subsystem's
 design.  In brief, we cannot handle the race between hotplug addition
 of new devices and suspending all existing devices.
 
 It's not a simple problem (and I'm going to leave out a lot of details
 here).  For a comparison, think about what happens when a device is
 hot-unplugged.  When device_del() calls the driver's remove method, the
 driver is expected to manage all the details of synchronizing with
 other threads that may be trying to add new child devices as well as
 removing all existing children.
 
 But when a system sleep begins, the PM core is expected to suspend
 all the children of a device before calling the device driver's suspend
 method.  If there are other threads trying to add new children at the
 same time, it's the PM core's responsibility to synchronize with
 them -- an impossible job, since only the device's driver knows what
 those other threads are and how to stop them safely.

It's not a problem if new children are registered before the parent's
-suspend() is called, the PM core can handle that.  The problem is the
potential race between the suspending task and the threads registering new
children concurrently to the executing -suspend(), because if those threads
lose the race, the resume ordering will be broken.

Since the PM core knows nothing about the drivers internals, the drivers'
-suspend() methods must be responsible for synchronizing with the other
threads used by the driver.

 In the past this deficiency has been hidden by the freezer.  Other 
 tasks couldn't register new children because they were frozen.  But now 
 we are phasing out the freezer (already most kernel threads are not 
 freezable) and the problem is starting to show up.
 
 A change to the PM core present in 2.6.25-rc2 (but which is about to be
 reverted!) has the core try to prevent these additions by acquiring the
 device semaphores for every registered device.  This has turned out to
 be too heavy-handed; for example, it prevents drivers from
 unregistering devices during a system sleep.  There are more subtle
 synchronization problems as well.

I think we just attempted to take device semaphores too early.  We probably
can take the device semaphores _after_ suspending all devices without
much hassle.  However, it's not actually a problem if a suspended device
gets unregistered - it's removed from the list on which it is at the moment
and won't be resumed.  It also is not a problem if the device is registered
after it's master's -resume() has run.

Besides, taking the semaphores for all _existing_ devices doesn't prevent
new devices from being added and that's we needed to take
pm_sleep_rwsem in device_add().

 The only possible solution is to have the drivers themselves be
 responsible for preventing calls to device_add() or device_register()  
 during a system sleep.  (It's also necessary to prevent driver binding,
 but this isn't a major issue.)  The most straightforward approach is to
 add a new pair of driver methods: one to disable adding children and
 one to re-enable it.  Of course this would represent a significant
 addition to the Power Management driver interface.

I'd rather not do that.

 (Note that the existing suspend and resume methods cannot be used for 
 this purpose.  Drivers assume that when the suspend method is called, 
 it has already been called for all the child devices.  This wouldn't be 
 true if one of the purposes of the method was to prevent addition of 
 new children.)
 
 Another way of accomplishing this is to require drivers to pay
 attention to pm_notifier chain and stop registering children when any
 of the PM_xxx_PREPARE messages is sent.  This approach feels a lot more
 awkward to me.

As I said above, I don't see a problem with registering new devices before
the parent's -suspend() is run and after it's -resume() has run.  That
doesn't break any ordering rules and we can handle it.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Alan Stern
On Mon, 25 Feb 2008, Rafael J. Wysocki wrote:

  But when a system sleep begins, the PM core is expected to suspend
  all the children of a device before calling the device driver's suspend
  method.  If there are other threads trying to add new children at the
  same time, it's the PM core's responsibility to synchronize with
  them -- an impossible job, since only the device's driver knows what
  those other threads are and how to stop them safely.
 
 It's not a problem if new children are registered before the parent's
 -suspend() is called, the PM core can handle that.  The problem is the
 potential race between the suspending task and the threads registering new
 children concurrently to the executing -suspend(), because if those threads
 lose the race, the resume ordering will be broken.

Not only that, the new children will exist temporarily in a state where 
their parent is suspended and they are not.  Clearly we cannot allow 
children to be registered below suspended parents.

 Since the PM core knows nothing about the drivers internals, the drivers'
 -suspend() methods must be responsible for synchronizing with the other
 threads used by the driver.

This is a new requirement.  We didn't have to worry about it with the 
freezer.  Driver maintainers need to know about it.

 I think we just attempted to take device semaphores too early.  We probably
 can take the device semaphores _after_ suspending all devices without
 much hassle.

At that stage there isn't any real need to hold the semaphores.  And it 
complicates unregistration, which should always be allowed.  At the 
moment I don't see any benefit to locking all the semaphores.

  However, it's not actually a problem if a suspended device
 gets unregistered - it's removed from the list on which it is at the moment
 and won't be resumed.  It also is not a problem if the device is registered
 after it's master's -resume() has run.

Correct.

 Besides, taking the semaphores for all _existing_ devices doesn't prevent
 new devices from being added and that's we needed to take
 pm_sleep_rwsem in device_add().

Yes.  We could keep it, but not acquire it until after all devices have 
been suspended.  This might catch some errors.

 IMO the device driver should assure that no new children will be registered
 concurrently with the -suspend() method (IOW, -suspend() should wait for
 all such registrations to complete and should prevent any new ones from
 being started) and it should make it impossible to register any new children
 after -suspend() has run.  It's the driver's problem how to achieve that.

Exactly; this has to be added to the PM documentation.

The difficulty arises only for drivers that support hotplugging, that 
is, detecting and registering children from somewhere other than their 
probe method.

  The PM core could help detect errors here.  If it tries to suspend a 
  device and sees that the device's parent is already suspended, then the 
  parent's driver has a bug.
 
 Yes, I think we ought to fail the suspend in such cases.  Still, that's not
 sufficient to prevent a child from being registered after we've run
 dpm_suspend().  For this reason, we could also leave dpm_suspend() with
 dpm_list_mtx held and not release it until the next dpm_resume() is run.

The pm_sleep_rwsem will do a better job of catching such errors.

 That will potentially cause some trouble to CPU hotplug cotifiers, but we can
 handle that, for example, by using the in_suspend_context() test.

Do they need to register new CPUs at some point?  There ought to be a 
way to handle that.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread Rafael J. Wysocki
On Tuesday, 26 of February 2008, Alan Stern wrote:
 On Mon, 25 Feb 2008, Rafael J. Wysocki wrote:
 
   But when a system sleep begins, the PM core is expected to suspend
   all the children of a device before calling the device driver's suspend
   method.  If there are other threads trying to add new children at the
   same time, it's the PM core's responsibility to synchronize with
   them -- an impossible job, since only the device's driver knows what
   those other threads are and how to stop them safely.
  
  It's not a problem if new children are registered before the parent's
  -suspend() is called, the PM core can handle that.  The problem is the
  potential race between the suspending task and the threads registering new
  children concurrently to the executing -suspend(), because if those threads
  lose the race, the resume ordering will be broken.
 
 Not only that, the new children will exist temporarily in a state where 
 their parent is suspended and they are not.  Clearly we cannot allow 
 children to be registered below suspended parents.
 
  Since the PM core knows nothing about the drivers internals, the drivers'
  -suspend() methods must be responsible for synchronizing with the other
  threads used by the driver.
 
 This is a new requirement.  We didn't have to worry about it with the 
 freezer.  Driver maintainers need to know about it.

Certainly.  That's why I've been saying that it's not really a simple task to
get rid of the freezer. ;-)

  I think we just attempted to take device semaphores too early.  We probably
  can take the device semaphores _after_ suspending all devices without
  much hassle.
 
 At that stage there isn't any real need to hold the semaphores.  And it 
 complicates unregistration, which should always be allowed.  At the 
 moment I don't see any benefit to locking all the semaphores.

Agreed.

   However, it's not actually a problem if a suspended device
  gets unregistered - it's removed from the list on which it is at the moment
  and won't be resumed.  It also is not a problem if the device is registered
  after it's master's -resume() has run.
 
 Correct.
 
  Besides, taking the semaphores for all _existing_ devices doesn't prevent
  new devices from being added and that's we needed to take
  pm_sleep_rwsem in device_add().
 
 Yes.  We could keep it, but not acquire it until after all devices have 
 been suspended.  This might catch some errors.

Agreed.

  IMO the device driver should assure that no new children will be registered
  concurrently with the -suspend() method (IOW, -suspend() should wait for
  all such registrations to complete and should prevent any new ones from
  being started) and it should make it impossible to register any new children
  after -suspend() has run.  It's the driver's problem how to achieve that.
 
 Exactly; this has to be added to the PM documentation.

Into Documentation/power/devices.txt, I gather?

 The difficulty arises only for drivers that support hotplugging, that 
 is, detecting and registering children from somewhere other than their 
 probe method.

Yes.

   The PM core could help detect errors here.  If it tries to suspend a 
   device and sees that the device's parent is already suspended, then the 
   parent's driver has a bug.
  
  Yes, I think we ought to fail the suspend in such cases.  Still, that's not
  sufficient to prevent a child from being registered after we've run
  dpm_suspend().  For this reason, we could also leave dpm_suspend() with
  dpm_list_mtx held and not release it until the next dpm_resume() is run.
 
 The pm_sleep_rwsem will do a better job of catching such errors.

But we should not leave a window between releasing dpm_list_mtx and taking
pm_sleep_rwsem.  Either that, or we should make sure that dpm_active is
empty after acquiring pm_sleep_rwsem.
 
  That will potentially cause some trouble to CPU hotplug cotifiers, but we 
  can
  handle that, for example, by using the in_suspend_context() test.
 
 Do they need to register new CPUs at some point?  There ought to be a 
 way to handle that.

No, they don't, but there are some CPU-related device objects that get
uregistered/registered.  Still, all of this work is really redundant if the CPU
in question comes back up during the resume, so it should be avoided in
general.  The CPU hotplug notifiers should only unregister those objects if
the CPU hasn't gone on line during the resume and they have all information
necessary for discovering that.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezer removal

2008-02-25 Thread David Brownell
This flaw isn't a new thing, of course.  I remember pointing out the rather
annoying proclivity of the PM framework to deadlock when suspend() tried to
remove USB devices ... back around 2.6.10 or so.  Things have shuffled around
a bit, and gotten better in some cases, but not fundamentally changed.

It may be more accurate to say that now we understand some constraints on
device tree management policies ... ones we had previously assumed should
not be issues.  (But AFAICT, without actually considering the question.
Now we know the right question to ask!)


On Monday 25 February 2008, Rafael J. Wysocki wrote:
 IMO the device driver should assure that no new children will be registered
 concurrently with the -suspend() method (IOW, -suspend() should wait for
 all such registrations to complete and should prevent any new ones from
 being started) and it should make it impossible to register any new children
 after -suspend() has run.  It's the driver's problem how to achieve that.

There's also the case where it's framework code that handles the additions
rather than the parent device.  That would be typical for many bridge, hub,
or adapter type drivers ... you may be thinking mostly about drivers acting
as leaf nodes in the device tree, at least in terms of real hardware nodes.

Yes, require that policy from such framework code too.  Just trying to be
sure the description doesn't have gaping holes in the middle.  :)

I can think of a bunch of serial busses where framework code has that sort
of responsiblity.  USB, SPI, I2C ... legacy I2C drivers would all need
to be taught not to create/remove children during those intervals, lacking
new style conversion (which makes them work more like normal drivers).

- Dave

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/