http://opensolaris.org/jive/thread.jspa?threadID=85513&tstart=45
Thread: [pm-discuss] debugging suspend/resume failures...
[pm-discuss] debugging suspend/resume failures...
Posted: Dec 12, 2008 5:45 AM
So, I'm interested in ensuring that my driver properly suspend/resumes.
However, I'm having problems in that my platform doesn't resume properly
from a suspend, even without my driver loaded.
Are there any hints that we can use to help us figure out how to debug
failures in resume? It would be helpful to have a developer's debugging
page for this stuff.
(And no, I cannot move to a supported platform debug, because the driver
is for hardware that is on the motherboard. Can I please also take a
second to bemoan the lack of suspend/resume support in ATI framebuffer
drivers? I'm starting to believe that we need to have a facility to
execute the BIOS on the video board for these things...)
-- Garrett
_______________________________________________
pm-discuss mailing list
pm-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/pm-discuss
Re: [pm-discuss] debugging suspend/resume failures...
Posted: Dec 12, 2008 2:57 PM in response to: Garrett D'Amore
On Thu, 11 Dec 2008, Garrett D'Amore wrote:
> So, I'm interested in ensuring that my driver properly suspend/resumes.
> However, I'm having problems in that my platform doesn't resume properly
> from a suspend, even without my driver loaded.
>
> Are there any hints that we can use to help us figure out how to debug
> failures in resume? It would be helpful to have a developer's debugging
> page for this stuff.
Can you hook up a serial line, or better yet, a serial console?
Logging to the serial port is nearly all the way to power-off, and
early in power-on. And the serial console starts pretty early as
well.
And I will see about getting a debugging page (or maybe even a wiki)
started.
>
> (And no, I cannot move to a supported platform debug, because the driver
> is for hardware that is on the motherboard. Can I please also take a
> second to bemoan the lack of suspend/resume support in ATI framebuffer
> drivers? I'm starting to believe that we need to have a facility to
> execute the BIOS on the video board for these things...)
You can bemoan, but it may not help at all. I keep hearing that the
open source driver works, but I don't think there has been success
yet.
---- Randy
>
> -- Garrett
>
Re: [pm-discuss] debugging suspend/resume failures...
Posted: Dec 13, 2008 3:05 AM in response to: randyf
Garrett (and interested other folks),
There's a brief expose/tirade about resuscitating graphics hardware,
the problem of the VBIOS, and our more particular situation with ATI
graphics devices in-line below.
Prior to that, there are two high-level things that I might say about
debugging S/R at present:
1. There is a *great* deal of room for improvement in this. As Randy
describes, those of us on the team who developed the prototype (and the
initial stuff now integrated) essentially relied on the debug version of
the kernel, mdb, and a serial line to squirt debugging output to -
especially whilst in the lower reaches of the Suspend/Resume code. We
most certainly do need much better facilities - especially for use with
the production (non-debug) kernel.
Development of some dtrace probes specific to power management is one
consideration. In the nearer term, I feel that we could benefit
immediately even from some simple .d scripts that just trace various
appropriate function boundaries in the CPR code and watch the attach and
detach routines in the various drivers. This would let one at least see
how far things are getting. We do have uadmin 3 22 and a couple of other
quick hacks, but those are not the most robust or comprehensive of things.
One goal for expedient initial debugging (platform assessment), in my
opinion, would be to be able to make a *single* trial run in which one
could determine ALL the devices on the given platform that don't appear
to support S/R. At present the S/R code (via uadmin 3 20) just tries to
*do* a Suspend, and will report an error and then unwind at the first
failing device. Even the uadmin 3 22 feature (which is intended to
implement a software loopback, hence not calling the ACPI S3 method once
it has made it through all the devices' Suspend commands successfully)
does actually invoke the Suspend command on each device, and will
therefore also unwind if a failure is seen. We probably need an improved
driver interface which allows us to inquire whether each driver thinks
it implements S/R, without our actually having to invoke the Suspend
command on each driver. This would allow rapid enumeration of everything
in the dev tree that has a supporting vs. non-supporting driver. One
could move on from there to try to eliminate those devices and drivers
that don't, while doing an actual S/R test on the rest of the devices
that think they do. Of course there can be bugs even in drivers that
think they do support the operation, when they are run in a context they
haven't seen before. We saw this recently with the mpt driver (for a
family of LSI Logic SCSI HBAs), for example, when SAS disks rather than
parallel SCSI disks were plugged into it. The SAS code path in the
driver was different, and it didn't implement S/R (nor, unfortunately,
did it return FAILURE).
2. Another technique that can be handy (which we've used a bit) is to
*remove* the drivers for problematic devices from the system while
debugging the rest. Of course this is no good if it's a critical/core
device such as the disk controller running the disk with the root
filesystem on it or the like. But one can bump out certain problematic
drivers which are not running core hardware, such as audio, some of the
USB devices, and even graphics drivers (in some cases).
There's a boot-time option (using -B unload= I think - Randy can correct
me on this), or one can simply move the driver aside temporarily so that
it won't be found and hence won't be loaded during boot. You can either
knock it out of /etc/driver_aliases, or just go to the directory where it
happens to live (whether /kernel/drv, /kernel/drv/amd64,
/platform/i86pc/kernel/drv, or /platform/i86pc/kernel/drv/amd64) and
rename it temporarily.
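Moving a driver aside as described in item 2 could be sketched as a pair of small shell helpers. This is only a sketch under assumptions: the function names and the ".disabled" suffix are hypothetical, the driver directory is passed in explicitly (use whichever of the directories listed above actually holds the driver), and /etc/driver_aliases is not touched.

```sh
#!/bin/sh
# Sketch only: temporarily hide a driver binary so it is not found at the
# next boot, and put it back afterwards.
# drv_set_aside, drv_restore and the ".disabled" suffix are hypothetical.

drv_set_aside() {
    dir=$1     # directory holding the driver, e.g. /kernel/drv
    name=$2    # driver file name
    if [ ! -f "$dir/$name" ]; then
        echo "no driver $dir/$name" >&2
        return 1
    fi
    mv "$dir/$name" "$dir/$name.disabled"
}

drv_restore() {
    dir=$1
    name=$2
    if [ ! -f "$dir/$name.disabled" ]; then
        echo "nothing to restore for $dir/$name" >&2
        return 1
    fi
    mv "$dir/$name.disabled" "$dir/$name"
}
```

For example, run `drv_set_aside /kernel/drv mydrv` (with `mydrv` standing in for the real driver name), reboot and test S/R, then run `drv_restore /kernel/drv mydrv` and reboot again.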
(Other remarks are in-line below)
-db
Randy Fishel wrote:
> On Thu, 11 Dec 2008, Garrett D'Amore wrote:
>
>
>> So, I'm interested in ensuring that my driver properly suspend/resumes.
>> However, I'm having problems in that my platform doesn't resume properly
>> from a suspend, even without my driver loaded.
>>
>> Are there any hints that we can use to help us figure out how to debug
>> failures in resume? It would be helpful to have a developer's debugging
>> page for this stuff.
>>
>
> Can you hook up a serial line, or better yet, a serial console?
> Logging to the serial port is nearly all the way to power-off, and
> early in power-on. And the serial console starts pretty early as
> well.
>
> And I will see about getting a debugging page (or maybe even a wiki)
> started.
>
>
>> (And no, I cannot move to a supported platform debug, because the driver
>> is for hardware that is on the motherboard. Can I please also take a
>> second to bemoan the lack of suspend/resume support in ATI framebuffer
>> drivers? I'm starting to believe that we need to have a facility to
>> execute the BIOS on the video board for these things...)
>>
Yes, that sort of thing certainly is (and has been for several years
now) our desire. It would, though, require a fundamental change in an
industry area where Sun has not historically had any influence or
participation. The problem is that the historical design (I'm reluctant
to use the term 'architecture') of the BIOS is such that it only expects
to execute hardware initialization code (including that in option card
BIOSes such as the VBIOS on graphics cards) at power-on reset. Its
design did not anticipate the need to execute such code upon resume from
S3 as well. These power-related features now in the components and on
the hardware platform are relatively new things and represent a
disruptor at the firmware level as well as further upstairs.
In fact, even if there were a hook to the VBIOS initialization code that
the OS could get to, very often it could not be re-executed in any case,
since it tends to rely on routines in the motherboard's BIOS, and we've
seen that various routines in the main BIOS also become unmapped after
the power-on reset (POR) sequence has been completed. Typically the
initialization entry points in the VBIOS are correspondingly unmapped
after POR.
This leaves us in the difficult situation [at present] that we have to
have a Solaris kernel driver that knows how to re-initialize the
graphics hardware in question from cold iron -- equivalent to what the
VBIOS does at POR.
In some cases we have grabbed a copy of the graphics card's VBIOS and
then interpreted that in the Solaris device driver during the resume
operation.
Having spent several months making one of these things work for the ATI
RageXL chip, I can tell you that this is not a happy way to proceed.
Often the documentation is poor or non-existent (sometimes because some
of the vendors feel that it might reveal proprietary aspects of their
chip architecture or something), and even when documentation can be
procured, we have discovered that there are implementation bugs in some
of the chips which have sometimes been band-aided with subsequent
undocumented bits in the hardware which can be nearly impossible to
learn about. One such thing in particular with the RageXL took us the
better part of a month to discover.
An excellent answer would be a change in the BIOS architecture such that
the same hardware initialization code could be executed whilst coming
out of S3. That way we continue to have a sensible situation in which
those very low level aspects are supported by the option vendor and we
don't need to think about it in OS-land.
The other way to go (poorer, in my opinion) is that we get the vendors'
support to provide an OS-specific device driver that knows how to do the
right thing(s). We currently have this situation with nVidia, and since
they have a unified driver architecture, they don't have to provide us a
different one for every graphics device they come out with. They just
keep the one unified one up to date.
ATI has been a bigger problem historically, first because they didn't
initially have a unified architecture and hence a single device driver
capability. They now do have that, I understand (since R300, I believe),
but we don't yet have a situation in which they are providing us a
Solaris device driver to do S/R, nor do we yet have the capacity to do
that ourselves, as I believe we still do not (after a long long effort)
have the documentation and/or code examples to do the driver(s)
ourselves. As Randy says, there has been some recent light in this
tunnel, and there is talk that the means may now be available to us,
but ...
>
> You can bemoan, but it may not help at all. I keep hearing that the
> open source driver works, but I don't think there has been success
> yet.
>
> ---- Randy
>
>
>> -- Garrett
>>
>>
--
; David J. Brown Ph.D. (cantab.)
; Solaris Engineering
; Sun Microsystems Inc.
; --
; Postal Address: Telephone: (650) 786-5558
; 4150 Network Circle, UMPK17-307 FAX: (650) 786-5734
; Santa Clara, CA 95054 e-mail: djb at sun dot com
Re: [pm-discuss] debugging suspend/resume failures...
Posted: Dec 12, 2008 9:35 PM in response to: Garrett D'Amore
> However, I'm having problems in that my platform doesn't resume properly
> from a suspend, even without my driver loaded.
Are you sure that suspend did work? Is the system's power
LED / message LED flashing to indicate that S3 sleep mode
is active?
Sometimes there are problems on the way down to S3 suspend mode...
When you press the power button to resume from S3 sleep,
does the system power up? Do the fans start to make noise?
Is there a video signal? A panic message on the screen?
> Are there any hints that we can use to help us figure out how to debug
> failures in resume?
You can try to enable various S3STR kernel debug printfs.
I'm using something like this, from either a serial port
console or the ASCII VGA console:
% cat /usr/tmp/suspend
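#!/bin/sh
# With -t, set acpi_rtc_wake to 5; otherwise leave it at 0.
wake_time=0
if [ x"$1" = x-t ]; then
    wake_time=5
fi
# Apply the power management settings from /etc/power.conf.
pmconfig
# Load the cpr module so that its variables can be patched below.
modload -p misc/cpr
# Patch the debug variables in the running kernel.
(
echo "pm_debug/W80000000; ppm_debug/W80000000;"
echo "cpr_debug/W3;"
echo "vgatext_force_suspend/W1;"
echo "acpi_rtc_wake/W$wake_time;"
) | mdb -wk
# Flush filesystem buffers before suspending.
sync;sync;sync
set -x
# Attempt the suspend.
uadmin 3 20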
You can also try "uadmin 3 22", the "test suspend-to-ram".
Of course the debug printfs are only useful when you have
a text console display at S3 resume time. I guess you don't
have this, since you're complaining about suspend/resume
support for the ATI framebuffer.
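The mdb writes in the script above stay in effect until they are overwritten or the system reboots. As a rough sketch, the same commands could be generated by a small helper, so that one call sets the flags and another clears them. The variable names and the "on" values are copied from the script. The helper name `s3_debug_cmds` is hypothetical, and treating 0 as "off" for every variable is an assumption that has not been checked against the kernel source.

```sh
#!/bin/sh
# Sketch only: print the mdb write commands used in the script above.
# Assumption: writing 0 to each variable turns the corresponding debugging off.
# s3_debug_cmds is a hypothetical helper name.
s3_debug_cmds() {
    mode=$1              # "on" or "off"
    wake_time=${2:-0}    # value for acpi_rtc_wake, as in the original script
    case "$mode" in
    on)
        echo "pm_debug/W80000000; ppm_debug/W80000000;"
        echo "cpr_debug/W3;"
        echo "vgatext_force_suspend/W1;"
        echo "acpi_rtc_wake/W$wake_time;"
        ;;
    off)
        echo "pm_debug/W0; ppm_debug/W0;"
        echo "cpr_debug/W0;"
        echo "vgatext_force_suspend/W0;"
        echo "acpi_rtc_wake/W0;"
        ;;
    *)
        echo "usage: s3_debug_cmds on|off [wake_time]" >&2
        return 1
        ;;
    esac
}
```

Usage would be along the lines of `s3_debug_cmds on 5 | mdb -wk` before a test run and `s3_debug_cmds off | mdb -wk` afterwards. The helper only prints text, so the output can be inspected before it is piped into mdb.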
|
|
|