On Tue, Nov 26, 2013 at 07:44:32AM -0500, Donald Allen wrote:
> On Tue, Nov 26, 2013 at 2:15 AM, Mike Larkin <mlar...@azathoth.net> wrote:
> > On Sun, Nov 24, 2013 at 10:42:45AM -0500, Don Allen wrote:

..snip..

> 
> After waking up:
> 
> Switching consoles with ctrl-alt F2, I was able to run the date
> command repeatedly, and the time is advancing. 'ls' also worked
> normally, but 'ls -l' hung. 'ps aux' hangs.  'shutdown' and 'reboot'
> both hang.
> 
> Switching consoles with ctrl-alt F1, I noticed the following chatter:
> ahci0: device on port 0 didn't come ready TFD: 0x80<BSY>
> ahci0: Stopping the port, soft reset slot 31 was still active
> ahci0: unable to communicate with device on port 1

That's your problem: your disk didn't come back after resume. I'm not sure
why; this is the first time I've seen that. Maybe some ahci expert
can comment further. I've frequently seen the first ahci0: line above,
but my disks always come back online after that.
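
If you want to check saved console or dmesg text for this after each resume
attempt, here is a minimal sketch in Python. It assumes the failure lines
look like the ones you quoted; the pattern list and the function name
find_ahci_resume_failures are made up for illustration, and other ahci
failure messages may exist that it won't catch.

```python
import re

# Patterns copied from the console messages quoted above. This list is an
# assumption for illustration, not a complete catalogue of ahci messages.
AHCI_FAILURE_PATTERNS = [
    re.compile(r"^ahci\d+: device on port \d+ didn't come ready"),
    re.compile(r"^ahci\d+: unable to communicate with device on port \d+"),
]


def find_ahci_resume_failures(dmesg_text):
    """Return the lines of dmesg_text that match one of the patterns."""
    hits = []
    for line in dmesg_text.splitlines():
        stripped = line.strip()
        if any(p.search(stripped) for p in AHCI_FAILURE_PATTERNS):
            hits.append(stripped)
    return hits


# The three lines reported on the first console after resume.
sample = """\
ahci0: device on port 0 didn't come ready TFD: 0x80<BSY>
ahci0: Stopping the port, soft reset slot 31 was still active
ahci0: unable to communicate with device on port 1
"""

for hit in find_ahci_resume_failures(sample):
    print(hit)
```

The "didn't come ready" line on its own may be harmless, as noted above; the
"unable to communicate" line is the one that matches a disk not coming back.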

> 
> I don't know if the above is significant, but it isn't there on that
> first console if I don't suspend and it struck me as suspicious.
> 
> I also noticed that the disk-busy light on the front panel is on solid
> after attempting to resume. In normal operation, when there is disk
> activity and the light is on, I can hear the disk, presumably the
> heads seeking. In this situation, I don't hear that. I realize that
> doesn't mean there isn't disk activity, just not long enough head
> excursions to be audible.

The disk came back partly-resumed. Who knows what state it's in.

> 
> I have all filesystems mounted with softdep enabled, and after
> power-cycling to reboot, there's usually a lot of chatter from fsck
> about repairing things on various filesystems. One that usually turns
> up needing repair is sd0d, which is /tmp. If the fsck output is logged
> somewhere and it would be helpful, I can send it. I tried to find it
> with
> 
> cd /var
> find . -exec fgrep SALVAGED {} \; -print
> 
> which turned up nothing. Or I can try to photograph the screen as it's
> happening.

Your filesystems were shut down uncleanly because the disk didn't resume,
which is why fsck finds so much to repair.
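
On the search itself: the find command above runs fgrep on every entry,
directories included. If the fsck output does end up in a file, a search
limited to regular files is a bit more predictable. A minimal sketch in
Python, assuming the marker string is SALVAGED as in your command;
files_containing is a hypothetical helper, and the sketch says nothing about
whether boot-time fsck output is logged at all on your system.

```python
import os
import tempfile


def files_containing(root, marker):
    """Return sorted paths of regular files under root that contain marker."""
    needle = marker.encode()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip sockets, FIFOs and other non-regular entries.
            if not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        matches.append(path)
            except OSError:
                # Unreadable file (permissions, vanished); ignore it.
                continue
    return sorted(matches)


# Demo on a throwaway directory rather than the real /var.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "log"))
    with open(os.path.join(demo_root, "log", "a.txt"), "w") as fh:
        fh.write("first line\nSALVAGED\n")
    with open(os.path.join(demo_root, "log", "b.txt"), "w") as fh:
        fh.write("nothing of interest\n")
    demo_hits = [os.path.relpath(p, demo_root)
                 for p in files_containing(demo_root, "SALVAGED")]

print(demo_hits)
```

To search the real tree, call files_containing("/var", "SALVAGED") as a user
who can read the log files.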

> 
> I also tried suspending with 'zzz' right after booting and logging in,
> no 'startx'. After attempting to resume, I got a stream of messages on
> the first console, all the same:
> 
> ehci_idone: ex=0xffff8000001f3c00 is done!

That's irrelevant and may even be fixed by some recent commits. It's
because we basically need to tear down the USB device tree and reconnect
it on resume. There was probably an xfer in flight when you suspended and
the device it was associated with disappeared (temporarily) on
resume.

> 
> The disk-busy light was not on. I could not switch consoles to try
> commands and could not type at the console that was spewing these
> messages. As with the above, I had to power-cycle to recover.
> 
> /Don

Your problem is that your disk didn't resume. There are some efforts going
on presently to improve some of the wakeup/resume codepaths, but those
diffs aren't in the tree yet. They may or may not help.
