On Thu, 10-06-2010 at 16:53 -0300, Daniel Drake wrote:
> >     1 tty1  Ss+  0:02 /sbin/init
> >   945 ?     Ss   0:00 /bin/sh -e -c ?runlevel --set S >/dev/null ||
> > true???/
> >   950 ?     S    0:00  \_ /bin/bash /etc/rc.d/rc.sysinit
> >  1597 ?     D    0:00      \_ modprobe scsi_wait_scan
>
> I strongly doubt this is the issue. This is a very simple module.
>
> Note your other blocked process:
>
> >  1035 ?     D<   0:00 /sbin/modprobe -b
> > pci:v000011ABd00004102sv000011ABsd00
>
> This one also has a lower process ID, suggesting that it was run first.
>
> I suspect there is a crash/hang within this module, and at this point,
> attempting to load any other module (scsi_wait_scan or otherwise) will
> hang. Due to contention on a lock, corruption, a dead kernel thread,
> or something like that.

Ok, makes sense. If one module hangs during init, any subsequent
invocation of modprobe would also hang.

> My suggested next steps in diagnosis:
> 1. Identify which device is pci:v000011ABd00004102
> Anyone can do this on any XO-1 with: lspci -vd 11ab:4102
> I'm pretty sure it's a part of the CAFE chip but I don't have an XO to check.

It's the camera controller. Hence, the other module being loaded must be
cafe_ccic.

Looking at the initialization of cafe_ccic, there seems to be a
complicated dance of mutexes and spin locks, plus a kernel thread and a
bunch of sleeps. All the ingredients for a good deadlock are present :-)

Jonathan, can you make your best guess?

> 2. Look at dmesg at point of crash
> Considering that you got a process tree I guess you can also run some
> other commands at point of hang?
> Run "dmesg" and capture output.

I did, but there was nothing interesting in dmesg, which is what I would
expect from a pure locking bug. Moreover, CONFIG_DEBUG_MUTEXES is turned
off.

Perhaps interestingly, on regular boots, I can see some psmouse
initialization messages intermixed with the cafe_ccic ones.

> 3. Capture kernel task dump at point of crash
> echo t > /proc/sysrq-trigger
> The task dump will appear in kernel logs (dmesg).

Ok, I'll do it as soon as I see it again.

BTW: this bug seems to be easier to trigger by forcing a shutdown while
some data is being written to disk.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs       - http://sugarlabs.org/
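
P.S. For anyone who wants to repeat step 1 on another stuck modprobe:
the vendor and device IDs can be read straight out of the modalias
string. Below is a minimal Python sketch, not anything taken from the
kernel or from modprobe itself. The function names
pci_ids_from_modalias and lspci_command are made up for illustration,
and it assumes the usual "pci:v<8 hex digits>d<8 hex digits>..." layout,
so a truncated tail like the one in the process listing does no harm.

```python
import re

# Assumed PCI modalias layout: pci:v<vendor>d<device>sv<subvendor>sd<subdevice>...
# Only the first two fields are needed; each is 8 hex digits.
_PCI_MODALIAS = re.compile(r"^pci:v([0-9A-Fa-f]{8})d([0-9A-Fa-f]{8})")


def pci_ids_from_modalias(modalias):
    """Return (vendor, device) as 4-digit lowercase hex strings."""
    match = _PCI_MODALIAS.match(modalias)
    if match is None:
        raise ValueError(f"not a PCI modalias: {modalias!r}")
    vendor, device = (int(field, 16) for field in match.groups())
    return f"{vendor:04x}", f"{device:04x}"


def lspci_command(modalias):
    """Build the lspci invocation that shows the device behind a modalias."""
    vendor, device = pci_ids_from_modalias(modalias)
    return f"lspci -vd {vendor}:{device}"


if __name__ == "__main__":
    # Modalias copied from the blocked modprobe in the process listing.
    print(lspci_command("pci:v000011ABd00004102sv000011ABsd00"))
    # prints: lspci -vd 11ab:4102
```

The resulting command is the same one Daniel suggested; running it on an
XO-1 is still needed to see which device it names.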
