On Wed, Dec 16, 2009 at 10:07 PM, Tom Bennet <twben...@gmail.com> wrote:
> I spent the day recovering from a Gentoo upgrade, and thought I'd document
> the experience in case it helps someone else.
>
> I'm running a custom kernel 2.6.25-gentoo-r7 on amd64, though I don't think
> the rarer hardware is relevant.
>
> I tend to put off upgrading my Gentoo box because anytime I do, something
> breaks.  I'm afraid I haven't changed my opinion about that.  Anyway, I did
> "emerge --update --deep world" and plugged my ears. Some 600-odd packages
> (and a few simpler problems) later, the system seemed to be doing okay.  So
> I thought I'd see if it could survive a reboot.  No, it couldn't.
>
> On boot it failed checking the root file system and dropped into the repair
> shell.  The reason the fsck failed is that the root pseudo device file
> /dev/md0 didn't exist.  The root file system was actually fine, though.
> Inside the repair shell, I could see all the files from my root, but there
> wasn't much in /dev.  (I have the md stuff compiled in to the kernel, and
> don't use an initrd, so it wasn't an initrd problem.)
>
> Short Solution
>
> The problem was with udev, the facility which automatically populates the
> /dev directory.  During the upgrade, emerge noted that my kernel version was
> a bit early, but acceptable.  What was missing, apparently, was the signalfd
> syscall, which that kernel version either doesn't have or I hadn't
> configured.  Apparently, udev has only started using signalfd recently, so
> the solution was to downgrade to an older version of udev (udev-141 to be
> precise).
>
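
To tell which of the two cases applies (kernel too old, or signalfd not
configured), one rough check is to look for the option in the kernel
config. This is only a sketch: the helper name kernel_has_signalfd is
made up for illustration, CONFIG_SIGNALFD is the option name as I
understand it, and /usr/src/linux/.config is an assumed default that may
not be the config of the kernel actually running.

```shell
#!/bin/sh
# Sketch: succeed if a kernel config file enables signalfd.
# Assumption: the default path below matches the running kernel; pass
# another path if it does not. If the kernel exposes /proc/config.gz,
# decompress it to a file first and pass that instead.
kernel_has_signalfd() {
    config="${1:-/usr/src/linux/.config}"
    grep -q '^CONFIG_SIGNALFD=y$' "$config"
}

# Usage:
#   kernel_has_signalfd && echo "signalfd enabled" || echo "signalfd missing"
```

If the option is absent or "not set", the choice is between rebuilding
the kernel with it and staying on an older udev.
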
> What I Actually Did To Get There
>
> Of course, I didn't know that at first.  Just had a fun unbootable system.
> I might have been able to simply emerge the downgrade from the repair shell
> (the network did come up), but I didn't know to try that yet.  I figured I
> wanted to find some way to make the system boot.  Since the failing file
> check is done from /etc/init.d/checkroot, I added a mknod command to create
> the device node before trying to run the file check.  At the start of the
> start() method:
>
>         if [ ! -e /dev/md0 ] ; then
>            mknod -m 0660 /dev/md0 b 9 0
>         fi
>
> It's a hack, not a solution, but it did make the system boot, to a rather
> crippled state.  Since there were a lot of devices missing, a lot of
> services wouldn't start.  (If you're using a more boring root partition, it
> might be something like "mknod -m 0660 /dev/sda1 b 8 1")
>
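
The major/minor numbers in those mknod commands (9 0 for md0, 8 1 for
sda1) need not be remembered. A minimal sketch, assuming /proc is
mounted in the repair shell and the kernel lists the device in
/proc/partitions: the hypothetical helper below only prints the mknod
command for a named device, it does not run it.

```shell
#!/bin/sh
# Sketch: print (not run) the mknod command for a block device found in
# a /proc/partitions-style table (columns: major minor #blocks name).
# Assumption: the second argument defaults to /proc/partitions.
mknod_cmd_for() {
    dev="$1"
    table="${2:-/proc/partitions}"
    awk -v dev="$dev" '
        $4 == dev {
            printf "mknod -m 0660 /dev/%s b %s %s\n", $4, $1, $2
            found = 1
        }
        END { exit !found }   # non-zero exit if the device is not listed
    ' "$table"
}

# Usage:
#   mknod_cmd_for md0
```

Printing rather than executing leaves a chance to check the numbers
before creating the node by hand.
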
> So I had managed by now to gather that udev wasn't working, but I didn't
> know why.  My first thought was to try "/etc/init.d/udev start", to see if
> it would start.  But it told me that the script is written for baselevel-2,
> and I shouldn't use it on baselevel-1.  Following a bit of googling about
> what the heck a baselevel is, I gathered that I was using baselevel-1, and
> so the service wasn't supposed to be started that way.   So it wasn't a bug
> that it wouldn't start that way.  Another page suggested trying to run it
> directly, with "/sbin/udevd --daemon", which gave the message "error getting
> signalfd".  That told me why it didn't start. This message was also in the
> logs, but for some reason I didn't look there until later.
>
> So back to Google, and I found a message on a Debian board noting that udev
> had started using signalfd recently.  This suggested an old version might do
> the trick.  I tried one, and it did.
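
For anyone repeating the downgrade, a hedged sketch of how it could be
done and kept in place. The version udev-141 comes from the message
above; the package atom sys-fs/udev, the helper name mask_newer_than,
and the assumption that /etc/portage/package.mask is a plain file (it
can also be a directory) are mine.

```shell
#!/bin/sh
# Sketch: mask everything newer than a given package version so that a
# later world update does not pull it back in.
# Assumption: the mask file is a regular file, not a directory.
mask_newer_than() {
    atom=">$1"
    mask_file="${2:-/etc/portage/package.mask}"
    # Append the atom only if that exact line is not already present.
    grep -qxF -- "$atom" "$mask_file" 2>/dev/null ||
        printf '%s\n' "$atom" >> "$mask_file"
}

# Usage (as root):
#   emerge --oneshot =sys-fs/udev-141
#   mask_newer_than sys-fs/udev-141
```

The mask would need to be removed again once the kernel has been
upgraded or rebuilt with signalfd support.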

I really only have two things to say after reading this... First, and
this really does overshadow the second in weight, thank you for the
excellently presented writeup of problem *and* solution. Far more often
than should ever be the case (less so here, but across the net as a
whole), problems are mentioned, solutions are offered, and rarely does
a good, clear "this worked" follow. Secondly... it's been my experience
with Gentoo that things break far more often when I allow longer delays
between updating than when I keep up to date with everything, and it's
held true for me on both x86 and ~x86 systems (as has the headache
when I've put updates off).

And... I reiterate part of the "first": thank you for the writeup.

-- 
Poison [BLX]
Joshua M. Murphy
