Dieter Ries <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted
below, on  Sat, 27 May 2006 10:34:16 +0200:

> i have a very severe and urgent problem:
> 
> Then the emerge stopped with some error i dont remember, and after that,
> everything got somehow slow. because i didnt want the dvd to be ruined i
> waited till it was burned and the 30G were downloaded. during my waiting i
> tried $top or $ps -A to see the running processes, but i got
> "Speicherzugriffsfehler", which is AFAIK the same as segmentation fault.
> 
> i got it for everything i tried, and when i tried something from the KDE
> menu, nothing happened.
 
> when the data was downloaded and the dvd burned, i tried to shutdown from
> KDE, no success. then i tried to shutdown from the console, no success
> either. in the end i had to use the reset button.
> 
> then, just after "freeing unused kernel memory" there are many errors, all
> looking quite the same[.] init hung at that state, nothing worked.
> 
> so i got my livecd, booted from it, then i ran fsck for all the
> partitions, without any errors or anything, everything seemed fine.
> 
> i then mounted my system and home partition and proc and typed chroot
> /mnt/gentoo /bin/bash, which was followed by, guess it: segmentation
> fault.
> 
> the data on all my partitions from sda5 to 10 is still there and i can
> mount them all, but i cant chroot and i cant boot.
> 
> so no gentoo anymore[,] i am now using knoppix 4.0 to write for help.
> 
> is there any chance to get my gentoo back to life without completely
> installing it again? and why does the system break when doing some things
> simultaneously?
> 
> can this be a hardware issue?

OUCH!

It /could/ be a hardware issue, but as you can boot from the LiveCD and the
fscks all come out fine, it wouldn't appear to be.

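I think the problem is much more likely a glibc update gone bad. 
Virtually /everything/ on a system links to glibc, so when it goes bad,
you end up, as they say, "up a creek without a paddle!"  That would also
explain the chroot segfault: chroot /mnt/gentoo /bin/bash runs the hard
drive's bash, which loads the hard drive's (broken) glibc rather than the
LiveCD's.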

I've actually had it happen once, when a portage bug was triggered by an
obscure series of events that all happened to come together in a glibc
update.  I was able to recover, however.  The problem in that case was a
bunch of missing symlinks, and I happened to have mc open at the time, so
I just didn't close it.  Instead, I used it to restore enough symlinks by
hand: try to run something, get the error, fix that symlink, try again. 
Once that gave me enough of a working system, I finished the recovery by
opening up a binpkged version of glibc (thanks to FEATURES=buildpkg,
that's one of the times it saved my butt!) and restoring the symlinks
with a mass copy from there.  (I had to go through that run, error,
rebuild-symlink cycle several times, until I had rebuilt enough of them to
at least run bzip2 so I could untar the appropriate glibc tbz2 binpkg.)

So anyway, yeah, I know the feeling!

Assuming the problem is indeed glibc, here's what I'd do.

If you have been using FEATURES=buildpkg, recovery shouldn't be too
difficult.  Simply boot the LiveCD, mount the hard drive root (and /usr and
/var on top of it, if they're separate partitions), and untar the last
correctly working glibc package over the hard drive root.  Don't chroot
into it until after the untar: outside the chroot you're still running the
LiveCD's own, working glibc and tools, while inside it everything would
load the broken one.  Just untar the package onto the mounted hard drive
root, with any other partitions it might write to mounted in the correct
place on top of that root.
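
To make the untar step concrete, here's a minimal Python sketch of roughly
what tar -xjpf <package>.tbz2 -C /mnt/gentoo does when run from the LiveCD
(not in a chroot).  The /mnt/gentoo default and the untar_over_root name
are illustrative assumptions, and so is treating the .tbz2 as an ordinary
bzip2 tarball.  Plain tar on the LiveCD is the practical tool; this only
spells out what the extraction has to get right.

```python
import os
import tarfile


def untar_over_root(pkg_path: str, root: str = "/mnt/gentoo") -> list[str]:
    """Unpack a binary package (a bzip2 tarball) onto a mounted, NOT chrooted, root.

    This runs with the LiveCD's own working Python and glibc, so a broken
    glibc on the target root cannot crash the restore itself.
    """
    if not os.path.isdir(root):
        raise FileNotFoundError(f"target root {root!r} is not mounted")
    # Newer Pythons support extraction filters; a glibc package is trusted
    # system content (setuid bits, symlinks), so request unfiltered extraction.
    extra = {"filter": "fully_trusted"} if hasattr(tarfile, "fully_trusted_filter") else {}
    with tarfile.open(pkg_path, "r:bz2") as tar:
        names = tar.getnames()
        # numeric_owner=True uses the UID/GID numbers stored in the archive
        # instead of mapping names through the LiveCD's /etc/passwd.
        # Ownership is only applied when running as root.
        tar.extractall(path=root, numeric_owner=True, **extra)
    return names
```

Because it never chroots, the only glibc it depends on is the LiveCD's, the
same reason the untar has to come before the chroot.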

Note that you'll probably want to save copies of any of the following
files in /etc that you've modified, as the untarring will overwrite them
(plain tar knows nothing about portage's CONFIG_PROTECT).  You can restore
them afterward.  The files are host.conf, init.d/nscd, nscd.conf,
nsswitch.conf, and rpc.
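
A small Python sketch of the save-and-restore step follows.  The file list
comes straight from the paragraph above; the function names are
hypothetical, and the backup location is up to you (assumption: somewhere
off the hard drive root, such as the LiveCD's /tmp or a USB stick).  For
simplicity it saves every listed file that exists, not only the ones you
modified.

```python
import os
import shutil

# The /etc files named above, relative to the mounted hard drive root.
GLIBC_ETC_FILES = [
    "etc/host.conf",
    "etc/init.d/nscd",
    "etc/nscd.conf",
    "etc/nsswitch.conf",
    "etc/rpc",
]


def save_etc_files(root: str, backup_dir: str, rel_paths=GLIBC_ETC_FILES) -> list[str]:
    """Copy whichever of rel_paths exist under root into backup_dir."""
    saved = []
    for rel in rel_paths:
        src = os.path.join(root, rel)
        if os.path.isfile(src):
            dst = os.path.join(backup_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # keeps mode bits and timestamps
            saved.append(rel)
    return saved


def restore_etc_files(root: str, backup_dir: str, saved: list[str]) -> None:
    """Put the saved copies back after the package has been untarred."""
    for rel in saved:
        dst = os.path.join(root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(backup_dir, rel), dst)
```

Call save_etc_files before the untar and restore_etc_files with its return
value afterward.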

If you haven't been using FEATURES=buildpkg, the process is a bit more
complicated, but still nothing to panic over.  You'll have to use quickpkg
on the CD to build a package from the CD's own copy of glibc, then untar
it over the mounted hard drive root as above (saving backups of the /etc
files as above too).

After this, and after restoring the backed-up /etc files, you should again
have a working system if the problem was indeed glibc.  Since you bypassed
portage by untarring glibc directly, however, the version of glibc that
portage thinks is installed will probably be wrong.  Thus, you'll want to
remerge a known working version using portage.  Again, that won't be a big
deal if you've been using FEATURES=buildpkg, since you can just emerge -K
the version you untarred.  If not, you'll need to recompile a new version,
which of course will take a while.  You may wish to wait until after
tonight's gaming thing, if you won't have time to recompile it before
then.

After you have your system back up and running, consider a couple things
that might make life easier next time.

Obviously, I'm going to recommend adding buildpkg to your FEATURES if you
haven't got it there already.  It really /can/ help.  To jumpstart the
binary package store, consider using quickpkg to package up all your vital
packages (gcc, glibc, portage, python, binutils, etc.) at a minimum.  If
you want to get everything packaged right away, use emerge --pretend
--emptytree world to get a list, and package all those up using quickpkg.
(You can automate the process if you wish, using tools such as cut to get
the appropriate fields out of the emerge --pretend output, then feeding
that to a file for further editing if desired, and then into quickpkg as
the list of packages it needs to package.  I did it this way when I
jumpstarted my binpkg cache.)  Alternatively, you can just add the
buildpkg feature and emerge --emptytree world, but that will of course
take a while.

Second suggestion, and again something I do here: consider creating a
second copy of your root partition, with /var and /usr as well if you have
them separate.  Then, periodically, when you know you have a stable
running system, erase the copy and recopy everything over from your known
stable running system.  The idea here is that if your system goes haywire
for whatever reason, you can simply boot the backup root partition, which
will have a complete working system on it as of the time you did the
backup.  Thus, no worries about this happening again, as you can just boot
the backup system (provided you keep the snapshot fairly close to your
working system so you aren't trying to use something terribly outdated).
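
Here's a minimal Python sketch of the erase-and-recopy cycle, assuming
live_root and mirror_root (hypothetical names) are ordinary directories. 
It empties the mirror rather than deleting it, since the mirror is usually
a mount point.  Unlike cp -ax or rsync, shutil does not preserve file
ownership, does not stay on one filesystem, and does not handle device
nodes, so treat it as an illustration of the idea rather than the tool to
run against a real root partition.

```python
import os
import shutil


def refresh_mirror(live_root: str, mirror_root: str, ignore=None) -> None:
    """Erase the contents of mirror_root, then recopy live_root into it.

    mirror_root itself is kept because it is usually a mount point; only
    what is inside it gets replaced.
    """
    live = os.path.realpath(live_root)
    mirror = os.path.realpath(mirror_root)
    # Refuse to copy a tree onto itself or into a directory inside itself.
    if os.path.commonpath([live, mirror]) == live:
        raise ValueError("mirror_root must be outside live_root")
    os.makedirs(mirror, exist_ok=True)
    for entry in os.listdir(mirror):
        path = os.path.join(mirror, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.unlink(path)
    # symlinks=True copies symlinks as symlinks instead of following them;
    # copy2 (the default copy function) keeps modes and timestamps, not owners.
    shutil.copytree(live, mirror, symlinks=True, ignore=ignore, dirs_exist_ok=True)
```

Run it only when you know the live system is in good shape; otherwise you
overwrite a good snapshot with a bad one.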

I actually do this with most of my system.  The root partition has /usr
and /var on it as well, so the portage database (stored in /var/db) is
current with what's on that partition, and I keep a copy of that
partition, which I refer to as my rootmirror.  Likewise, I keep a copy of
/home, a copy of my media partition, a copy of my packages (the result of
FEATURES=buildpkg) partition, etc.  I don't worry about a copy of /var/log
(which is on a separate partition from /var), or about the portage tree
(which I can simply resync if it's lost), or /tmp (since the stuff in
there by definition need not survive a reboot).  I make sure I keep the
backup copies updated to the point where if I lose everything on the
working copy, I am comfortable resuming from the backup copy, knowing that
I can redo anything changed between them in a reasonable time, should it
come to that.
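
The exclusions above can be expressed as an ignore callable for Python's
shutil.copytree.  This is a minimal sketch; the portage tree location
(/usr/portage, the default at the time) and the make_ignore name are
assumptions.  It keeps the skipped directories themselves, so the copy
still has an empty /tmp and /var/log to boot with, and drops only their
contents.

```python
import os

# Directories whose *contents* are not worth mirroring, relative to the root
# being copied. /usr/portage is an assumed location; adjust to your PORTDIR.
SKIP_CONTENTS = {"var/log", "usr/portage", "tmp"}


def make_ignore(src_root: str, skip=SKIP_CONTENTS):
    """Build a shutil.copytree ignore callable that creates the skipped
    directories but copies nothing inside them."""
    src_root = os.path.realpath(src_root)

    def ignore(dirpath, names):
        rel = os.path.normpath(os.path.relpath(os.path.realpath(dirpath), src_root))
        return set(names) if rel in skip else set()

    return ignore
```

For example, shutil.copytree(src, dst, symlinks=True,
ignore=make_ignore(src)) copies src with those three directories left
empty.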

If you had been doing this, then you wouldn't be sweating it now, as you'd
just have booted your backup copy and resumed from there.  Thus, consider
setting up your system that way once you are back up and running, so you
aren't left in that sort of situation ever again.  (Of course, if your
hard drive dies, that's another matter.  Here, I use a 4-disk RAID-6 to
address that problem -- I can lose any two of the four hard drives
without losing anything vital.  It's software RAID, so if the board goes,
I can buy another board, install the drives and CPUs, rebuild my kernel
for the new board using an emergency CD, and be up and running once again.
That is, however, about the only case where I'd have to use the emergency
CD, as in the other cases, I should still be able to boot to the backup
root snapshot and recover from there.)

Good luck!  I hope it /is/ just glibc, as that's scary to recover from
when the problem occurs, but not the end of the world.  If it's not glibc,
things get rather more complex, but all evidence so far says that's what
it is.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
[email protected] mailing list
