Dieter Ries <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Sat, 27 May 2006 10:34:16 +0200:
> i have a very severe and urgent problem:
>
> Then the emerge stopped with some error i dont remember, and after that,
> everything got somehow slow. because i didnt want the dvd to be ruined i
> waited till it was burned and the 30G were downloade. during my waiting i
> tried $top or $ps -A to see the running processes, but i got
> "Speicherzugriffsfehler", which is AFAIK the same as segmentation fault.
>
> i got it for everything i tried, and when i tried something from the KDE
> menu, nothing happened.
> when the data was downloaded and the dvd burned, i tried to shutdown from
> KDE, no success. the i tried to shutdown from the console, no success
> either. in the end i had to use the reset button.
>
> then, just after "freeing unused kernel memory" there are many errors, all
> looking quite the same[.] init hung at that state, nothing worked.
>
> so i got my livecd, botted from it, then i ran fsck for all the
> partitions, without any errors or anything, everything seemed fine.
>
> i then mounted my system and home partition and proc and typed chroot
> /mnt/gentoo /bin/bash, which was followed by, guess it: segmentation
> fault.
>
> the data on all my partitions from sda5 to 10 is still there and i can
> mount them all, but i cant chroot and i cant boot.
>
> so no gentoo anymore[,] i am now using knoppix 4.0 to write for help.
>
> is there any chance to get my gentoo back to life without completely
> install it again? and why does the system break when doing some things
> simultaneously?
>
> can this be a hardware issue?

OUCH!

It /could/ be a hardware issue, but as you can boot from the LiveCD and the fscks all come out fine, it wouldn't appear to be. I think the problem is much more likely a glibc update gone bad. Virtually /everything/ on a system links to glibc, so when it goes bad, you end up, as they say, "up a creek without a paddle!"

I've actually had it happen once, when a portage bug was triggered by an obscure series of events that happened to all come together in a glibc update. I was able to recover, however, as the problem in that case was a bunch of missing symlinks, and I happened to have mc open at the time and just didn't close it. Using mc, I restored enough symlinks by hand (try to run something, get the error, fix that symlink, try again) to get enough of a working system to finish the recovery: I opened up a binpkged version of glibc (thanks to FEATURES=buildpkg; that's one of the times it saved my butt!) and restored the rest of the symlinks with a mass copy from there. (I had to go through the manual error/rebuild-symlink cycle several times before enough of them were rebuilt to at least run bzip2, so I could untar the appropriate glibc tbz2 binpkg.) So anyway, yeah, I know the feeling!

Assuming the problem is indeed glibc: if you have been using FEATURES=buildpkg, recovery shouldn't be too difficult. Simply boot the LiveCD, mount the hard drive root (and the /usr and /var partitions, if you have them separate), and untar the last correctly working glibc package over the hard drive root. Don't chroot into it until after the untar, so you don't kill functionality; just untar the package to the mounted hard drive root, with any other partitions it might write to mounted at the correct place on top of that root.

Note that you'll probably want to save copies of any of the following files in /etc that you've modified, as the untarring will overwrite them. You can restore them afterward.

host.conf, init.d/nscd, nscd.conf, nsswitch.conf, rpc

If you haven't been using FEATURES=buildpkg, the process is a bit more complicated, but still nothing to panic over. You'll have to use quickpkg on the CD to package up the copy of glibc installed on the CD, then untar that package over the mounted hard drive root as above (saving backups of the /etc files as above, too).
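
The untar-over-the-mounted-root procedure above could be sketched roughly like this. The function names, the mount point and the package path in the usage comments are assumptions for illustration, not details of the original setup, and leaving tar to detect the bzip2 compression by itself assumes GNU tar.

```shell
#!/bin/sh
# Sketch only: restore a glibc binary package over a mounted (NOT chrooted)
# hard drive root, keeping the locally modified /etc files.

# /etc files that, per the post, the glibc package will overwrite.
GLIBC_ETC_FILES="host.conf init.d/nscd nscd.conf nsswitch.conf rpc"

# save_etc ROOT BACKUP_DIR: copy those files (if present) into BACKUP_DIR.
save_etc() {
    root=$1
    backup=$2
    for f in $GLIBC_ETC_FILES; do
        if [ -e "$root/etc/$f" ]; then
            mkdir -p "$backup/$(dirname "$f")"
            cp -p "$root/etc/$f" "$backup/$f"
        fi
    done
}

# restore_etc ROOT BACKUP_DIR: put the saved copies back after the untar.
restore_etc() {
    root=$1
    backup=$2
    for f in $GLIBC_ETC_FILES; do
        if [ -e "$backup/$f" ]; then
            cp -p "$backup/$f" "$root/etc/$f"
        fi
    done
}

# untar_binpkg PKG ROOT: unpack PKG over ROOT, preserving permissions (-p).
# GNU tar recognises the compression of the archive on its own.
untar_binpkg() {
    tar -xpf "$1" -C "$2"
}

# Hypothetical use from the LiveCD (device, paths and version are placeholders):
#   mount /dev/sdaN /mnt/gentoo
#   save_etc /mnt/gentoo /tmp/etc-backup
#   untar_binpkg /mnt/gentoo/usr/portage/packages/All/glibc-VERSION.tbz2 /mnt/gentoo
#   restore_etc /mnt/gentoo /tmp/etc-backup
```

tar may warn about trailing data after the end of the archive, since a binary package carries portage metadata appended to the compressed tarball; the files should still unpack.
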

After this and recovery of the backed-up /etc files, if the problem was indeed glibc, you should again have a working system. Since you bypassed portage by untarring glibc directly, however, the version of glibc that portage thinks is installed will probably be wrong. Thus, you'll want to remerge a known working version using portage. Again, that won't be a big deal if you've been using FEATURES=buildpkg, since you can just emerge -K the version you untarred. If not, you'll need to recompile a new version, which of course will take a while. You may wish to wait until after tonight's gaming thing, if you won't have time to recompile it before then.

After you have your system back up and running, consider a couple of things that might make life easier next time.

Obviously, I'm going to recommend adding buildpkg to your FEATURES if you haven't got it there already. It really /can/ help. To jumpstart the binary package store, consider using quickpkg to package up all your vital packages: gcc, glibc, portage, python, binutils, etc., at a minimum. If you want to get everything packaged right away, use emerge --pretend --emptytree to get a list, and package all those up using quickpkg. (You can automate the process if you wish, using tools such as cut to get the appropriate fields out of the emerge --pretend output, sending that to a file for further editing if desired, and then feeding the file to quickpkg as the list of packages it needs to package. I did it this way when I jumpstarted my binpkg cache.) Alternatively, you can just add the buildpkg feature and emerge --emptytree world, but that will of course take a while.

Second suggestion, and something I'm again doing here: consider creating a second copy of your root partition, with /var and /usr as well if you have them separate. Then, periodically, when you know you have a stable running system, erase the copy and recopy everything over from your known stable running system.
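
A minimal sketch of that erase-and-recopy step, in shell. It assumes the backup partition is already formatted and mounted; the mount point /mnt/rootmirror and the function name refresh_mirror are made up for illustration, and cp -ax assumes GNU cp (-x keeps the copy on one filesystem, so /proc and the mirror's own mount point are not descended into).

```shell
#!/bin/sh
# Sketch only: refresh a backup copy of a directory tree (e.g. the root
# partition) by erasing the old copy and recopying from the live system.

# refresh_mirror SOURCE_DIR MIRROR_DIR
refresh_mirror() {
    src=$1
    dst=$2
    # Refuse missing arguments and the obviously dangerous target "/".
    if [ -z "$dst" ] || [ "$dst" = "/" ] || [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "usage: refresh_mirror SOURCE_DIR MIRROR_DIR" >&2
        return 1
    fi
    # Erase everything inside the mirror, but keep the directory itself.
    find "$dst" -mindepth 1 -xdev -delete &&
    # Recopy, preserving ownership, permissions, links and timestamps.
    cp -ax "$src"/. "$dst"/
}

# Hypothetical use, as root, with the mirror partition mounted:
#   mount /dev/sdaN /mnt/rootmirror
#   refresh_mirror / /mnt/rootmirror
#   umount /mnt/rootmirror
```

Running it only when the live system is known to be stable is the whole point; the copy is no better than the system it was taken from.
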
The idea here is that if your system goes haywire for whatever reason, you can simply boot the backup root partition, which will have a complete working system on it as of the time you did the backup. Thus, no worries about this happening again, as you can just boot the backup system (provided you keep the snapshot fairly close to your working system so you aren't trying to use something terribly outdated).

I actually do this with most of my system. The root partition has /usr and /var on it as well, so the portage database (stored in /var/db) is current with what's on that partition, and I keep a copy of that partition, which I refer to as my rootmirror. Likewise, I keep a copy of /home, a copy of my media partition, a copy of my packages partition (the result of FEATURES=buildpkg), etc. I don't worry about a copy of /var/log (which is on a separate partition from /var), or about the portage tree (which I can simply resync if it's lost), or /tmp (since the stuff in there by definition need not survive a reboot). I make sure I keep the backup copies updated to the point where, if I lose everything on the working copy, I am comfortable resuming from the backup copy, knowing that I can redo anything changed between them in a reasonable time, should it come to that.

If you had been doing this, then you wouldn't be sweating it now, as you'd just have booted your backup copy and resumed from there. Thus, consider setting up your system that way once you are back up and running, so you aren't left in that sort of situation ever again.

(Of course, if your hard drive dies, that's another matter. Here, I use a 4-disk RAID-6 to address that problem -- I can lose any two of the four hard drives without losing anything vital. It's software RAID, so if the board goes, I can buy another board, install the drives and CPUs, rebuild my kernel for the new board using an emergency CD, and be up and running once again.
That is, however, about the only case where I'd have to use the emergency CD, as in the other cases, I should still be able to boot to the backup root snapshot and recover from there.)

Good luck! I hope it /is/ just glibc, as that's scary to recover from when the problem occurs, but not the end of the world. If it's not glibc, things get rather more complex, but all evidence so far says that's what it is.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

-- 
[email protected] mailing list
