Anders Thøgersen <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Sun, 21 May 2006 00:30:06 +0200:

> On 04:52 Fri 12 May 2006, Duncan wrote:
>> Anders posted as summarized on 12 May 2006:
>>
>> > [Repeatable segfault doing emerge sync at 51%. Portage-2.0.54]
>>
>> [That's almost certainly a portage cache corruption issue. Try emerge
>> --metadata. That should just update the cache without doing the sync
>> part first. If that fails, delete the cache and run emerge --metadata
>> again, to rebuild it.]
>
> Sorry for the late reply,...

Don't worry too much about the timeliness. The problem's yours, not mine, so it's your schedule. From the other side, that's one reason I prefer newsgroups or mailing lists to private help -- if one person doesn't get in a timely reply, someone else likely will. (The other big reason is that no single person always guesses the problem right or has the experience to fix it, and a list/newsgroup allows more folks a chance to look at it than private mail would.)

> I backed up /var/cache/edb as you suggested and began emerge --metadata,
> ... First segfault occurred at 31%. Feeling bold I restarted the
> command and this time it went all the way to the magic 51% where it
> segfaulted as before. From here every emerge --metadata results in a
> segfault at 51% :-/
>
> If I understand you correctly the problem of this segfault is due to a
> specific file in the portage tree. To correct this problem must I then
> locate this file?

Well, locating it would help, but it may not be necessary, as there are other ways to tackle the problem. A couple things to keep in mind:

(1) Portage /can/ operate without that cache -- it's just /very/ slow. Thus, if it comes to being a problem with the portage you are running, you should still be able to merge a different version.

(2) We now know the problem regenerates from a clear cache.

Given that, the next thing I'd want to establish is that it's not a file system problem. Delete the cache again.
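
For reference, the delete-and-rebuild cycle that keeps coming up in this thread could be sketched roughly as below. This is only a dry run: it prints the commands rather than running them. The cache location /var/cache/edb/dep is an assumption on my part (the thread only mentions /var/cache/edb), and the function names are made up for the example. The two halves are kept separate because the fsck described next belongs between them.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, changes nothing on disk.
# ASSUMPTION: the dependency cache lives in /var/cache/edb/dep;
# check your own system before deleting anything.
CACHE_DIR="${CACHE_DIR:-/var/cache/edb/dep}"

# Hypothetical helper: back up the cache, then clear it.
plan_cache_delete() {
    printf '%s\n' "cp -a ${CACHE_DIR} ${CACHE_DIR}.bak"
    printf '%s\n' "rm -rf ${CACHE_DIR}"
}

# Hypothetical helper: rebuild the cache without syncing first.
plan_cache_rebuild() {
    printf '%s\n' "emerge --metadata"
}

plan_cache_delete
plan_cache_rebuild
```

Once the printed commands look right for your layout, they can be run by hand as root.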
If you have /var or /var/cache on its own mount, umount it and do a full fsck on it. (Depending on whether you have /var/log on the same mount, and on the services you are running, you may have to switch to single user mode, or at least shut down your syslog and perhaps other services, in order to umount /var.) Remount and start up your services again, or simply reboot, and try the emerge --metadata again.

If the problem isn't yet gone, delete the cache again and continue...

The next item on the checklist is the file system containing the portage tree itself. The tree can be redownloaded, so in general, it's safe to delete. If you run FEATURES=buildpkg, as I've often recommended on this list (different topic, but something to look at once you get up and running again, if you haven't already), and your $PKGDIR is in the portage tree as it is by default (/usr/portage/packages, IIRC), you'll want to copy or move that elsewhere. Depending on your internet speed and whether you are charged per byte downloaded, you may wish to do the same thing with $DISTDIR (/usr/portage/distfiles by default), which contains all the source tarballs portage has downloaded. Then delete the portage tree, and if it's on a non-root filesystem, unmount and fsck it as well. See below for refetching, as there's an easier way than emerge --sync when you are fetching the entire thing.

If either or both of the above are on your root filesystem, after the deletes, reboot or boot to your rescue solution (the liveCD or alternate boot volume or whatever) and do the fsck from there.

The deletes aren't absolutely necessary, but are worthwhile since the data is redownloadable/rebuildable anyway, and if the problem /is/ a filesystem error, it's easier just renewing the data than it is trying to rebuild the file from incomplete data in lost+found.
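
A rough dry-run sketch of the tree step, for anyone who wants to see which directories need rescuing before the delete. It only prints commands. The defaults are the ones named above; SAVE_DIR and the function name are hypothetical, picked for the example. The umount and fsck are left out on purpose, since the device depends on your own layout.

```shell
#!/bin/sh
# Dry-run sketch: prints what would be done, changes nothing.
# PORTDIR/PKGDIR/DISTDIR defaults follow the message above;
# SAVE_DIR is a hypothetical holding area for this example.
PORTDIR="${PORTDIR:-/usr/portage}"
PKGDIR="${PKGDIR:-$PORTDIR/packages}"
DISTDIR="${DISTDIR:-$PORTDIR/distfiles}"
SAVE_DIR="${SAVE_DIR:-/root/portage-save}"

plan_tree_reset() {
    for dir in "$PKGDIR" "$DISTDIR"; do
        case "$dir" in
            # Only directories inside the tree need moving out first.
            "$PORTDIR"/*) printf '%s\n' "mv $dir $SAVE_DIR/" ;;
        esac
    done
    # With binary packages and distfiles safe, the tree can go.
    printf '%s\n' "rm -rf $PORTDIR"
}

plan_tree_reset
```

If $PKGDIR or $DISTDIR already lives outside the tree, no mv line is printed for it and only the delete remains.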
Additionally, if there happen to be other errors on the filesystem and thus other files end up in lost+found, it's easier to find the files you really /do/ need to recover there if there's less noise from files that would be easier simply refetched or recached.

Now that you know it's not a problem with a bad filesystem, the next step is getting a new copy of the portage tree. Since we deleted the tree we had, emerge --sync isn't the most efficient option, though it would normally do the job. Rather, and this kills two birds with one stone as it's the next thing to try as well, use emerge-webrsync. This fetches a verified snapshot tarball of the tree taken daily, so it's not quite as up to date as a live sync would be (it could be up to 24 hours old), but it's more efficient than emerge --sync if you aren't starting with a mostly up-to-date tree with only a few changes needed. Doing it this way, we test another sync method and ensure that we get a complete copy of the tree as well, bypassing the rsync and any possibly broken files that had been causing problems in your local copy of the tree.

emerge-webrsync performs an emerge --metadata after completing the tree sync, so if it goes fine, you should be back in business. Try another emerge --sync and see.

If you are still having problems at /that/ point, having verified that it's not a filesystem issue, and having tried a completely new copy of the tree fetched with emerge-webrsync, /then/ things start getting interesting. There are still some things that can be tried, but better to wait until we know they are needed before getting worried. The output of emerge-webrsync, or of the next sync where the problem reoccurs, would be interesting as well, so post it. Also, at this point, it may be useful to file a portage bug and get the opinion of the real experts. However, hopefully, that's not necessary, as a clean filesystem and copy of the tree will have eliminated the issue.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
[email protected] mailing list
