Just a final word on this... The problem is effectively resolved. I was
able to rebuild system, then world, with zero issues. I then ran
revdep-rebuild: no issues and no broken links found. After that I
recompiled the packages that depend on glibc and ran revdep-rebuild
again. The whole thing ran at full capacity and with zero errors.
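For the record (and for anyone finding this thread later), the sequence
was more or less the following. This is a sketch from memory rather than
my exact shell history, and the --library call is just one way of
catching everything linked against glibc:

  # rebuild the core system set, then everything in world
  emerge -e system
  emerge -e world

  # look for binaries linked against missing/broken libraries
  revdep-rebuild

  # rebuild the packages that link against glibc (illustrative; any way
  # of listing glibc's reverse deps would do), then re-check
  revdep-rebuild --library libc.so.6
  revdep-rebuild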
I don't know if I felt as good as this when I found the "root cause"...
I just know that having "root" again feels great! ;)

Okay... and now let's upgrade the kernel... ;P

Thanks again,
Simon

On Sat, Jan 8, 2011 at 3:16 PM, Mark Knecht <[email protected]> wrote:
> Glad you have a root cause/solution.
>
> On Sat, Jan 8, 2011 at 10:49 AM, Simon <[email protected]> wrote:
> <SNIP>
> > The virtual HD is physically on a RAID (unknown config). Mark, the
> > sector size issue you mention, does it have to do with aligning real
> > HD sectors with filesystem sectors (so that stuff like read-ahead
> > will get no-more-no-less than what the kernel wants)? I've read about
> > this kind of setup when I was interested in RAID long ago... Now that
> > I know my HD is actually on a RAID, maybe I could benefit from some
> > I/O performance improvements by tuning this a bit!
>
> As it's RAID underneath, it's likely set up correctly. The issue I had
> in mind was the disk being a 4K/sector disk but the person who built
> the partition not knowing to align the partition to a 4K boundary.
> That can cause a _huge_ slowdown.
>
> I doubt that's the case here. As this is a hosting service they likely
> know what they are doing in that area, and if it wasn't done correctly
> you would have noticed it before, I think.
>
> > Anyway, I was told by the support team that another user on the same
> > physical machine (remember it's a Xen VPS) was doing I/O-intensive
> > stuff which could have "I/O starved" my system. I don't understand
> > how starving, or even some kind of DoS attack, could lead to a
> > complete freeze on the console, but eh...
>
> Makes sense actually. The other guy took all the disk I/O, leaving you
> with none. If you can't get to the disk then you cannot read ebuilds
> or write compiled code, or at least not fast.
>
> > They offered to migrate my system to another physical machine, and
> > after that... I was able to perform a complete 'emerge -e system' in
> > one shot without a scratch. I even did it with --jobs=2 and
> > MAKEOPTS="-j4". After that, I started a complete "emerge --keep-going
> > --jobs=2 world" with MAKEOPTS="-j8"... (I got 4 cores: dual Xeon
> > 2GHz)
>
> So now you're in good shape... until some user on the new system
> starts hogging all the disk I/O and holds you up again.
>
> > This last emerge is still going on as I write this and is emerging
> > pkg 522 of 620!! And there were no build errors so far...
> >
> > It's emerging glibc at the moment, so once the big emerge is
> > finished, I'll probably recompile all pkgs that depend on glibc. I
> > believe glibc was actually updated during my very initial update on
> > Monday and I hadn't gotten around to doing that... but I guess
> > everything will go smoothly from here.
> >
> > Thanks again for all your help guys!
> > Simon
>
> Good that you got to the root of the problem.
>
> Good luck,
> Mark
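P.S. About the 4K alignment check Mark mentioned: one quick way to
sanity-check it (assuming 512-byte logical sectors; /dev/xvda is just
what the disk is called on my Xen guest, adjust for yours) is to list
the partition table in sector units and make sure every partition's
start sector is a multiple of 8:

  # -l lists the partition table, -u reports start/end in sectors
  fdisk -lu /dev/xvda

  # a start sector divisible by 8 (8 x 512 bytes = 4 KiB) means the
  # partition begins on a 4K boundary

A recent parted can also check this directly with
'align-check optimal <partition number>'.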

