Alex Efros posted on Sun, 21 Oct 2012 16:24:32 +0300 as excerpted:

> Hi!
>
> On Sun, Oct 21, 2012 at 08:02:47AM +0000, Duncan wrote:
>> Bottom line, an empty @system set really does make a noticeable
>> difference in parallel merge handling, speeding up especially
>> --emptytree @world rebuilds but also any general update that has a
>> significant number of otherwise @system packages and deps,
>> dramatically. I'm happy. =:^)
>
> I think "@system first" and "@system not merge in parallel" rules are
> safe to break when you just doing "--emptytree @world" on already
> updated OS because it's only rebuild existing packages, and all packages
> while compiling will see same set of other packages (including same
> versions). But when upgrading multiple packages (including some from
> original @system and some from @world) this probably may result in bugs.

In theory, you're right. In practice, I've not seen it yet, tho being cautious I'd say it needs at least six months of testing (I've only been testing it about a month, maybe six weeks) before I can say for sure.

It /was/ something I was a bit concerned about, however. That was in fact one of the reasons I decided to try it on the netbook's chroot as well, which hadn't been upgraded in a year and a half. I figured if it could work reasonably well there, the chances of an undiscovered real problem were much lower.

However, it /is/ worth noting that as a matter of course, I already often choose to do some system-critical upgrades (portage, gcc, glibc, openrc, udev) on their own, before doing the general upgrades. That's in part so I can deal with their config file changes and note any problems right away, with a relatively small changeset to deal with, as opposed to having a whole slew of updates including critical system package updates happen all at once, which makes it far more difficult to trace which update actually broke things.

That's where the years of gentoo experience I originally mentioned come in. This isn't going to be as easy for a gentoo newbie for at least two reasons.

First, they're less likely to know what packages really /are/ system critical, and thus are more likely to unmerge them without the extra unmerge warning a package in the system set gets. (I mentioned that one in the first post.)

Second, it probably makes a good bit of difference in practice to spot critical updates in the initial --pretend run and upgrade those packages first, by themselves, dealing with the config file updates for just that critical package (and any dependency updates it might pull in) before going on to the general @world upgrade. Gentoo newbies are rather less likely to be able to make that differentiation. (I didn't specifically mention that one until now.)

> As for "--emptytree @world" speedup, can you provide benchmarked values?
> I mean, only few packages forced to use only one CPU Core while
> compiling.
> So, merging packages in parallel may save some time mostly for doing
> unpack/prepare/configure/install/merge. All of them except configure
> actually do a lot of I/O, which most likely lose a lot in speed instead
> of gain when done in parallel (especially keeping in mind kernel bug
> 12309). So, at a glance time you may win on configure you'll mostly lose
> on I/O, and most of time all your CPU Cores will be loaded anyway while
> compiling, and doing configure in parallel to compiling unlikely save
> some time. This is why I think without actual benchmarking we can't be
> sure how faster it became (if it became faster at all, which is
> questionable).

Good points, and no, I can't easily provide benchmarks, both because of the recent hardware upgrade here, and because portage itself has been gradually improving its parallel merging abilities -- a recent update changed the scheduling algorithm so it starts additional merges much sooner than it did previously. (See gentoo bug 438650, fixed in portage 2.1.11.29 and 2.2.0_alpha140, both released on Oct 17. That I know about it hints at another thing I do routinely as an experienced gentooer: I always read portage's changelog and check out any referenced bugs that look interesting, before I upgrade portage. To the extent practical without actually reading the individual git commits, I want to know about package manager changes that might affect me BEFORE I do that upgrade!)

But I believe you're underestimating the effects of portage's parallel merging abilities as core-counts rise. In particular, a lot of packages normally in @system (or deps thereof) are relatively small packages such as grep, patch, sed... where the single-threaded configure step takes a MUCH larger share of the total package merge time than it does with larger packages.
Similarly, the unpack and prepare phases, plus the package phase for folks using FEATURES=buildpkg, tend to be single-threaded.[1]

Thus, instead of serializing several dozen small, mostly single-threaded package merges for packages like grep/sed/patch/util-linux/etc, several of these end up getting done in parallel, depending on the --jobs and --load-average numbers you feed to portage. The portage multi-job output then bumps a line every few seconds because it's doing them in parallel, instead of every minute or so because it's doing one at a time.

Meanwhile, it should be obvious, but it's worth stating anyway: the effect gets *MUCH* bigger as the number of cores increases. For a dual-core, bah, not worth the trouble, as it could cause more problems than it solves, especially if people are trying to work on other things while portage is doing its thing in the background. I suspect the break-over point is either triple-core or quad-core. One of the reasons portage is getting better lately is that someone who has a 32-core, with a corresponding amount of memory (64 or 128 gig IIRC), has taken an interest.

It's worth noting, as I mentioned, that I now have a 6-core, recently upgraded from a dual-dual-core (4 cores), with a corresponding memory upgrade, to 16 gigs. One of the first things I noticed doing emerges was how much more difficult it was to keep the 6-core actually peaked out at 100% CPU than it had been on the 4-core. While I suspect there would have been a difference on the quad-core (as I said, I believe the break-over's probably 3-4 cores), it wasn't a big deal there.

Staring at that 6-core running at 100% on 1-2 cores CPU-freq-maxed at 3.6 GHz, while the other 4-5 cores remained near idle at <20% utilization at CPU-freq-minimum 1.4 GHz... was VERY frustrating. So began my drive to empty @system and get portage properly scheduling parallel merges for former @system packages and their deps as well!

For the quad-core plus hyperthreading (thus 8 threads I take it?) you mention below (4.6 GHz OC, nice! I see stock is 3.4 GHz), the boost from killing @system forced serialization should definitely make a difference (unless the hyperthreading doesn't do much for that work load, making it effectively no better than a non-hyperthreaded quad-core). For my 6-core, it made a rather big difference, and I guarantee if you had the 32-core that one of the devs working on improving portage's parallelization has, you'd be hot on the trail to improve it as well!

> As for me, I found very effective way to speedup emerge is upgrading
> from Core2Duo E6600 to i7-2600K overclocked to 4.6GHz. This speedup
> compilation on my system in 6 times (kernel now compiles in just 1
> minute). And to speedup most other (non-compilation) portage operations
> I use 4GB tmpfs mount on /var/tmp/portage/.

I remember reading about the 1-minute kernel compiles on i7s. Very impressive.

FWIW, there are a lot of variables to fill in the blank on before we can be sure kernel build time comparisons are apples to apples (I had several more paragraphs written on that, but decided it was a digression too far for this post so deleted 'em). AFAIK, when I read about it (on phoronix I believe), he was doing an all-yes config, so building rather more than a typical customized-config gentooer, but was using a rather fast SSD, which probably improved his times quite a bit compared to "spinning rust". But I don't know if his timings included the actual compress (and if so with what CONFIG_KERNEL_XXX compression option), and I don't believe they included the actual install, only the build.

That said, a 1-minute all-yes-config kernel build time is impressive indeed, the envy of many, including me. (OTOH, my fx6100 was on sale for $100, $109 post-tax. That's lower than pricewatch's $118 lowest quote (shipped, no tax), and only about 40% of the $273 low quote for an i7-2600k.)

My build, compress (CONFIG_KERNEL_XZ) and install, runs ~2 minutes (1:58-2:07, 10+ runs, warm-cache), so yes, even if your build time doesn't include compress and install, which it might, 1-minute is still VERY impressive. Tho as I said, my CPU cost ~40% of the going price on yours, so...

Meanwhile... I too use and DEFINITELY recommend a tmpfs $PORTAGE_TMPDIR. I'm running 16 gig RAM here, and didn't want to run out of room with parallel builds, so set a nice roomy 12G tmpfs size.

A $PORTAGE_TMPDIR on tmpfs also reduces the I/O. At least here, the only time I've had problems, both on the old hardware and on the new, is when I go into swap. (And on the old hardware I had swap priority= striped across four disks and 4-way md/raid0, so the kernel could schedule swap-out vs read-in much better and I didn't see a problem until I hit nearly half-gig of swap loading at once; the new hardware is only single-disk ATM, and I see issues starting @ 80 meg or so of swap loading, at once.) But with 16 gig RAM on the new system, the only time I see it go into swap is when I run a kernel build with uncapped -j, thus hitting 500+ jobs and close enough to 16 gigs that whether I hit swap or not depends on what else I've been doing with the system.

Basically, I/O is thus not a problem at all with portage, here, up to the --jobs=12 --load-average=12 along with MAKEOPTS="-j20 -l15" I normally run, anyway. On the old system with only six gigs of RAM, if I tried hard enough I could get portage to hit swap, but I limited --jobs and MAKEOPTS until that wasn't an issue, and had no additional problems.

Tho I should mention I also run PORTAGE_NICENESS=19 (and my kernel-build/install script similarly renices itself to 19 before starting the kernel build), which puts it in batch-scheduling mode (idle-only scheduling, but longer timeslices).

If it matters, filesystem is reiserfs, iosched is cfq, drive is sata2/ahci (amd 990fx/sb950 chipset) 2.5" seagate "spinning rust".

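A minimal sketch of how the settings mentioned above might be collected in make.conf. The numbers are the ones quoted in this post for a 6-core with 16 gig of RAM. Several things are assumptions for illustration and are not taken from the post: keeping the --jobs/--load-average options in EMERGE_DEFAULT_OPTS, leaving PORTAGE_TMPDIR at the default /var/tmp so that the tmpfs is mounted on /var/tmp/portage, and the mount options in the commented fstab line.

```shell
# /etc/portage/make.conf (sketch; scale the numbers to your own hardware)

# Up to 12 package merges at once, but start no new one while other
# builds are running and the load average is at or above 12.
# Keeping these in EMERGE_DEFAULT_OPTS instead of typing them on each
# emerge command line is an assumption.
EMERGE_DEFAULT_OPTS="--jobs=12 --load-average=12"

# Per-package make parallelism: up to 20 jobs, throttled at load 15.
MAKEOPTS="-j20 -l15"

# Run portage at the lowest CPU priority.
PORTAGE_NICENESS=19

# Parent of the build directory; portage works in $PORTAGE_TMPDIR/portage.
# /var/tmp is the default, shown only to make the mount point explicit.
PORTAGE_TMPDIR="/var/tmp"

# Matching /etc/fstab entry for the 12G tmpfs (an fstab line, not shell,
# hence commented out; uid/gid/mode/noatime are assumptions):
# tmpfs  /var/tmp/portage  tmpfs  size=12G,uid=portage,gid=portage,mode=775,noatime  0 0
```

With the options in make.conf, a plain `emerge --ask --update --deep @world` picks them up. If the machine starts going into swap, lowering --jobs and MAKEOPTS is the first thing to try, as described above.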
But I definitely agree with $PORTAGE_TMPDIR on tmpfs. It makes a HUGE difference!

---
[1] Compression parallelism: There are parallel-threaded alternatives to bzip2, for instance, but they have certain down-sides like decompress only being parallel where the tarball was compressed with the same parallel tool, and certain compression buffer nul-fill handling differences that make them not functionally perfect drop-in replacements. See the recent discussion on the topic on the gentoo-dev list for instance.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
