May I put my oar into your optimisation discussion? It's funny, Duncan: on the one hand you are saving every byte of CPU cache; on the other, you are happy to have forked bashes sitting in your main memory. But how do you take control of that? I mean, how do you keep the code of your forked bashes out of your CPU cache so it stays free for kernel code?
A long time ago . . ., I was testing some CFLAGS on my own programs. I had written a fast Fourier transform myself, only to see the "impressive" difference between -Os, -O3 and some other optimisation flags. I fed my FFT with a large amount of input, but no matter how hard I tried to make it faster by changing the flags, it didn't work. The differences were marginal, and not every flag brings an improvement for every program. The only thing that changed a lot was the time gcc needed to perform those optimisations.

Bernhard

On Thursday, 09 February 2006 01:17, Duncan wrote:
> Simon Stelling posted <[EMAIL PROTECTED]>, excerpted below, on Wed, 08 Feb 2006 21:37:33 +0100:
>
> > Duncan wrote:
> >> I should really create a page listing all the little Gentoo admin scripts I've come up with and how I use them. I'm sure a few folks would likely find them useful, anyway.
> >>
> >> The idea behind most of them is to create shortcuts to save having to type in long emerge lines with all sorts of arbitrary command-line parameters. The majority of these fall into two categories, ea* and ep*, short for emerge --ask <additional parameters> and emerge --pretend ... . Thus, I have epworld and eaworld, the pretend and ask versions of emerge -NuDv world; epsys and easys, the same for system; eplog <package>, emerge --pretend --log --verbose (package name to be added on the command line, so eplog gcc, for instance, to see the changes between my current and the new version of gcc); eptree <package>, to use the tree output; etc.
>
> > Interesting. But why do you use scripts and not simple aliases? Every time you launch your script the HD performs a seek (which is very expensive in time), copies the script into memory and then forks a whole bash process to execute a one-liner. Using an alias, which is a bash built-in, wouldn't fork a process and would therefore be much faster.
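The fork-and-seek overhead Simon describes is easy to measure directly. A rough sketch (file names and iteration count are arbitrary, and it assumes GNU date for nanosecond timestamps):

```shell
# Compare N launches of an external one-line script against N calls of
# an equivalent shell function. Absolute numbers are machine-dependent;
# this only shows the shape of the comparison.
cat > /tmp/hello.sh <<'EOF'
#!/bin/sh
echo hello
EOF
chmod +x /tmp/hello.sh
hello_fn() { echo hello; }

t0=$(date +%s%N)
i=0; while [ $i -lt 200 ]; do /tmp/hello.sh >/dev/null; i=$((i+1)); done
t1=$(date +%s%N)
i=0; while [ $i -lt 200 ]; do hello_fn >/dev/null; i=$((i+1)); done
t2=$(date +%s%N)
echo "script: $(( (t1 - t0) / 1000000 )) ms, function: $(( (t2 - t1) / 1000000 )) ms"
```

On a typical box the external invocations cost a fork/exec each while the function calls are effectively free; whether that matters for interactive emerge shortcuts is exactly what the thread is debating.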
> My thinking, which is possibly incorrect (your input appreciated), is that file-based scripts get pulled into cache the first time they are executed, and will remain there (with a gig of memory) pretty much until I'm done doing my upgrades. At the same time, they are simply in cache, not something in bash's memory, so if the memory is needed, it will be reclaimed. As well, after I'm done and on to other tasks, the cached commands will eventually be replaced by other data, if need be.
>
> Aliases (and bash functions) are held in memory. That's not as flexible as cache in terms of being knocked out of memory if the memory is needed by other things. Sure, that memory may be flushed to disk-based swap, but that's disk-based the same as the actual script files I'm using, so reading it back into main memory if it's faulted out will take something comparable to the time it'd take to read in the script file again anyway. That's little gain, set against the additional overhead of having to manage the temporary copy in swap, if it comes to that.
>
> Actually, there are some details here that may affect things. I don't know enough about the following factors to be able to evaluate how they balance out, but the real reason I chose individual scripts is below.
>
> One, here anyway, tho not on most systems, I'm running four SATA disks in RAID. The swap is actually not on the RAID, as the kernel manages it like RAID on its own, provided all four swap areas are set to the same priority (they are), which means swap is running on the equivalent of four-way-striped RAID-0. Meanwhile, the scripts, as part of my main system, are on RAID-6 for redundancy, so with the same four disks backing the RAID-6 as the swap, I've effectively only two-way-striped storage there, the other two disk stripes being parity.
> Thus, retrieval from the 4-way-striped swap should in theory be more efficient than from the 2-way-striped regular storage. OTOH, the granularity of the stripe in either case, against the size of a one- or two-line script, likely means it'll be pulled from a single stripe (at the speed of reading from a single disk, tho there are parallelizing opportunities not available on a single disk). It's also likely that swap is more optimally managed for fast retrieval than the location on the regular filesystem is. Balanced against that, we have the overhead of maintaining the swap tracking.
>
> That's assuming it would swap that out to the dedicated swap in the first place. I'm not familiar with Linux's VM, but given that the aliases and functions would be file-based in either case, it's possible it would simply drop the data from main memory, relying on the fact that the data is clean file-backed data and could be read in directly from the files again, if necessary, rather than bothering with actually creating a temporary copy of the /same/ data in swap, taking time to do so when it could just read it back in from the file.
>
> Another aspect is the effect of data vs. metadata caching. Again, I'm not familiar with how Linux manages this, and indeed, it may differ between filesystems, but the idea is that if the file metadata is still cached, even if the file itself isn't, it's a single disk seek and read to pull the data back in. By contrast, reading the file initially, or after the location metadata has been flushed as well, takes multiple seeks and reads, following the logical directory structure to fetch each directory table in the hierarchy until it reaches the entry that actually has the file location, before it can read the file itself.
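For what it's worth, on Linux the data-versus-metadata balance is tunable: the vm.vfs_cache_pressure sysctl biases page reclaim between the page (data) cache and the dentry/inode (metadata) caches. A sketch (Linux-specific; the value 50 shown in the comment is purely illustrative):

```shell
# Read the current metadata-cache pressure. 100 is the kernel default;
# lower values make the kernel prefer keeping dentry/inode (metadata)
# caches over page-cache data.
if [ -r /proc/sys/vm/vfs_cache_pressure ]; then
    echo "vfs_cache_pressure: $(cat /proc/sys/vm/vfs_cache_pressure)"
else
    echo "vfs_cache_pressure: not available on this system"
fi
# As root, one could bias toward metadata (illustrative value; don't
# tune live systems blindly):
#   sysctl -w vm.vfs_cache_pressure=50
```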
> (Back several years ago on MSWormOS, one of the first things I always did after a reinstall was set the system to the server profile, which kept a far larger metadata cache, on the theory that the metadata was usually smaller than the data, and for dirs, sharable among many data files, so I'd rather spend cache memory on metadata than data. The other choices were the default desktop profile, and laptop, with a much smaller metadata cache. I originally learned about these as a result of reading about a bug in the original 95 as shipped, which swapped some entries in the registry and therefore cached FAR less metadata than it should have. I don't know where these tweaks are located on Linux, or how to go about adjusting them safely.)
>
> Basically, therefore, I don't believe aliases to be a big positive, and possibly somewhat of a negative, as opposed to scripts, because the scripts will be cached in most cases after initial use anyway, yet they have the advantage of not having to be maintained or tracked in memory when I'm doing other tasks and the system needs that cache.
>
> Given that I don't believe it's a big positive, I prefer the administrative convenience and maintainability of separate scripts.
>
> There /is/ a third alternative, which I came across recently, that I think is a good idea. If you'd comment, perhaps it would help me sort out the implications.
>
> The idea, simply put, is "bash command theming": single scripts that can be invoked to "theme" a command prompt for the tasks at hand. I didn't read the entire article I saw covering this, but skimmed it enough to get the gist. A single invokable script for each set of tasks, say perl programming, bash programming, working with portage, etc., that would set up a specific set of aliases and functions for that task.
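A minimal sketch of what such a theme script might look like (all names here are hypothetical; it would have to be sourced, not executed, since aliases set in a child shell die with it):

```shell
# Hypothetical sourceable "theme" file implementing the on/off idea
# described in the thread. Usage after sourcing:
#   portage_theme on    /    portage_theme off
portage_theme() {
    case "$1" in
    on)
        # task-specific shortcuts come into the namespace only now
        alias epworld='emerge --pretend -NuDv world'
        alias eaworld='emerge --ask -NuDv world'
        ACTIVE_THEME=portage
        ;;
    off)
        # tear the theme down again, recovering the namespace
        unalias epworld eaworld 2>/dev/null
        unset ACTIVE_THEME
        ;;
    esac
}

portage_theme on
echo "active theme: ${ACTIVE_THEME:-none}"   # prints "active theme: portage"
portage_theme off
echo "active theme: ${ACTIVE_THEME:-none}"   # prints "active theme: none"
```

The ACTIVE_THEME variable is the hook for the refinement described next in the thread: a theme script can check it and deactivate the previous theme before installing its own aliases.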
> Invoking the script with the "off" parameter would erase that set of aliases and bash functions, thereby recovering the memory, and do any related cleanup, like resetting the path if necessary to exclude any task-specific commands. Taking this a step further, a variable could be set up listing the theme or themes currently active, which the theme-setup script could read to automatically deactivate the previous theme while switching to the new one. One could even share functionality between themes by sourcing common files, which would check the active theme and adjust their behavior accordingly.
>
> This alias and function theming wouldn't be quite as modular as the individual scripts (tho with sourcing it could be), but would maintain the performance advantages (if any) of the alias/function idea, while at the same time allowing the memory reclamation of the cached-script option. It sounds really good, but I'm not yet convinced the benefits would be worth the additional effort of setting up those themes, since the solution I have works.
>
> One VERY NICE benefit of the themes idea is that it would directly address any namespace pollution concerns. It has a direct appeal to programmers and anyone else who's ever had to deal with such issues, for that reason alone. One single command on the path to invoke the theme, possibly even an eselect-like command shared among themes, with everything else off-path and out of the namespace unless that theme is invoked! /VERY/ appealing indeed. OTOH, there are those who'll never remember the theme they have active at the moment, and be constantly confused. For these folks, it'd be a nightmare!
>
> > man emerge:
> > --oneshot (-1)
> >
> > IIRC --oneshot has had a short form since 2.0.52 was released.
>
> Learn new things every day. Thanks! I remember how pleased I was to have --newuse, and even more so when I discovered -N, so very nice!
>
> >> ... Deep breath...
> <g>
>
> >> All that as a preliminary explanation to this: Along with the above, I have a set of efetch functions, that invoke the -f form, so they just do the fetch, not the actual compile and merge, and esyn (there's already an esync function in something or other I have merged, so I just call it esyn), which does an emerge sync, then updates the esearch db, then automatically fetches all the packages that an eaworld would want to update, so they are ready for me to merge at my leisure.
>
> > I'm a bit confused now. You use *functions* to do that? Or do you mean scripts? By the way: with an alias you could name your custom "script" esync, because it doesn't place a file on the hard disk.
>
> Scripts. I was using "functions" in the generic sense here. I did realize before I sent it that the word had a dual meaning, but figured it wasn't an important enough distinction to go back and correct, or explain. Unfortunately, every time I decide to skip something like that, I get called on it, which doesn't help my posts get any shorter! =8^)
>
> >> I choose -Os, optimize for size, because a modern CPU and the various cache levels are FAR faster than main memory.
>
> > Given the fact that two CPUs differing only in L2 cache size have nearly the same performance, I doubt that the performance increase is very big. Some interesting figures:
> >
> > An Athlon64 something (forgot what, but it shouldn't matter anyway) with 1 MB L2 cache is 4% faster than an Athlon64 of the same frequency but with only 512 kB L2 cache. The bigger the cache sizes you compare get, the smaller the performance increase. Since you run a dual Opteron system with 1 MB L2 cache per CPU, I tend to say that the actual performance increase you experience is about 3%.
> > But then I didn't take into account that -Os leaves out a few optimizations which would be included by -O2, the default optimization level, which actually makes the code a bit slower when compared to -O2. So the performance increase you really experience shrinks to about 0-2%. I'd tend to proclaim that -O2 is even faster for most of the code, but that's only my feeling.
>
> Interesting, indeed. I'd counter that it likely has to do with how many tasks are being juggled as well, plus the number of kernel/user context switches, of course. I wonder under what load, and with what task type, the above 4% difference was measured.
>
> Of course, the definitive way to end the argument would be to do some profiling and get some hard numbers, but I don't think either you or I consider it an important enough factor in our lives to go to /that/ sort of trouble. <g>
>
> > Besides that, I should mention that -Os sometimes still has problems with huge packages like glibc.
>
> Interestingly enough, while Gentoo's glibc ebuilds stripflags to -O2, I did try it with all that stripflags logic disabled. For glibc, -Os /does/ seem to slow things down, or did back with gcc-3.3 (IIRC) anyway. I tried the same glibc both ways. I would have tried tinkering further, but decided it wasn't worth complicating debugging and the like, since glibc is loaded by virtually everything, and I'd never be able to tell whether a problem was my funny tweaks to glibc or some actual issue with whatever package. Besides, that's an awfully costly package, in terms of recompile time, not to mention system stability, to be experimenting with. I /can/ say, however, that it didn't crash or cause any other issues I could see or attribute to it.
>
> OTOH, I haven't tried it with modular xorg yet, but the monolithic xorg builds seemed to perform better with -Os. I tried one of those (6.8??) both ways too.
> I ended up routinely killing the stripflags logic, but I was modifying other portions of the ebuild as well (so it compiled only the ATI video driver, and only installed the 100-dpi fonts, not the 75-dpi ones, among other things), so that was just one of several modifications I was making, tho the only real performance-affecting one. Performance in X was better, but it DID take longer to switch to a VT when I tried that. In fact, at one point the switch-to-VT functionality broke, but someone mentioned it was broken in general at that point for certain drivers anyway, so I'm not sure my optimizations had anything to do with it.
>
> >> Of course, this is theory, and the practical case can and will differ depending on the instructions actually being compiled. In particular, streaming media apps and media encoding/decoding are likely to still benefit from the traditional loop-elimination-style optimizations, because they run thru so much data already that cache is routinely trashed anyway, regardless of the size of your instructions. As well, that type of application tends to have a LOT of looping instructions to optimize!
> >>
> >> By contrast, something like the kernel will benefit more than usual from size optimization. First, it's always memory-locked and as such can't be swapped, and even "slow" main memory is still **MANY** **MANY** times faster than swap, so a smaller kernel means more other stuff fits into main memory with it, and isn't swapped as much. Second, parts of the
>
> > Funny to hear this from somebody with 4 GB RAM in his system. I don't know how bloated your kernel is, but even if -Os would reduce the size of my kernel to **half**, which is totally impossible, it wouldn't be enough to load the mail I am just answering into RAM. So, basically, this reasoning is just ridiculous.
>
> I won't argue with that. BTW, still at a gig, much to my frustration!
> I put off upgrading memory when I decided my disk was in danger of going bad and I ended up going with the 4-disk SATA-based RAID. Then I upgraded my stereo near Christmas... Now the CC is almost paid off again, so I'm looking at that memory upgrade again.
>
> Much to my frustration, memory prices don't seem to be dropping much lately!
>
> > You are referring a lot to the gcc manpage, but obviously you missed this part:
> >
> > -fomit-frame-pointer
> >     Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.
> >
> >     On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro "FRAME_POINTER_REQUIRED" controls whether a target machine supports this flag.
> >
> >     Enabled at levels -O, -O2, -O3, -Os.
> >
> > I have to say that I am a bit disappointed now. You seemed to be one of those people who actually inform themselves before sticking new flags into their CFLAGS.
>
> ??
>
> I'm not sure which way you mean this. It was in my CFLAGS list, but I didn't discuss it, as it's fairly common (from my observation, nearly as common as -pipe) and seems fairly non-controversial on Gentoo. Did you miss it in my CFLAGS and are saying I should be using it, or did you see it and are saying it's unnecessary and redundant because it's already enabled by -Os?
>
> If the latter, yes, but as mentioned above in the context of glibc, -Os is sometimes stripped.
> In that case, the redundancy of having the basic -fomit-frame-pointer is useful, unless it's also stripped; but as I said, it seems much less controversial than some flags and is often specifically allowed where most are stripped.
>
> Or are you saying I should avoid it due to the debugging implications? I don't quite get it.
>
> >> !!! Relying on the shell to locate gcc, this may break
> >> !!! DISTCC, installing gcc-config and setting your current gcc
> >> !!! profile will fix this
> >>
> >> Another warning, likewise to stderr and thus not in the eis output. This one is due to the fact that eselect, the eventual systemwide replacement for gcc-config and a number of other commands, uses a different method to set the compiler than gcc-config did, and portage hasn't been adjusted to full compatibility just yet. Portage finds the proper gcc just fine for itself, but there'd be problems if distcc were involved, thus the warning.
>
> > Didn't know about this. Have you filed a bug on the topic yet? Or is there already one?
>
> There is one. I don't recall whether I filed it or it was already there, but both JH and the portage folks know about the issue. IIRC, the portage folks decided it was their side that needed to change, but that required changes to the distcc package, and I don't know how that has gone since I don't use distcc, except that I was slightly surprised to still see the warning in portage 2.1.
>
> >> MAKEOPTS="-j4"
> >>
> >> The four jobs are nice for a dual-CPU system -- when it works. Unfortunately, the unpack and configure steps are serialized, so the jobs option does little good there. To make the most efficient use of the available cycles when I have a lot to merge, therefore, I'll run as many as five merges in parallel.
> >> I do this quite regularly with KDE upgrades like the one to 3.5.1, where I use the split KDE ebuilds and have something north of 100 packages to merge before KDE is fully upgraded.
>
> > I really wonder how you would parallelize unpacking and configuring a package.
>
> That's what was nice about configcache, which was supposed to be in the next portage, but I haven't seen or heard anything about it for a while, and the next portage, 2.1, is what I'm using. configcache seriously shortened that stage of the build, leaving more of it parallelized, but...
>
> I was using it for a while, patching successive versions of portage, but it broke about the time sandbox split out, and the dev said he wasn't maintaining the old version since it was going into the new portage. I tried updating the patch, but eventually ran into what I think were unrelated issues, dropped it as one of my troubleshooting steps, and never picked it up again.
>
> I'd certainly like to have it back, tho. If it's working in 2.1, I've not seen it documented or seen any hints in the emerge output, as there were before. Have you seen or heard anything?
>
> BTW, what is your opinion on -ftracer? Several devs I've noticed use it, but the manpage says it's not that useful without active profiling, which means compiling, profiling, and recompiling, AFAIK. It's possible the devs running it do that, but I doubt it, and otherwise I don't see that it should be that useful. I don't know if you run it, but since I've got your attention, I thought I'd ask what you think about it. Is there something significant I'm missing, or are they actually doing that compile/profile/recompile thing? It just doesn't make sense to me.
> I've seen it in several user-posted CFLAGS as well, but I'll bet a good portion of them are there simply because the users saw it in a dev's CFLAGS and decided it looked useful, not because they understand any implications stated in the manpage. (Not that I always do either, but... <g>)
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman in
> http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

--
[email protected] mailing list
