Mike Owen posted <[EMAIL PROTECTED]>, excerpted below, on Thu, 02 Feb 2006 17:12:04 -0800:
> On 2/2/06, Duncan <[EMAIL PROTECTED]> wrote:
>>
>> http://members.cox.net/pu61ic.1inux.dunc4n/
>
> Nice. Now let us know your CFLAGS, and what toolchain versions you're
> running :D

You probably didn't notice, as I had it commented out on the main index page as I don't have the page created to actually list them yet, but if you viewed source, you'd have seen I have a techspecs page link commented out, that'll get that sort of info, when/if I actually get it created.

However, since you asked, your answer, and a bit more, by way of explanation...

I should really create a page listing all the little Gentoo admin scripts I've come up with and how I use them. I'm sure a few folks anyway would likely find them useful.

The idea behind most of them is to create shortcuts to having to type in long emerge lines, with all sorts of arbitrary command line parameters. The majority of these fall into two categories, ea* and ep*, short for emerge --ask <additional parameters> and emerge --pretend ... . Thus, I have epworld and eaworld, the pretend and ask versions of emerge -NuDv world, epsys and easys, the same for system, eplog <package>, emerge --pretend --changelog --verbose (package name to be added to the command line so eplog gcc, for instance, to see the changes between my current and the new version of gcc), eptree <package>, to use the tree output, etc.

One thing I've found is that I'll often epworld or eptreeworld, then emerge the individual packages, rather than use eaworld to do it. That way, I can do them in the order I want or do several at a time if I want to make use of both CPUs.

Because I always use --deep, as I want to keep my dependencies updated as well, I'm very often merging specific dependencies. There's a small problem with that, however: --oneshot, which I'll always want to use with dependencies to help keep my world file uncluttered, has no short form, but I use it as the default!
OTOH, the normal portage mode of adding stuff listed on the command line to the world file, I don't want very often, as most of the time I'm simply updating what I have, so it's all in the world file if it needs to be there already anyway. Not a problem! All my regular ea* scriptlets use --oneshot, so it /is/ my default. If I *AM* merging something new that I want added to my world file, I have another family of ea* scriptlets that do that -- all ending in "2", as in, "NOT --oneshot". Thus, I have a family of ea*2 scriptlets. The regulars here already know one of my favorite portage features is FEATURES=buildpkg, which I have set in make.conf. That of course gives me a collection of binary versions of packages I've already emerged, so I can quickly revert to an old version for testing something, if I want, then remerge the new version once I've tested the old version to see if it has the same bug I'm working on or not. To aid in this, I have a collection of eppak and eapak scriptlets. Again, the portage default of --usepackage (-k) doesn't fit my default needs, as if I'm using a binpkg, I usually want to ONLY use a binpkg, NOT merge from source if the package isn't available. That happens to be -K in short-form. However, it's my default, so eapak invokes the -K version. I therefore have eapaK to invoke the -k version if I don't really care whether it goes from binpkg or source. Of course, there are various permutations of the above as well, so I have eapak2 and eapaK2, as well as eapak and eapaK. For the ep* versions, of course the --oneshot doesn't make a difference, so I only have eppak and eppaK, no eppa?2 scriptlets. ... Deep breath... 
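A rough sketch of what such a scriptlet family could look like as shell functions. This is a hypothetical reconstruction from the description above, not the actual scripts: the function bodies and the EMERGE override variable are assumptions, and only the names ea, eaworld, epworld, eapak/eapaK, eppak/eppaK and the "2" suffix come from the text.

```shell
#!/bin/sh
# Hypothetical reconstruction of the wrapper family described above.
# EMERGE is an assumed override hook: set EMERGE=echo to preview the
# command line instead of running emerge.
: "${EMERGE:=emerge}"

# ep*: pretend variants (--oneshot makes no difference when pretending)
epworld() { $EMERGE --pretend -NuDv world; }
eppak()   { $EMERGE --pretend --usepkgonly "$@"; }   # -K: binpkg only
eppaK()   { $EMERGE --pretend --usepkg "$@"; }       # -k: binpkg if available

# ea*: ask variants, --oneshot by default so the world file stays uncluttered
ea()      { $EMERGE --ask --oneshot "$@"; }
eaworld() { $EMERGE --ask -NuDv world; }
eapak()   { $EMERGE --ask --oneshot --usepkgonly "$@"; }
eapaK()   { $EMERGE --ask --oneshot --usepkg "$@"; }

# ea*2: "NOT --oneshot", so the package gets recorded in the world file
ea2()     { $EMERGE --ask "$@"; }
eapak2()  { $EMERGE --ask --usepkgonly "$@"; }
eapaK2()  { $EMERGE --ask --usepkg "$@"; }
```

With EMERGE=echo, running eapak gcc just prints the options that would be passed, which is a cheap way to check a new wrapper before letting it loose on the system.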
<g> All that as a preliminary explanation to this: Along with the above, I have a set of efetch functions, that invoke the -f form, so just do the fetch, not the actual compile and merge, and esyn (there's already an esync function in something or other I have merged so I just call it esyn), which does emerge sync, then updates the esearch db, then automatically fetches all the packages that an eaworld would want to update, so they are ready for me to merge at my leisure. Likewise, and the real reason for this whole explanation, I /had/ an "einfo" scriptlet that simply ran "emerge info". This can be very handy to run, if like me, you have several slotted versions of gcc merged, and you sometimes forget which one you have eselected or gcc-configed as the one portage will use. Likewise, it's useful for checking on CFLAGS (or CXXFLAGS OR LDFLAGS or...), if you modified them from the normal ones because a particular package wasn't cooperating, and you want to see if you remembered to switch them back or not. However, I ran into a problem. The output of einfo was too long to quickly find the most useful info -- the stuff I most often change and therefore most often am looking for. No sweat! I shortened my original "einfo" to simply "ei", and added a second script, "eis" (for einfo short), that simply piped the output of the usual emerge info into a grep that only returned the lines I most often need -- the big title one with gcc and similar info, CFLAGS, CXXFLAGS, LDFLAGS, and FEATURES. USE would also be useful, but it's too long even by itself to be searched at a glance, so if I want it, I simply run ei and look for what I want in the longer output. ... Another deep breath... 
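The ei/eis pair described above might be sketched roughly like this. The grep pattern and the eis_filter helper name are assumptions; MAKEOPTS is included only because it shows up in the eis output quoted in this post, and current Portage spells the command emerge --info rather than emerge info.

```shell
#!/bin/sh
# Hypothetical sketch of "ei" and "eis"; the original scripts were not posted.

# Keep only the summary line plus the settings that change most often.
eis_filter() {
    grep -E '^(Portage |CFLAGS=|CXXFLAGS=|LDFLAGS=|FEATURES=|MAKEOPTS=)'
}

# Full output.
ei()  { emerge --info; }

# Short form: stderr is dropped so known warnings don't clutter the result.
eis() { emerge --info 2>/dev/null | eis_filter; }
```

Splitting the filter into its own function means it can be tried on any saved emerge info output without needing Portage on the machine at hand.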
<g> OK, with that as a preliminary, you should be able to understand the following:

$eis
Portage 2.1_pre4-r1 (default-linux/amd64/2006.0, gcc-4.1.0-beta20060127, glibc-2.3.6-r2, 2.6.15 x86_64)
CFLAGS="-march=k8 -Os -pipe -fomit-frame-pointer -frename-registers -funit-at-a-time -fweb -freorder-blocks-and-partition -fmerge-all-constants"
CXXFLAGS="-march=k8 -Os -pipe -fomit-frame-pointer -frename-registers -funit-at-a-time -fweb -freorder-blocks-and-partition -fmerge-all-constants"
FEATURES="autoconfig buildpkg candy ccache confcache distlocks multilib-strict parallel-fetch sandbox sfperms strict userfetch"
LDFLAGS="-Wl,-z,now"
MAKEOPTS="-j4"

To make sense of that...

* The portage and glibc versions are ~amd64, as set in make.conf for the system in general.

* CFLAGS:

I choose -Os, optimize for size, because a modern CPU and the various cache levels are FAR faster than main memory. This difference is frequently severe enough that it's actually more efficient to optimize for size than for CPU performance, because the result is smaller code that maintains cache locality (stays in fast cache) far better, and the CPU saves more time that it would otherwise be spending idle, waiting for data to come in from slower more distant memory, than the actual cost of the loss of cycle efficiency that's often the tradeoff for small code.

-O3, and to a lesser extent, -O2, do things like turn a loop that executes a fixed number of times, say 3, into "faster" code, by avoiding the jump at the end of each loop back to the top of the loop by writing it out as inline code, copying the loop instructions three times. This process would in our example of a 3-time fixed execution loop, save the expensive jump back to the top of the loop two times -- but at the SAME time would expand that section of code to three times its looped size. Back when memory operated at or near the speed of the CPU, avoiding the loop, even at the expense of three-times the code, was often faster.
Today, where CPUs do several calculations in the time it takes to fetch data from main memory, it's generally faster to go for the smaller code, as it will be far more likely to still be in fast cache, avoiding that long wait for main memory, even if it /does/ mean wasting a couple additional cycles doing the expensive jump back to the top of the loop. Of course, this is theory, and the practical case can and will differ depending on the instructions actually being compiled. In particular, streaming media apps and media encoding/decoding are likely to still benefit from the traditional loop elimination style optimizations, because they run thru so much data already, that cache is routinely trashed anyway, regardless of the size of your instructions. As well, that type of application tends to have a LOT of looping instructions to optimize! By contrast, something like the kernel will benefit more than usual from size optimization. First, it's always memory locked and as such can't be swapped, and even "slow" main memory is still **MANY** **MANY** times faster than swap, so a smaller kernel means more other stuff fits into main memory with it, and isn't swapped as much. Second, parts of the kernel such as task scheduling are executed VERY often, either because they are frequently executed by most processes, or because they /control/ those processes. The smaller these are, the more likely they are to still be in cache when next used. Likewise, the smaller they are, the less potentially still useful other data gets flushed out of cache to make room for the kernel code executing at the moment. Third, while there's a lot of kernel code that will loop, and a lot that's essentially streaming, the kernel as a whole is a pretty good mix of code and thus won't benefit as much from loop optimizations and the like, as compared to special purpose code like the media codec and streaming applications above. 
The differences are marked enough and now demonstrated enough that a kernel config option to optimize for size was added I believe about a year ago. Evidently, that led to even MORE demonstration, as the option was originally in the obscure embedded optimizations corner of the config, where few would notice or use it, and they upgraded it into a main option. In fact, where a year or two ago, the option didn't even exist, now I believe it defaults to yes/on/do-optimize-for-size (altho it's possible I'm incorrect on the last and it's not yet the default).

According to the gcc manpage, -frename-registers causes gcc to attempt to make use of registers left over after normal register allocation. This is particularly beneficial on archs that have many registers (keeping in mind that "registers" are what amounts to L0 cache, the fastest possible memory because the CPU accesses registers directly and they operate at full CPU speed). Unfortunately, registers are also very limited, making them an EXCEEDINGLY valuable resource! Note that while x86-32 is noted for its relative /lack/ of registers, AMD basically doubled the number of registers available to 64-bit code in its x86-64 aka AMD64 spec. Thus, while this option wouldn't be of particular benefit on x86, on amd64, it can, depending on the code of course, provide some rather serious optimization!

-fweb is a register use optimizer function as well. It tells gcc to create a /web/ of dependencies and assign each individual dependency web to its own pseudo-register. Thus, when it comes time for gcc to allocate registers, it already has a list of the best candidates lined up and ready to go. Combined with -frename-registers to tell gcc to efficiently make use of any registers left over after the first pass, and due to the number of registers available in 64-bit mode on our arch, this can allow some seriously powerful optimizations. Still, a couple of things to note about it.
One, -fweb (and -frename-registers as well) can cause data to move out of its "home" register, which seriously complicates debugging, if you are a programmer or power-user enough to worry about such things. Two, the rewrite for gcc 4.0 significantly modified the functionality of -fweb, and it wasn't recommended for 4.0 as it didn't yet work as well as expected or as it did with gcc 3.x. For gcc 4.1, -fweb is apparently back to its traditional strength. Those Gentoo users having gcc 3.4, 4.0, and 4.1, all three in separate slots, will want to note this as they change gcc configurations, and modify it accordingly. Yes, this *IS* one of the reasons my CFLAGS change so frequently!

-funit-at-a-time tells gcc to consider a full logical unit, perhaps consisting of several source files rather than just one, as a whole, when it does its compiling. Of course, this allows gcc to make optimizations it couldn't see if it wasn't looking at the larger picture as a whole, but it requires rather more memory, to hold the entire unit so it can consider it at once. This is a fairly new flag, introduced with gcc 3.3 IIRC. While the idea is simple enough and shouldn't lead to any bugs on its own, there WERE a number of previously never encountered bugs in various code that this flag exposed, when GCC made optimizations on the entire unit that it wouldn't otherwise make, thereby triggering bugs that had never been triggered before.

I /believe/ this was the root reason why the Gentoo amd64 technotes originally discouraged use of -Os, back with the first introduction of this flag in gcc 3.2 hammer (amd64) edition, as -funit-at-a-time was activated by -Os at that time, and -Os was known to produce bad code at the time, on amd64, with packages like portions of KDE. The gcc 4.1.0 manpage now says it's enabled by default at -O2 and -O3, but doesn't mention -Os.
Whether that's an omission, or whether they decided it shouldn't be enabled by -Os for some reason, I'm not sure, but I use them both to be sure and haven't had any issues I can trace to this (not even back when the technotes recommended against -Os, and said KDE was supposed to have trouble with it -- maybe it was parts of KDE I never merged, or maybe I was just lucky, but I've simply never had an issue with it).

-freorder-blocks-and-partition is new for gcc 4.0, I believe, altho I didn't discover it until I was reading the 4.1-beta manpage. I KNOW gcc 3.4.4 fails out with it, saying unrecognized flag or some such, so it's another of those flags that cause my CFLAGS to be constantly changing, as I switch between gcc versions. This flag won't work under all conditions, according to the manpage, so is automatically disabled in the presence of exception handling, and a few other situations named in the manpage. It causes a lot of warnings too, to the effect that it's being disabled due to X reason.

There's a similar -freorder-blocks flag, which optimizes by reordering blocks in a function to "reduce number of taken branches and improve code locality." In English, what that means is that it breaks caching less often. Again, caching is *EXTREMELY* performance critical, so anything that breaks it less often is CERTAINLY welcome! The -and-partition increases the effect, by separating the code into frequently used and less frequently used partitions. This keeps the most frequently used code all together, therefore keeping it in cache far more efficiently, since the less used code won't be constantly pulled in, forcing out frequently used code in the process.

Hmm... As I'm writing and thinking about this, it occurs to me that sticking the regular -freorder-blocks option in CFLAGS as well would probably be a wise thing. The non-partition version isn't as efficient as the partition version, and would be redundant if the partitioned version is in effect.
However, the non-partitioned version doesn't have the same sorts of no-exceptions-handler and similar restrictions, so having it in the list, first, so the partitioned version overrides it where it can be used, should be a good idea. That way, where the partitioned version can be used, it will be, but where it can't, gcc will still use the non-partitioned version of the option, so I'll still get /some/ of the optimizations! I (re)compiled major portions of xorg (modular), qt, and the new kde 3.5.1 with the partitioned option, however, and it works, and I haven't tested having both options in there yet, so I'm not sure it'll work as the theory suggests it should, so some caution might be advised. -fmerge-all-constants COULD be dangerous with SOME code, as it breaks part of the C/C++ specification. However, it should be fine for most code written to be compiled with gcc, and I've seen no problems /yet/ tho both this and the reorder-and-partition flag above are fairly new to my CFLAGS, so haven't been as extensively personally tested as the others have been. If something seems to be breaking when this is in your CFLAGS, certainly it's the first thing I'd try pulling out. What it actually does is merge all constants with the same value into the same one. gcc has a weaker -fmerge-constants version that's enabled with any -O option at all (thus at -O, -O2, -O3, AND -Os), that merges all declared constants of the same value, which is safe and doesn't conflict with the C/C++ spec. What the /all/ specifier in there does, however, is cause gcc to merge declared variables where the value actually never changes, so they are in effect constants, altho they are declared as variables, with other constants of the same value. 
This /should/ be safe, /provided/ gcc isn't failing to detect a variable change somewhere, but it conflicts with the C/C++ spec, according to the gcc manpage, and thus /could/ cause issues, if the developer pulls certain tricks that gcc wouldn't detect, or possibly more likely, if used with code compiled by a different compiler (say binary-only applications you may run, which may not have been compiled with gcc).

There are two reasons why I choose to use it despite the possible risks. One, I want /small/ code, again, because small code fits in that all-important cache better and therefore runs faster, and obviously, two or more merged constants aren't going to take the space they would if gcc stored them separately. Two, the risks aren't as bad if you aren't running non-gcc compiled code anyway, and since I'm a strong believer in Software Libre, if it's binary-only, there's very little chance I'll want or risk it on my box, and everything I do run is gcc compiled anyway, so should be generally safe. Still, I know there may be instances where I'll have to recompile with the flag turned off, and am prepared to deal with them when they happen, or I'd not have the flag in my CFLAGS.

And, here's some selected output from ei, interspersed with explanations, since I'm editing the output anyway:

$ei
!!! Failed to change nice value to '-2'
!!! [Errno 13] Permission denied

This is stderr output. It's not in the eis output above because I redirect stderr to /dev/null for it, as I know the reason for the error and am trying to be brief. The warning is because I'm using PORTAGE_NICENESS=-2 in make.conf. It has a negative nice set there to encourage portage to make fuller use of the dual CPUs under-X/from-a-konsole-session, as X and the kernel do some dynamic scheduling magic to keep X more responsive without having to up /its/ priority. The practical effect of that "magic" is to lower the priorities of everything besides X slightly, when X is running.
This /does/ have the intended effect of keeping X more responsive, but the cost as observed here is that emerges take longer than they should when X is running, because the scheduler is leaving a bit of extra idle CPU time to keep X responsive. In many cases, I'd rather be using maximum CPU and get the merges done faster, even if X drags a bit in the mean time, and the slightly negative niceness for portage accomplishes exactly that.

It's reporting a warning (to stderr) here, as I ran the command as a regular non-root user, and non-root can't set negative priorities for obvious system security reasons. I get the same warning with my ep* commands, which I normally run as a regular user, as well. The ea* commands which actually do the merging get run as root, naturally, so the niceness /can/ be set negative when it counts, during a real emerge. So... nothing of any real matter, then.

!!! Relying on the shell to locate gcc, this may break
!!! DISTCC, installing gcc-config and setting your current gcc
!!! profile will fix this

Another warning, likewise to stderr and thus not in the eis output. This one is due to the fact that eselect, the eventual systemwide replacement for gcc-config and a number of other commands, uses a different method to set the compiler than gcc-config did, and portage hasn't been adjusted to full compatibility just yet. Portage finds the proper gcc just fine for itself, but there'd be problems if distcc was involved, thus the warning. Again, I'm aware of the situation and the cause, but don't use distcc, so it's nothing I have to worry about, and I can safely ignore the warning.

I kept the warnings here, as I find them and the explanation behind them interesting elements of my Gentoo environment, thus worth posting, for others who seem interested in my Gentoo environment as well. If nothing else, the explanations should help some in my audience understand that bit more about how their system operates, even if they don't get these warnings.
Portage 2.1_pre4-r1 (default-linux/amd64/2006.0, gcc-4.1.0-beta20060127, glibc-2.3.6-r2, 2.6.15 x86_64)
=================================================================
System uname: 2.6.15 x86_64 AMD Opteron(tm) Processor 242
Gentoo Base System version 1.12.0_pre15

Those of you running stable amd64, but wondering where baselayout is for unstable, there you have it!

ccache version 2.4 [enabled]
dev-lang/python: 2.4.2
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils: 2.16.91.0.1
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.11-r3
ACCEPT_KEYWORDS="amd64 ~amd64"

Same for the above portions of my toolchain. AFAIR, it's all ~amd64, altho I was running a still-masked binutils for awhile shortly after gcc-4.0 was released (still-masked on Gentoo as well), as it required the newer binutils.

LANG="en_US"
LDFLAGS="-Wl,-z,now"

Some of you may have noticed the occasional Portage warning about SETUID executables using lazy bindings, and the potential security issue that causes. This setting for LDFLAGS forces early bindings with all dynamically linked libraries. Normally it'd only be necessary or recommended for SETUID executables, and set in the ebuild where it's safe to do so, but I use it by default, for several reasons.

The effect is that a program takes a bit longer to load initially, but won't have to pause to resolve late bindings as they are needed. You're trading waiting at executable initialization for waiting at some other point. With a gig of memory, I find most stuff I run more than once is at least partially still in cache on the second and later launches, and with my system, I don't normally find the initial wait irritating, and sometimes find a pause after I'm working with a program especially so, so I prefer to have everything resolved and loaded at executable launch.
Additionally, with lazy bindings, I've had programs start just fine, then fail later when they need to resolve some function that for some reason won't resolve in whatever library it's supposed to be coming from. I don't like having the thing fail and interrupt me in the middle of a task, and find it far less frustrating, if it's going to fail when it tries to load something, to have it do so at launch. Because early binding forces resolution of functions at launch, if it's going to fail loading one, it'll fail at launch, rather than after I've started working with the program. That's /exactly/ how I want it, so that's why I run the above LDFLAGS setting.

It's nice not to have to worry about the security issue, but SETUID type security isn't as critical on my single-human-user system, where that single user is me and I already have root when I want it anyway, as it'd be in a multi-user system, particularly a public server, so the other reasons are more important than security, for me, on this. They just happen to coincide, so I'm a happy camper. =8^)

The caveat with these LDFLAGS, however, is the rare case where there's a circular functional dependency that's normally self-resolving. Modular xorg triggers one such case, where the monolithic xorg didn't. There are three individual ebuilds related to modular xorg that I have to remove these LDFLAGS for or they won't work. xorg-server is one. xf86-video-ati, my video driver, is another. libdri was the third, IIRC. There's a specific order they have to be compiled in, as well. If they are compiled with this enabled, they, and consequently X, refuse to load (tho X will load without DRI, if that's the only one, it'll just protest in the log and DRI and glx aren't available).
Evidently there's a non-critical fourth module somewhere, that still won't load properly due to an unresolved symbol, that I need to track down and remerge without these LDFLAGS, and that's what's keeping GLX from loading on my current system, as mentioned in an earlier post. LINGUAS="en" MAKEOPTS="-j4" The four jobs is nice for a dual-CPU system -- when it works. Unfortunately, the unpack and configure steps are serialized, so the jobs option does little good, there. To make most efficient use of the available cycles when I have a lot to merge, therefore, I'll run as many as five merges in parallel. I do this quite regularly with KDE upgrades like the one to 3.5.1, where I use the split KDE ebuilds and have something north of 100 packages to merge before KDE is fully upgraded. I mentioned above that I often run eptree, then ea individual packages from the list. This is how I accomplish the five merges in parallel. I'll take a look at the tree output to check the dependencies, and merge the packages first that have several dependencies, but only where those dependencies aren't stepping on each other, thus keeping the parallel emerges from interfering with each other, because each one is doing its own dependencies, that aren't dependencies of any of the others. After I get as many of those going as I can, I'll start listing 3-5 individual packages without deps on the same ea command line. By the time I've gotten the fifth one started, one of the other sessions has usually finished or is close to it, so I can start it merging the next set of packages. With five merge sessions in parallel, I'm normally running an average load of 5 to 9, meaning that many applications are ready for CPU scheduling time at any instant, on average. 
If the load drops below four, there are probably idle CPU cycles being wasted that could otherwise be compiling stuff, as each CPU needs at least one load-point to stay busy, plus usually can schedule a second one for some cycles as well, while the first is waiting for the hard drive or whatever.

(Note that I'm running a four-drive RAID: RAID-6, so two-way striped, for my main system, and RAID-0, so 4-way striped, for $PORTAGE_TMPDIR, so hard drive latency isn't /nearly/ as high as it would be on a single-hard-drive system. Of course, running five merges in parallel /does/ increase disk latency some as well, but it /does/ seem to keep my load-average in the target zone and my idle cycles to a minimum, during the merge period. Also note that I've only recently added the PORTAGE_NICENESS value above, and haven't gotten it fully tweaked to the best balance between interactivity and emerge speed just yet, but from observations so far, with the niceness value set, I'll be able to keep the system busy with "only" 3-4 parallel merges, rather than the 5 I had been having to run to keep the system most efficiently occupied when I had a lot to merge.)

PKGDIR="/pkg"
PORTAGE_TMPDIR="/tmp"
PORTDIR="/p"
PORTDIR_OVERLAY="/l/p"

Here you can see some of my path customization.
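The load-average rule of thumb above (roughly two load-points per CPU, so four on a dual-CPU box) can be sketched as a small check. The load_status name and its idle/busy output are made up for illustration; reading the 1-minute figure from /proc/loadavg assumes Linux.

```shell
#!/bin/sh
# Hypothetical helper for the rule of thumb above: about two load-points
# per CPU keeps the CPUs busy; below that, cycles are probably going idle.
# $1 = number of CPUs, $2 = load average to check
load_status() {
    awk -v cpus="$1" -v cur="$2" 'BEGIN {
        target = 2 * cpus
        if (cur < target) print "idle"
        else              print "busy"
    }'
}

# On Linux, the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    read -r load1 rest < /proc/loadavg
    echo "1-minute load $load1 on 2 CPUs: $(load_status 2 "$load1")"
fi
```

An "idle" result while a big upgrade is running would be the cue to start another parallel merge session.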
USE="amd64 7zip X a52 aac acpi alsa apm arts asf audiofile avi bash-completion berkdb bitmap-fonts bzip2 caps cdparanoia cdr crypt css cups curl dga divx4linux dlloader dri dts dv dvd dvdr dvdread eds emboss encode extrafilters fam fame ffmpeg flac font-server foomaticdb gdbm gif glibc-omitfp gpm gstreamer gtk2 idn imagemagick imlib ithreads jp2 jpeg jpeg2k kde kdeenablefinal lcms libwww linuxthreads-tls lm_sensors logitech-mouse logrotate lzo lzw lzw-tiff mad maildir mikmod mjpeg mng motif mozilla mp3 mpeg ncurses network no-old-linux nolvm1 nomirrors nptl nptlonly offensive ogg opengl oss pam pcre pdflib perl pic png ppds python qt quicktime radeon readline scanner slang speex spell ssl tcltk theora threads tiff truetype truetype-fonts type1 type1-fonts usb userlocales vcd vorbis xcomposite xine xinerama xml2 xmms xosd xpm xrandr xv xvid yv12 zlib elibc_glibc input_devices_keyboard input_devices_mouse kernel_linux linguas_en userland_GNU video_cards_ati"

My USE flags, FWTAW (for what they are worth). Of particular interest are the input_devices_mouse and keyboard, and video_cards_ati. These come from variables (INPUT_DEVICES and VIDEO_CARDS) set in make.conf, and used in the new xorg-modular ebuilds. These and the others listed after zlib are referred to by Gentoo devs as USE_EXPAND. Effectively, they are USE flags in the form of variables, set up that way because there are rather many possible values for those variables, too many to work as USE flags. The LINGUAS and LANG USE_EXPAND variables are prime examples. Consider how many different languages there are. If they were used and documented as regular USE flags, they would have to be in use.local.desc, because few supporting packages would offer the same choices, so each would have to be listed separately for each package. Talk about the number of USE flags quickly getting out of control!

Unset: ASFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, LC_ALL

OK, some loose ends to wrap up, and I'm done.
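On the USE_EXPAND point above, here is a small sketch of the naming pattern visible at the end of that USE line: the variable name is lowercased and prefixed to each value. It only mimics the pattern for illustration and is not how Portage itself implements the expansion; the use_expand function name is made up.

```shell
#!/bin/sh
# Illustrative only: reproduces the flag names seen in the USE line above,
# not Portage's actual implementation.
# $1 = variable name, remaining arguments = its values
use_expand() {
    prefix=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    shift
    for value in "$@"; do
        printf '%s_%s\n' "$prefix" "$value"
    done
}

use_expand INPUT_DEVICES keyboard mouse
use_expand VIDEO_CARDS ati
use_expand LINGUAS en
```

Only the prefix is lowercased, so a value like GNU under USERLAND keeps its case, matching the userland_GNU entry in the list.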
re: gcc versions: The plan is for gcc-4.0 to go ~arch fairly soon, now. The devs are actively asking for bug reports involving it, now, so as many as possible can be resolved before it goes ~arch. (Formerly, they were recommending that bugs be filed upstream, and not with Gentoo unless there was a patch attached, as it was considered entirely unsupported, just there for those that wanted it anyway.) At this point, nearly everything should compile just fine with 4.0. That said, Gentoo has slotted gcc for a reason. It's possible to have multiple minor versions (3.3, 3.4, 4.0, 4.1) merged at the same time. With USE=multislot, that's actually microversion (4.0.0, 4.0.1, 4.0.2...). Using either gcc-config or eselect compiler, and discounting any CFLAG switching you may have to do, it's a simple matter to switch between merged versions. This made it easy to experiment with gcc-4.0 even tho Gentoo wasn't supporting it and certain packages wouldn't compile with 4.x, because it was always possible to switch to a 3.x version if necessary, and compile the package there. I did this quite regularly, using gcc-4.0 as my normal version, but reverting for individual packages as necessary, when they wouldn't compile with 4.0. The same now applies to the 4.1.0-beta-snapshot series. Other than the compile time necessary to compile a new gcc when the snapshot comes out each week, it's easy to run the 4.1-beta as the main system compiler for as wide testing as possible, while reverting to 4.0 or 3.4 (I don't have a 3.3 slot merged) if needed. re: the performance improvements I saw that started this whole thing: These trace to several things, I believe. #1, with gcc-4.0, there's now support for -fvisibility -- setting certain functions as exported and visible externally, others not. That can easily cut exported symbols by a factor of 10. Exported symbols of course affect dynamic load-time, which of course gets magnified dramatically by my LDFLAGS early binding settings. 
When I first compiled KDE with that (there were several missteps early on in terms of KDE and Gentoo's support, but that aside), KDE appload times went down VERY NOTICEABLY! Again, due to my LDFLAGS, the effect was multiplied dramatically, but the effect is VERY real!

Of course, that's mainly load-time performance. The run-time performance that we are actually talking about here has other explanations. A big one is that gcc-4 was a HUGE rewrite, with a BIG potential to DRAMATICALLY improve gcc's performance. With 4.0, the theory is there, but in practice, it wasn't all that optimized just yet. In some ways it reverted behavior below that of the fairly mature 3.x series, altho the rewrite made things much simpler and less prone to error given its maturity. 4.1, however, is the first 4.x release to REALLY be hitting the potential of the 4.x series, and it appears the difference is very noticeable. Of course, there's a reason 4.1.0 is still in beta upstream and not supported by Gentoo either, as there are still known regressions. However, where it works, which it seems to do /most/ of the time, it **REALLY** works, or at least that's been my observation.

3.3 was a MAJOR improvement in gcc for amd64 users, because it was the first version where amd64 wasn't simply an add-on hack, as it had been with 3.2. The 3.4 upgrade was minor in comparison, and 4.0, while it's going ~arch shortly, and sets the stage for a lot of future improvement, will be pretty minor in terms of actual improved performance as well. 4.1, however, when it is finally fully released, has the potential to be as big an improvement as 3.3 was -- that is, a HUGE one. I'm certainly looking forward to it, and meanwhile, running the snapshots, because Gentoo makes it easy to do so while maintaining the ability to switch very simply between multiple versions on the system.

Both -freorder-blocks-and-partition and -fmerge-all-constants are new to me within a few days, now, and new to me with kde 3.5.1.
Normally, individual flags won't make /that/ much of a difference, but it's possible I hit it lucky, with these. Actually, because they both match very well with and reinforce my strategy of targeting size, it's possible I'm only now unlocking the real potential behind size optimization. -- I **KNOW** there's a **HUGE** difference in resulting file-sizes. I compared 4.0.2 and 4.1.0-beta-snapshot file sizes for several modular-X files in the course of researching the missing symbols problem, and the difference was often a shrinkage of near 33 percent with 4.1 and my current CFLAGS as opposed to 4.0.2 without the new ones. Going the other way, that's a 50% larger file with 4.0.2 as compared to 4.1, 150KB vs 100KB, by way of example. That's a *HUGE* difference, one big enough that I initially thought I'd found the reason for the missing symbols right there, as the new files were simply too much smaller to look workable! Still, I traced the problem to LDFLAGS, so that wasn't it, and the files DO work, confirming things.

I'm guessing -fmerge-all-constants plays a significant part in that. In any case, with that difference in size, and knowing how /much/ cache hit vs. miss affects performance, it's quite possible the size is the big performance factor. Of course, even if that's so, I'm not sure whether it is the CFLAGS or the 4.0 vs 4.1 that should get the credit.

In any case, I'm a happy camper right now! =8^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
