Daniel Iliev <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Wed, 27 Sep 2006 08:50:03 +0300:
> So let me start with 2 newbie questions caused by my first impressions
> from the x86_64 world:
>
> 1) I use CFLAGS="-march=athlon64 -mfpmath=sse -msse -msse2 -msse3
> -m3dnow -mmmx -O3 -fomit-frame-pointer -pipe -fpic". Portage complains
> with *red letters* about the fpic flag. Every time I emerge something it
> says that "fpic breaks things", but I haven't met a single breakage so
> far. Is that a bug? Actually there was an ebuild which could not be
> compiled if mysql was compiled w/o "fpic". I'm not 100% sure but AFAIR
> it was dev-perl/DBD-mysql.
>
> 2) I see too many flags that are disabled by the profile - the kind with
> the parentheses around them, like "(-3dnow)". Why? As I mentioned above
> I enable some of these through my CFLAGS - e.g. (-mmx), (-mmxext),
> (-sse) and (-sse2) and everything works perfectly.

It seems that you missed some of the Gentoo/AMD64 documentation. Many/most of your questions are answered there. Unfortunately, I'm not aware of a simple, easy-to-use list of everything in one spot, so it's a matter of reading a bit of documentation here, a bit more there, etc.

The main Gentoo/AMD64 project page. (This would be the logical place for such a list, but it's more of a project page; tho it links some of the docs, those links aren't as easy to find as they could be.)
http://amd64.gentoo.org

Gentoo/AMD64 FAQ:
http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml

Gentoo/AMD64 HOWTOs. (There's one on -fPIC here, tho the explanation is a bit developer-centric.)
http://www.gentoo.org/proj/en/base/amd64/howtos/index.xml

Brief, direct answers to your questions follow:

* The sse etc. CFLAGS are arch dependent. Unlike x86, where the mmx/sse/other extension instructions were added as the arch matured, on amd64 they are part of the definition of the arch itself. All x86_64 (amd64) CPUs will have mmx/sse/sse2, etc. Thus, -march=athlon64 already tells gcc these are available to use where it wants/needs to.
  The others therefore don't give gcc any information it doesn't already have. (The one exception in your list is -msse3: SSE3 only arrived with later Athlon 64 revisions, so -march=athlon64 doesn't imply it, and it's only safe if your particular CPU has it.)

* -fomit-frame-pointer isn't needed on 64-bit amd64 either, as it's turned on at all -O levels on archs (including amd64) where doing so doesn't interfere with debugging. (See the gcc manpage, under -O optimization.) You may wish to keep specifying it for stuff that's compiled for 32-bit, however, including parts of gcc, the 32-bit glibc, the 32-bit (portage) sandbox library, etc.

* Generally speaking, -fPIC is required on amd64 for ALL LIBRARIES, but the ebuilds normally take care of it. Under certain circumstances (like unsupported CFLAGS), the configure scripts can turn it off by mistake (see the -fPIC HOWTO linked above for details). The solution, however, isn't to add it to your CFLAGS: that means it will be used for executable applications as well as libraries, and /some/ applications /do/ break with it. Not many, but some, and if it's in your CFLAGS, you WILL have the bugs you file closed as INVALID or the like, due to CFLAG abuse. If something doesn't work without it, then THAT'S a bug and should be filed as such (unless it's due to CFLAGS that gcc doesn't support and warns about, which triggers the configure script detection problem discussed above and in the HOWTO).

* The profile "disabled" USE flags are simply hard-locked either on or off by the profile, so they aren't a user-selectable option. It does NOT mean that whatever the USE flag controls is actually disabled. Sometimes, as with the multilib USE flag, it can mean it's /enabled/. It just means that the profile is set up to control it, generally for a pretty good reason. In the particular cases you mention, the way Gentoo uses the SSE and similar USE flags is 32-bit specific: they enable 32-bit specific assembler code in the package, for instance.
  As already mentioned, the AMD64 arch by definition already has these features, so no 64-bit USE flags are necessary, and enabling the 32-bit USE flags would cause breakage, since in many instances that activates 32-bit specific code. Thus the amd64 profiles have a /very/ good reason to hard-lock these USE flags "off". An example of a USE flag that a profile hard-locks ON would be multilib. The normal AMD64 profiles are all multilib and thus lock this flag ON (tho it's still shown as disabled), while 64-bit-only profiles lock it OFF.

A couple of other notes:

Portage now supports per-package CFLAGS and certain other variables, controlled via the environment (as long as they are used in an ebuild.sh phase rather than the python phase, since this works via a bashrc hook). Create /etc/portage/env/<category> as a directory, populated with package or package-version files. The contents of these files will be sourced into the ebuild.sh execution environment for every phase that uses ebuild.sh. CFLAGS and similar variables found in these files REPLACE (that is, they don't add to, they replace entirely) the default make.conf CFLAGS. You can use this mechanism to set CFLAGS for specific packages, and could thus set -fomit-frame-pointer and other 32-bit x86 specific CFLAGS here if desired, keeping them out of your regular make.conf.

You may wish to read a bit of this list's archives, in particular the recent threads on gcc 4.1.1 CFLAGS, where I discuss mine. Specifically, it's likely that -O3 actually performs /worse/ in many instances than -O2 or even -Os (my choice). The reasoning is this: CPU cycles are fairly cheap in a modern processor, while the cost of waiting on main memory after a cache miss is MUCH HIGHER, because main memory is clocked so much slower than cache. Smaller code fits in cache better and is thus often faster than larger code, even when the smaller code isn't as CPU-cycle efficient in theory.
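As a rough sketch of how the advice above might translate into config, assuming the 2006-era /etc/make.conf location and the /etc/portage/env/<category>/<package> files described above (the package media-video/example is purely hypothetical, chosen only to show one package getting -O3 while everything else stays at -O2):

```shell
# /etc/make.conf -- trimmed sketch, not a recommendation for every system.
# -march=athlon64 already implies MMX, SSE and SSE2, -fomit-frame-pointer is
# already on at -O levels on amd64, and -fPIC is left to the ebuilds.
CFLAGS="-march=athlon64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"

# /etc/portage/env/media-video/example  (hypothetical package)
# Sourced by ebuild.sh for this one package only. These values REPLACE the
# make.conf ones rather than appending, so -march and -pipe are repeated here.
CFLAGS="-march=athlon64 -O3 -pipe"
CXXFLAGS="${CFLAGS}"
```

Because the per-package file replaces rather than appends, leaving -march out of it would quietly build that one package for gcc's default target instead.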
While there are certainly some applications where -O3 is beneficial, I believe that if you do actual comparisons, you will find -O2 or -Os faster on a system-wide basis. Of course, it's up to you, and much virtual ink has been spilled discussing this issue, but that's just my take on things. If you've actually done speed comparisons on AMD64 or can point to some, I'd certainly be interested. I've honestly not cared enough about it to do my own, so this is just my general take in the absence of specific hard data to the contrary.

Rather than optimizing for CPU cycles (-O3), I choose to optimize for better register usage (-frename-registers, etc.; registers run at full CPU speed, so they're faster even than L1 cache), size (-Os, disabling loop unrolling), whole- and multiple-unit optimization (-funit-at-a-time, -combine), and hot/cold partitioning (-freorder-blocks-and-partition, tho it can't be used on C++ code), etc. A few of my flags fail on a very few specific packages, which is another use for the per-package CFLAGS mechanism above.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
[email protected] mailing list
