Re: [gentoo-user] CFLAGS CPU optimization question.
Andreas Fredriksson wrote:
> On 5/29/05, Digby Tarvin [EMAIL PROTECTED] wrote:
>> On the subject of CPU flags, anyone tried optimizing gentoo for a Toshiba Libretto (110CT)?
>>
>> model name : Mobile Pentium MMX
>> flags      : fpu vme de pse tsc msr mce cx8 mmx
>
> This is indeed a classic pentium chip with mmx added. You can use -mcpu=pentium (or -march=pentium), optionally adding the mmx USE flag for those packages that support it.

Actually, since it has MMX, use {-mcpu/-mtune/-march}=pentium-mmx. Worked for me. Additionally, add the CPU flags to your USE flags (especially mmx).

--
Colin
--
gentoo-user@gentoo.org mailing list
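Put together, the advice in this message might translate into an /etc/make.conf fragment like the one below. It is only a sketch: -march=pentium-mmx and the mmx USE flag come from the thread, while -O2 and the CXXFLAGS line are assumed, conservative defaults that nobody in the thread prescribed for this machine.

```shell
# /etc/make.conf fragment for a Pentium MMX (sketch, not a tested configuration)

# -march=pentium-mmx is the value suggested in the thread. -march also
# selects the matching tuning, so a separate -mcpu/-mtune is not needed.
# -O2 is an assumption added for illustration.
CFLAGS="-O2 -march=pentium-mmx"

# Assumption: reuse the same flags for C++ code.
CXXFLAGS="${CFLAGS}"

# Let packages that support it build their MMX code paths.
USE="mmx"
```

A real make.conf will usually carry more USE flags than this; only the one discussed here is shown.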
Re: [gentoo-user] CFLAGS CPU optimization question.
On the subject of CPU flags, anyone tried optimizing gentoo for a Toshiba Libretto (110CT)?

How do I determine which of the stage3 installation files:

stage3-athlon-xp-2005.0.tar.bz2
stage3-i686-2005.0.tar.bz2
stage3-pentium3-2005.0.tar.bz2
stage3-pentium4-2005.0.tar.bz2
stage3-x86-2005.0.tar.bz2

is appropriate? 'uname -m' (as suggested in the docs) reports i586, so I assume it is pentium, but where is the boundary between P3 and P4?

Also, what would be the best CPUFLAGS where /proc/cpuinfo (under SuSE 7.3) reports the following CPU information:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 5
model           : 8
model name      : Mobile Pentium MMX
stepping        : 1
cpu MHz         : 233.292
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : yes
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 mmx
bogomips        : 465.30

Is there anything else I need to check to work out my optimum settings?

Thanks,
DigbyT
--
Digby R. S. Tarvin [EMAIL PROTECTED]
http://www.digbyt.com
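For anyone who would rather test the "flags" line from a script than read it by eye, a small helper along these lines might do. The name has_cpu_flag is made up for this sketch, and the optional second argument exists only so the check can be run against a saved copy of /proc/cpuinfo.

```shell
# has_cpu_flag FLAG [FILE]
# Succeeds if FLAG is listed on the first "flags" line of FILE
# (default: /proc/cpuinfo). Hypothetical helper, not a standard tool.
has_cpu_flag() {
    awk -v want="$1" -F: '
        /^flags/ {
            # $2 is everything after the colon; split it into words.
            n = split($2, words, " ")
            for (i = 1; i <= n; i++)
                if (words[i] == want) found = 1
            exit            # only the first CPU entry is examined
        }
        END { exit !found }
    ' "${2:-/proc/cpuinfo}"
}

# Example use on a Linux system:
#   if has_cpu_flag mmx; then echo "mmx: yes"; else echo "mmx: no"; fi
```

Whole words are compared, so asking for "mm" does not match "mmx".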
Re: [gentoo-user] CFLAGS CPU optimization question.
On 5/29/05, Digby Tarvin [EMAIL PROTECTED] wrote:
> how do I determine which of the stage3 installation files:
>
> stage3-athlon-xp-2005.0.tar.bz2
> stage3-i686-2005.0.tar.bz2
> stage3-pentium3-2005.0.tar.bz2
> stage3-pentium4-2005.0.tar.bz2
> stage3-x86-2005.0.tar.bz2
>
> is appropriate? 'uname -m' (as suggested in the docs) reports i586 so I assume it is pentium, but where is the boundary between P3 and P4?

P3 and P4 are not i586. stage3-x86-2005.0.tar.bz2 is the only stage3 tarball suitable for i586. If you try another one, your system will simply not run at all.

Julien.
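The rule in this reply can be written down as a small lookup. The function name stage3_for_machine is hypothetical. The i586 branch is what the thread states; the i686 branch is an assumption on my part (the generic i686 tarball as the cautious pick), since 'uname -m' by itself cannot tell a Pentium III from a Pentium 4 or an Athlon XP.

```shell
# stage3_for_machine MACHINE
# Print a 2005.0 stage3 tarball name for a `uname -m` value.
# Hypothetical helper; covers only the cases discussed in this thread.
stage3_for_machine() {
    case "$1" in
        i586)
            # Stated in the thread: the generic x86 tarball is the
            # only one that will run on an i586.
            echo "stage3-x86-2005.0.tar.bz2"
            ;;
        i686)
            # Assumption: fall back to the generic i686 tarball and
            # check /proc/cpuinfo before picking a CPU-specific one.
            echo "stage3-i686-2005.0.tar.bz2"
            ;;
        *)
            echo "no suggestion for machine type: $1" >&2
            return 1
            ;;
    esac
}

stage3_for_machine i586    # prints stage3-x86-2005.0.tar.bz2
```

On the machine being installed, the argument would be "$(uname -m)".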
Re: [gentoo-user] CFLAGS CPU optimization question.
On 5/29/05, Digby Tarvin [EMAIL PROTECTED] wrote:
> On the subject of CPU flags, anyone tried optimizing gentoo for a Toshiba Libretto (110CT)?
>
> model name : Mobile Pentium MMX
> flags      : fpu vme de pse tsc msr mce cx8 mmx

This is indeed a classic pentium chip with mmx added. You can use -mcpu=pentium (or -march=pentium), optionally adding the mmx USE flag for those packages that support it.

// Andreas
Re: [gentoo-user] CFLAGS CPU optimization question.
On Monday 23 May 2005 05:09 pm, Colin [EMAIL PROTECTED] wrote:
> -O3: The highest performance optimization level before code starts to break. It goes up to -O9 if you're daring. (Use -Os to compile for size.) Implies a lot of stuff.

Ack! What? It does *not* go up to -O9 and never has. Currently, the code doesn't change for anything after -O3. There used to be an -O4, and I think up to -O6 in private builds, but anything higher than -O3 won't help anymore. (It didn't really /help/ before; more often than not it simply produced segfaulting executables.)

You can specify -O9, but I don't think that's actually a limit. I think you can put any value there that's recognized by atoi (or maybe one of the strto{,u}{l,ll} family). Use -O69 and see how fast your bugs get marked INVALID at bugs.gentoo.org!

Also, I *think* -O3 is still broken on some architectures. x86 should support it fine, but -O3 is in that group of compiler flags that has produced broken executables. (That said, I run with -O3 on a pentium2 and am quite happy.)

--
Boyd Stephen Smith Jr.
[EMAIL PROTECTED]
ICQ: 514984
YM/AIM: DaTwinkDaddy
Re: [gentoo-user] CFLAGS CPU optimization question.
Optimization level 9 (-O9)? That's a laugh. Read the GCC man page: the optimization levels (-O0, -O1, -O2, -O3, -Os) are just groupings of other optimization flags, with optimization level 3 (-O3) containing the most optimization flags. The numbers don't correlate to any kind of optimization level (i.e. -O99 wouldn't be 99% optimized or some equivalent malarkey). The numbers might as well be letters like A, B, C.

In fact, if you know anything about programming, most of the flags at -O3 and beyond become very code specific, and sometimes only work with programs written a certain way or run on certain processors. This just goes towards my general opinion that Gentoo users in general see gcc cflags as some kind of magic incantation and know little about their purpose, potential, or meaning. -funroll-loops anyone?

-Ryan Lynch

On Monday 23 May 2005 05:09 pm, Colin [EMAIL PROTECTED] wrote:
> -O3: The highest performance optimization level before code starts to break. It goes up to -O9 if you're daring. (Use -Os to compile for size.) Implies a lot of stuff.
Re: [gentoo-user] CFLAGS CPU optimization question.
Ok already, we hear you. No need to post the same message 5 times. And BTW, it is a feature of GMail that you don't see your own posts.

Cheers,
-Richard

Ryan Lynch wrote:
> Optimization level 9 (-O9)? That's a laugh. Read the GCC man page: the optimization levels (-O0, -O1, -O2, -O3, -Os) are just groupings of other optimization flags, with optimization level 3 (-O3) containing the most optimization flags. The numbers don't correlate to any kind of optimization level (i.e. -O99 wouldn't be 99% optimized or some equivalent malarkey). The numbers might as well be letters like A, B, C.
>
> In fact, if you know anything about programming, most of the flags at -O3 and beyond become very code specific, and sometimes only work with programs written a certain way or run on certain processors. This just goes towards my general opinion that Gentoo users in general see gcc cflags as some kind of magic incantation and know little about their purpose, potential, or meaning. -funroll-loops anyone?
>
> -Ryan Lynch
>
> On Monday 23 May 2005 05:09 pm, Colin [EMAIL PROTECTED] wrote:
>> -O3: The highest performance optimization level before code starts to break. It goes up to -O9 if you're daring. (Use -Os to compile for size.) Implies a lot of stuff.
Re: [gentoo-user] CFLAGS CPU optimization question.
* On Tue May-24-2005 at 01:08:51 AM +0200, Julien Cayzac said:
> On 5/24/05, Richard Fish [EMAIL PROTECTED] wrote:
>> [ recommandations about performance cflags ]
>
> While we're at optimizing stuff, here are my CFLAGS (athlon-xp mobile, barton core):
>
> CFLAGS="-O2 -march=athlon-xp -msse -mfpmath=sse -pipe -finline-functions -fsched2-use-superblocks -fsched2-use-traces -fmove-all-movables -frename-registers -fweb -ffast-math -funsafe-math-optimizations -fprefetch-loop-arrays -fforce-addr -momit-leaf-frame-pointer -ftracer -funit-at-a-time -maccumulate-outgoing-args"
>
> 2.6.10-gentoo-r6 kernel, everything is stable... and far faster than when I had only -O2 -march=athlon-xp -pipe :-)

I recommand [sic] that you slow down.

--
Sami Samhuri
Re: [gentoo-user] CFLAGS CPU optimization question.
Walter Dnes wrote:
> Currently, I use -march=i686 for my 3 machines, a P4, a PIII, and a PII (and a partridge in a pear tree). According to the gcc docs at...
> http://gcc.gnu.org/onlinedocs/gcc-3.3.5/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options
> i586 is equivalent to pentium and i686 is equivalent to pentiumpro. Does this mean that I would get better optimization if I use pentium2, pentium3 or pentium4, as appropriate? I am using the available flags (-mmmx, -msse, -msse2, -mfpmath=sse, etc) as appropriate.

Yes, it would. My CFLAGS (Pentium II, 504 MHz, 224 MB RAM):

-O3 -march=pentium2 -mmmx -fomit-frame-pointer -pipe -ftracer -fno-rename-registers -funroll-loops

I have a stable install of kernel 2.6.11-gentoo-r9 and GNOME 2.8.3. I haven't had a kernel panic yet, and I compiled and ran the system with an overclocked 112 MHz front side bus. It was worth sitting around watching endless lines of text scroll by. My secrets?

-O3: The highest performance optimization level before code starts to break. It goes up to -O9 if you're daring. (Use -Os to compile for size.) Implies a lot of stuff.

-march=pentium2: Implies -mmmx and writes code specifically for the P2 processor.

-mmmx: Build code with MMX instructions wherever possible.

-fomit-frame-pointer: Don't keep the frame pointer in a register. You get an extra register at the cost of losing debugging ability.

-pipe: Use pipes instead of temporary files. Not recommended on a RAM-limited system.

-ftracer: Use the processor's branch predictor when compiling. I think it compiles twice with this flag, but it does compile more efficiently.

-fno-rename-registers: Renaming registers is only done when running 32-bit code on a 64-bit processor. It's implied on x86 architecture anyway.

-funroll-loops: If you can tell how many times a loop will loop (mainly for loops), then unroll it. Does it increase performance? If it does, it's unnoticeable. Don't tell anyone you use it though. It spreads the whole Gentoo ricer myth that's been going around the Internet.

If your Pentium 4 supports Hyper-Threading, adjust MAKEOPTS accordingly. My P4 compiles faster at -j3 than -j2. (Haven't tried -j4 though.)

--
Colin
Re: [gentoo-user] CFLAGS CPU optimization question.
Colin wrote:
> -funroll-loops: If you can tell how many times a loop will loop (mainly for loops), then unroll it. Does it increase performance? If it does, it's unnoticeable. Don't tell anyone you use it though. It spreads the whole Gentoo ricer myth that's been going around the Internet.

Just a quick word of warning... -O3 can be slower or faster than -O2 or -Os depending upon what code you are running. The same is true of -funroll-loops... it can actually hurt performance in many cases. This is due to the effect of a cache miss causing the processor to fetch data from RAM, which takes a dozen or more clock cycles on a modern x86 computer. Those running CPUs that have big disparities between the internal clock and the memory bus are well advised to test the effects of these flags on their own systems.

For me, compression and encryption (when I make backups) are my big CPU hogs. So those are what I tested, and I found -Os to be about 5% faster on average than -O2, depending upon whether it was gzip or bzip2, and what level (-1 thru -9) of compression I chose. There were some cases that were about 5% slower, but not ones I am likely to use. -O3 was either 5% faster or 20% slower than -O2. Compilation time was about 10% faster with -Os compared to -O2, and I don't really remember how much more time -O3 took. That is on a P4 3 GHz with HT.

The encryption code didn't show any performance boost or hit with any of the optimization levels, probably because it includes p4-optimized assembly code.

> If your Pentium 4 supports Hyper-Threading, adjust MAKEOPTS accordingly. My P4 compiles faster at -j3 than -j2. (Haven't tried -j4 though.)

Another word of warning... beware of how much memory compilation takes. Large C++ packages (like X11 and KDE) can require over one hundred megabytes *per module* for the compilation at -O2. -O3 will require even more memory. I'd recommend only using -j2 if you have at least 512MB of memory, and -j3 at 1GB or more.

-Richard
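The memory rule of thumb at the end of this message can be sketched as a tiny shell function. The name makeopts_for_ram_mb is invented for the example. The 512 MB and 1 GB thresholds are the ones given above; falling back to -j1 below 512 MB is my own reading of "only using -j2 if you have at least 512MB", not something the message says outright.

```shell
# makeopts_for_ram_mb RAM_MB
# Print a make -j setting for a machine with RAM_MB megabytes of memory.
# Hypothetical helper encoding the thresholds from the message above.
makeopts_for_ram_mb() {
    if [ "$1" -ge 1024 ]; then
        echo "-j3"
    elif [ "$1" -ge 512 ]; then
        echo "-j2"
    else
        echo "-j1"      # assumption: below 512 MB, one job at a time
    fi
}

makeopts_for_ram_mb 512     # prints -j2
makeopts_for_ram_mb 1024    # prints -j3
```

If the figure is taken from MemTotal in /proc/meminfo, keep in mind that it is reported in kB and comes out a little below the nominal RAM size, so it is worth rounding up before comparing.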
Re: [gentoo-user] CFLAGS CPU optimization question.
On 5/24/05, Richard Fish [EMAIL PROTECTED] wrote:
> [ recommandations about performance cflags ]

While we're at optimizing stuff, here are my CFLAGS (athlon-xp mobile, barton core):

CFLAGS="-O2 -march=athlon-xp -msse -mfpmath=sse -pipe -finline-functions -fsched2-use-superblocks -fsched2-use-traces -fmove-all-movables -frename-registers -fweb -ffast-math -funsafe-math-optimizations -fprefetch-loop-arrays -fforce-addr -momit-leaf-frame-pointer -ftracer -funit-at-a-time -maccumulate-outgoing-args"

2.6.10-gentoo-r6 kernel, everything is stable... and far faster than when I had only -O2 -march=athlon-xp -pipe :-)

Julien.