Changeset: e6fec00afc94 for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=e6fec00afc94
Modified Files:
        configure.ag
Branch: default
Log Message:

configure: experimental cleanup of GCC optimisation flags

Clean up the GCC-based optimisation flags.  Use -O3 instead of our -O6
-fexpensive ..., since -O3 already includes all of those flags except
-funroll-all-loops.  GCC knows best about the target platform it
compiles for, hence it also knows the best argument for -falign-loops
(-malign-loops doesn't exist, so it wasn't in effect anyway).
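
One way to check the claim above on a given compiler is to ask GCC which
optimisation flags it reports as enabled at each level.  The sketch below
assumes a GCC that supports -Q --help=optimizers (4.3 or later); the helper
names filter_enabled and enabled_opts are made up for illustration and are
not part of configure.ag.

```shell
# Keep only the flag names that GCC marks as "[enabled]" in the
# output of -Q --help=optimizers (read from stdin).
filter_enabled() {
    awk '$2 == "[enabled]" { print $1 }' | sort
}

# Print the optimisation flags enabled at the given level, e.g. -O3.
enabled_opts() {
    gcc -Q --help=optimizers "$1" 2>/dev/null | filter_enabled
}

# Usage (requires gcc in PATH): show what -O3 adds on top of -O2.
#   enabled_opts -O2 > o2.txt
#   enabled_opts -O3 > o3.txt
#   comm -13 o2.txt o3.txt
```

Flags that stay disabled at both levels do not appear in either list, so
they would still have to be passed explicitly if wanted.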

On Darwin, use the special -fast flag (possibly the equivalent of GCC
4.6's -Ofast?) to enable the most aggressive optimisation flags, though
they might break the resulting binary.
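
The per-host selection described above can be sketched as a stand-alone
shell function.  This is only an illustration: pick_cflags is a
hypothetical name, and configure.ag does the same thing inline in a case
statement on "$host-$gcc_ver" rather than through a function.

```shell
# Print the CFLAGS chosen for a host triplet, prepended to any flags
# already supplied as the optional second argument.
pick_cflags() {
    host=$1
    extra=$2
    case "$host" in
    powerpc*-apple-darwin*)
        # -fast implies -mdynamic-no-pic unless -fPIC is given
        flags="-fast -fPIC -pipe" ;;
    i?86-apple-darwin*|x86_64-apple-darwin*)
        flags="-fast -pipe" ;;
    *)
        flags="-O3 -pipe" ;;
    esac
    # Append the caller's flags only if there are any.
    printf '%s\n' "$flags${extra:+ $extra}"
}

# Usage:
#   CFLAGS=$(pick_cflags "$host" "$CFLAGS")
```

Putting the selected flags first, as the commit does, lets flags already
present in CFLAGS come later on the command line.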

This commit is experimental in the sense that it is meant to be compared
to the last run, so that we can see differences in:
- compilation speed
- running speed
- test output


diffs (123 lines):

diff --git a/configure.ag b/configure.ag
--- a/configure.ag
+++ b/configure.ag
@@ -990,67 +990,58 @@
     yes-*)
       dnl -fomit-frame-pointer crashes memprof
       case "$host-$gcc_ver" in
-      x86_64-*-*-3.[[2-9]]*|i*86-*-*-3.[[2-9]]*|x86_64-*-*-4.*|i*86-*-*-4.*)
-                      CFLAGS="$CFLAGS -O6"
-                      case "$host" in
-                      i*86-*-cygwin) 
-                           dnl  With gcc 3.2, the combination of "-On -fomit-frame-pointer" (n>1)
-                           dnl  does not seem to produce stable/correct? binaries under CYGWIN
-                           dnl  (Mdiff and Mserver crash with segmentation faults);
-                           dnl  hence, we omit -fomit-frame-pointer, here.
-                           ;;
-                      *)   CFLAGS="$CFLAGS -fomit-frame-pointer";;
-                      esac
-                      CFLAGS="$CFLAGS                          -finline-functions -falign-loops=4 -falign-jumps=4 -falign-functions=4 -fexpensive-optimizations                     -funroll-loops -frerun-cse-after-loop -frerun-loop-opt"
-                      dnl  With gcc 3.2, the combination of "-On -funroll-all-loops" (n>1)
-                      dnl  does not seem to produce stable/correct? binaries
-                      dnl  (Mserver produces tons of incorrect BATpropcheck warnings);
-                      dnl  hence, we omit -funroll-all-loops, here.
-                      case "$gcc_ver" in
-                      4.*) CFLAGS="$CFLAGS -ftree-vectorize";;
-                           dnl  "-ftree-vectorize" is only available with newer versions of gcc, only;
-                           dnl  did not check the exact version, but 4.1 has it, while 3.4.5 does not.
-                      esac
-                      ;;
-      x86_64-*-*|i*86-*-*)
-                      CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions -malign-loops=4 -malign-jumps=4 -malign-functions=4 -fexpensive-optimizations -funroll-all-loops  -funroll-loops -frerun-cse-after-loop -frerun-loop-opt"
-                      case "$gcc_ver" in
-                      4.*) CFLAGS="$CFLAGS -ftree-vectorize";;
-                           dnl  "-ftree-vectorize" is only available with newer versions of gcc, only;
-                           dnl  did not check the exact version, but 4.1 has it, while 3.4.5 does not.
-                      esac
-                      ;;
-      ia64-*-*)       CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions                                                     -fexpensive-optimizations                                    -frerun-cse-after-loop -frerun-loop-opt"
-                      dnl  Obviously, 4-byte alignment doesn't make sense on Linux64; didn't try 8-byte alignment, yet.
-                      dnl  Further, when combining either of "-funroll-all-loops" and "-funroll-loops" with "-On" (n>1),
-                      dnl  gcc (3.2.1 & 2.96) does not seem to produce stable/correct? binaries under Linux64
-                      dnl  (Mserver crashes with segmentation fault);
-                      dnl  hence, we omit both "-funroll-all-loops" and "-funroll-loops", here
-                      case "$gcc_ver" in
-                      4.*) CFLAGS="$CFLAGS -ftree-vectorize";;
-                           dnl  "-ftree-vectorize" is only available with newer versions of gcc, only;
-                           dnl  did not check the exact version, but 4.1 has it, while 3.4.5 does not.
-                      esac
-                      ;;
-      *-sun-solaris*)
-                      if test "$bits" = "64" ; then
-                        NO_INLINE_CFLAGS="$NO_INLINE_CFLAGS -O1"
-                      fi
-                      case "$gcc_ver" in
-                      4.*)
-                                                 CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions -fexpensive-optimizations -funroll-all-loops -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -ftree-vectorize";;
-                                         *)
-                                                 CFLAGS="$CFLAGS -O2 -fomit-frame-pointer -finline-functions";;
-                      esac
-                      ;;
-      *irix*)         CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions"
-                      ;;
-      *aix*)          CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions"
-                      if test "$bits" = "64" ; then
-                        NO_INLINE_CFLAGS="$NO_INLINE_CFLAGS -O0"
-                      fi
-                      ;;
-      *)              CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions";;
+               powerpc*-apple-darwin*)
+                 # -fast switch includes -mdynamic-no-pic, unless -fPIC is
+                 # given, which we need for dynamic libraries, flags:
+                 #  -O3 -falign-loops-max-skip=15 -falign-jumps-max-skip=15
+                 #  -falign-loops=16 -falign-jumps=16 -falign-functions=16
+                 #  -malign-natural -ffast-math -fstrict-aliasing
+                 #  -funroll-loops -ftree-loop-linear -ftree-loop-memset
+                 #  -mcpu=G5 -mpowerpc-gpopt -mtune=G5 -fsched-interblock
+                 #  -fgcse-sm -mpowerpc64
+                 CFLAGS="-fast -fPIC -pipe ${CFLAGS}"
+               ;;
+               i?86-apple-darwin*|x86_64-apple-darwin*)
+                 # -fast switch on Intel is a lot less tuned:
+                 #  -O3 -fomit-frame-pointer -fstrict-aliasing
+                 #  -momit-leaf-frame-pointer -fno-tree-pre -falign-loops
+                 CFLAGS="-fast -pipe ${CFLAGS}"
+               ;;
+        *)
+                 # -O1 on gcc enables all slight optimisations:
+                 #   -fauto-inc-dec -fcprop-registers -fdce -fdefer-pop
+                 #   -fdelayed-branch -fdse -fguess-branch-probability
+                 #   -fif-conversion2 -fif-conversion -fipa-pure-const
+                 #   -fipa-reference -fmerge-constants -fsplit-wide-types
+                 #   -ftree-builtin-call-dce -ftree-ccp -ftree-ch
+                 #   -ftree-copyrename -ftree-dce -ftree-dominator-opts
+                 #   -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop
+                 #   -ftree-sra -ftree-pta -ftree-ter -funit-at-a-time 
+                 # on top of this -fomit-frame-pointer is enabled on machines
+                 # where this does not interfere with debugging.
+                 # -O2 on gcc enables optimisations which do not involve a
+                 # speed-space tradeoff on top of -O1:
+                 #   -fthread-jumps -falign-functions  -falign-jumps
+                 #   -falign-loops -falign-labels -fcaller-saves -fcrossjumping
+                 #   -fcse-follow-jumps  -fcse-skip-blocks
+                 #   -fdelete-null-pointer-checks -fexpensive-optimizations
+                 #   -fgcse -fgcse-lm -finline-small-functions
+                 #   -findirect-inlining -fipa-sra -foptimize-sibling-calls
+                 #   -fpeephole2 -fregmove -freorder-blocks -freorder-functions
+                 #   -frerun-cse-after-loop -fsched-interblock -fsched-spec
+                 #   -fschedule-insns -fschedule-insns2 -fstrict-aliasing
+                 #   -fstrict-overflow -ftree-switch-conversion -ftree-pre
+                 #   -ftree-vrp
+                 # (Gentoo enables -D_FORTIFY_SOURCE=2 starting at -O2)
+                 # -O3 on gcc enables some more expensive optimisations on top
+                 # of -O2:
+                 #  -finline-functions, -funswitch-loops,
+                 #  -fpredictive-commoning, -fgcse-after-reload,
+                 #  -ftree-vectorize and -fipa-cp-clone 
+                 CFLAGS="-O3 -pipe ${CFLAGS}"
+                 # the following flag used to be applied, but is discouraged by
+                 # GCC manpage: -funroll-all-loops
+               ;;
       esac
       ;;
     *)