Changeset: e6fec00afc94 for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=e6fec00afc94
Modified Files:
configure.ag
Branch: default
Log Message:
configure: experimental cleanup of GCC optimisation flags
Clean up the GCC-based optimisation flags. Use -O3 instead of our -O6
-fexpensive ..., since -O3 already includes all of those flags, except
-funroll-all-loops. GCC knows best about the target platform it
compiles for, hence it also knows the best argument for -falign-loops
(-malign-loops doesn't exist, so it wasn't in effect anyway).
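
Whether a given GCC really enables the old explicit flags at -O3 can be checked against the compiler itself. The following is a minimal sketch, not part of configure.ag: the helper name flags_not_enabled is hypothetical, and it assumes the `gcc -Q -O3 --help=optimizers` report format in which each line holds a flag name followed by [enabled] or [disabled].

```shell
#!/bin/sh
# Hypothetical helper (not part of configure.ag): reads the report of
#   gcc -Q -O3 --help=optimizers
# on stdin and prints every flag given as an argument that the report
# does NOT mark as [enabled].
flags_not_enabled() {
    report=$(cat)
    for flag in "$@"; do
        # Assumed report line format: "  -fsome-flag   [enabled]"
        if ! printf '%s\n' "$report" |
            grep -Eq -- "^[[:space:]]*${flag}[[:space:]]+\[enabled\]"; then
            printf '%s\n' "$flag"
        fi
    done
}

# Usage (needs a gcc that supports -Q --help=optimizers):
#   gcc -Q -O3 --help=optimizers | flags_not_enabled \
#       -finline-functions -fexpensive-optimizations \
#       -frerun-cse-after-loop -funroll-loops -funroll-all-loops
```

Any flag the helper prints is one that -O3 does not switch on for that compiler version, so it would have to be kept explicitly if it is still wanted.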
On Darwin, use the special -fast flag (possibly the equivalent of GCC
4.6's -Ofast???) to enable the most aggressive optimisation flags,
though they might break the resulting binary.
This commit is experimental in the sense that it is meant to be compared
to the last run, such that we can see differences in
- compilation speed
- running speed
- test output
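
The new case statement in the diff reduces to three branches. As a standalone sketch (select_gcc_cflags is a hypothetical name; the real logic lives inline in configure.ag and only runs when GCC is the compiler), the selection could be tried out like this:

```shell
#!/bin/sh
# Sketch of the flag selection after this changeset.
# select_gcc_cflags is a hypothetical wrapper for illustration only.
#   $1: host triple, $2: CFLAGS already set by the user
select_gcc_cflags() {
    host=$1
    cflags=$2
    case "$host" in
    powerpc*-apple-darwin*)
        # -fast implies -mdynamic-no-pic unless -fPIC is given,
        # and -fPIC is needed for dynamic libraries
        printf '%s\n' "-fast -fPIC -pipe ${cflags}" ;;
    i?86-apple-darwin*|x86_64-apple-darwin*)
        printf '%s\n' "-fast -pipe ${cflags}" ;;
    *)
        # everything else relies on GCC's own -O3 defaults
        printf '%s\n' "-O3 -pipe ${cflags}" ;;
    esac
}

# Example: select_gcc_cflags x86_64-unknown-linux-gnu "-g"
```

The new flags are prepended, so user-supplied CFLAGS come last on the command line and can still override the chosen optimisation level.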
diffs (123 lines):
diff --git a/configure.ag b/configure.ag
--- a/configure.ag
+++ b/configure.ag
@@ -990,67 +990,58 @@
yes-*)
dnl -fomit-frame-pointer crashes memprof
case "$host-$gcc_ver" in
- x86_64-*-*-3.[[2-9]]*|i*86-*-*-3.[[2-9]]*|x86_64-*-*-4.*|i*86-*-*-4.*)
- CFLAGS="$CFLAGS -O6"
- case "$host" in
- i*86-*-cygwin)
- dnl With gcc 3.2, the combination of "-On -fomit-frame-pointer" (n>1)
- dnl does not seem to produce stable/correct? binaries under CYGWIN
- dnl (Mdiff and Mserver crash with segmentation faults);
- dnl hence, we omit -fomit-frame-pointer, here.
- ;;
- *) CFLAGS="$CFLAGS -fomit-frame-pointer";;
- esac
- CFLAGS="$CFLAGS -finline-functions -falign-loops=4 -falign-jumps=4 -falign-functions=4 -fexpensive-optimizations -funroll-loops -frerun-cse-after-loop -frerun-loop-opt"
- dnl With gcc 3.2, the combination of "-On -funroll-all-loops" (n>1)
- dnl does not seem to produce stable/correct? binaries
- dnl (Mserver produces tons of incorrect BATpropcheck warnings);
- dnl hence, we omit -funroll-all-loops, here.
- case "$gcc_ver" in
- 4.*) CFLAGS="$CFLAGS -ftree-vectorize";;
- dnl "-ftree-vectorize" is only available with newer versions of gcc, only;
- dnl did not check the exact version, but 4.1 has it, while 3.4.5 does not.
- esac
- ;;
- x86_64-*-*|i*86-*-*)
- CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions -malign-loops=4 -malign-jumps=4 -malign-functions=4 -fexpensive-optimizations -funroll-all-loops -funroll-loops -frerun-cse-after-loop -frerun-loop-opt"
- case "$gcc_ver" in
- 4.*) CFLAGS="$CFLAGS -ftree-vectorize";;
- dnl "-ftree-vectorize" is only available with newer versions of gcc, only;
- dnl did not check the exact version, but 4.1 has it, while 3.4.5 does not.
- esac
- ;;
- ia64-*-*) CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions -fexpensive-optimizations -frerun-cse-after-loop -frerun-loop-opt"
- dnl Obviously, 4-byte alignment doesn't make sense on Linux64; didn't try 8-byte alignment, yet.
- dnl Further, when combining either of "-funroll-all-loops" and "-funroll-loops" with "-On" (n>1),
- dnl gcc (3.2.1 & 2.96) does not seem to produce stable/correct? binaries under Linux64
- dnl (Mserver crashes with segmentation fault);
- dnl hence, we omit both "-funroll-all-loops" and "-funroll-loops", here
- case "$gcc_ver" in
- 4.*) CFLAGS="$CFLAGS -ftree-vectorize";;
- dnl "-ftree-vectorize" is only available with newer versions of gcc, only;
- dnl did not check the exact version, but 4.1 has it, while 3.4.5 does not.
- esac
- ;;
- *-sun-solaris*)
- if test "$bits" = "64" ; then
- NO_INLINE_CFLAGS="$NO_INLINE_CFLAGS -O1"
- fi
- case "$gcc_ver" in
- 4.*)
- CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions -fexpensive-optimizations -funroll-all-loops -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -ftree-vectorize";;
- *)
- CFLAGS="$CFLAGS -O2 -fomit-frame-pointer -finline-functions";;
- esac
- ;;
- *irix*) CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions"
- ;;
- *aix*) CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions"
- if test "$bits" = "64" ; then
- NO_INLINE_CFLAGS="$NO_INLINE_CFLAGS -O0"
- fi
- ;;
- *) CFLAGS="$CFLAGS -O6 -fomit-frame-pointer -finline-functions";;
+ powerpc*-apple-darwin*)
+ # -fast switch includes -mdynamic-no-pic, unless -fPIC is
+ # given, which we need for dynamic libraries, flags:
+ # -O3 -falign-loops-max-skip=15 -falign-jumps-max-skip=15
+ # -falign-loops=16 -falign-jumps=16 -falign-functions=16
+ # -malign-natural -ffast-math -fstrict-aliasing
+ # -funroll-loops -ftree-loop-linear -ftree-loop-memset
+ # -mcpu=G5 -mpowerpc-gpopt -mtune=G5 -fsched-interblock
+ # -fgcse-sm -mpowerpc64
+ CFLAGS="-fast -fPIC -pipe ${CFLAGS}"
+ ;;
+ i?86-apple-darwin*|x86_64-apple-darwin*)
+ # -fast switch on Intel is a lot less tuned:
+ # -O3 -fomit-frame-pointer -fstrict-aliasing
+ # -momit-leaf-frame-pointer -fno-tree-pre -falign-loops
+ CFLAGS="-fast -pipe ${CFLAGS}"
+ ;;
+ *)
+ # -O1 on gcc enables all slight optimisations:
+ # -fauto-inc-dec -fcprop-registers -fdce -fdefer-pop
+ # -fdelayed-branch -fdse -fguess-branch-probability
+ # -fif-conversion2 -fif-conversion -fipa-pure-const
+ # -fipa-reference -fmerge-constants -fsplit-wide-types
+ # -ftree-builtin-call-dce -ftree-ccp -ftree-ch
+ # -ftree-copyrename -ftree-dce -ftree-dominator-opts
+ # -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop
+ # -ftree-sra -ftree-pta -ftree-ter -funit-at-a-time
+ # on top of this -fomit-frame-pointer is enabled on machines
+ # where this does not interfere with debugging.
+ # -O2 on gcc enables optimisations which do not involve a
+ # speed-space tradeoff on top of -O1:
+ # -fthread-jumps -falign-functions -falign-jumps
+ # -falign-loops -falign-labels -fcaller-saves -fcrossjumping
+ # -fcse-follow-jumps -fcse-skip-blocks
+ # -fdelete-null-pointer-checks -fexpensive-optimizations
+ # -fgcse -fgcse-lm -finline-small-functions
+ # -findirect-inlining -fipa-sra -foptimize-sibling-calls
+ # -fpeephole2 -fregmove -freorder-blocks -freorder-functions
+ # -frerun-cse-after-loop -fsched-interblock -fsched-spec
+ # -fschedule-insns -fschedule-insns2 -fstrict-aliasing
+ # -fstrict-overflow -ftree-switch-conversion -ftree-pre
+ # -ftree-vrp
+ # (Gentoo enables -D_FORTIFY_SOURCE=2 starting at -O2)
+ # -O3 on gcc enables some more expensive optimisations on top
+ # of -O2:
+ # -finline-functions, -funswitch-loops,
+ # -fpredictive-commoning, -fgcse-after-reload,
+ # -ftree-vectorize and -fipa-cp-clone
+ CFLAGS="-O3 -pipe ${CFLAGS}"
+ # the following flag used to be applied, but is discouraged by
+ # GCC manpage: -funroll-all-loops
+ ;;
esac
;;
*)