https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #5 from Andrew Roberts <andrewm.roberts at sky dot com> ---
Ok, I've done some more digging. 

Looking at the optimization options enabled by -O2 vs -O1, I built the test
program at -O1 and enabled each optimization in turn, on both ARM and AARCH64.
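
The per-flag sweep described above could be scripted roughly as follows. This is only a sketch, not the script that produced the numbers in this comment. It assumes GNU time is installed at /usr/bin/time and that a format string like the one shown yields the Time/Mem/PageFaults lines; build_command and run_sweep are hypothetical helper names.

```python
import subprocess

# Assumed GNU time format: %E = elapsed wall clock, %M = maximum resident
# set size in KB, %F = major page faults.
TIME_FORMAT = "Time=%E Mem=%M PageFaults=%F"


def build_command(flag, source="testmap.cpp", compiler="gcc"):
    """Return the argv for one timed compile at -O1 plus a single extra flag."""
    return ["/usr/bin/time", "-f", TIME_FORMAT,
            compiler, "-O1", flag, "-c", source]


def run_sweep(flags, source="testmap.cpp"):
    """Compile once per flag and print the measurement line GNU time emits."""
    for flag in flags:
        print(f"Optimization Level: -O1 {flag}")
        result = subprocess.run(build_command(flag, source),
                                capture_output=True, text=True)
        # GNU time writes its report to stderr, after any compiler diagnostics.
        lines = result.stderr.strip().splitlines()
        print(lines[-1] if lines else "(no output)")
```

For example, run_sweep(["-fgcse", "-ftree-pre"]) would time two of the flags listed below.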

It looks like -fgcse is using the most memory of all the optimizations.
On ARM "-O1 -fgcse" is using MORE memory than "-O2". 

This suggests to me that on ARM the gcse optimization is not being run at -O2,
perhaps because of some cost/benefit heuristic, whereas it is run on AARCH64. Is
there any way to get some info out of gcc to prove this?
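
One possible starting point is GCC's `-Q --help=optimizers` listing, which can be compared between -O1 and -O2. The sketch below assumes the listing's usual "-fflag   [enabled]" / "[disabled]" line layout, and the function names are hypothetical. It only reports whether a flag is switched on, not whether the pass did any work on a given function; `-fdump-passes` and `-ftime-report` may help with that part.

```python
import re
import subprocess

# Matches lines such as "  -fgcse                     [enabled]".
FLAG_LINE = re.compile(r"^\s*(-f[\w-]+)\s+\[(enabled|disabled)\]\s*$")


def parse_optimizers(text):
    """Map each -f flag in a `-Q --help=optimizers` listing to True/False."""
    states = {}
    for line in text.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            states[match.group(1)] = (match.group(2) == "enabled")
    return states


def query_level(level, compiler="gcc"):
    """Ask the compiler which optimizer flags a level such as "-O2" turns on."""
    out = subprocess.run([compiler, "-Q", "--help=optimizers", level],
                         capture_output=True, text=True, check=True).stdout
    return parse_optimizers(out)


def newly_enabled(low, high):
    """Return the flags that are on in `high` but not in `low`."""
    return sorted(f for f, on in high.items() if on and not low.get(f, False))
```

Running newly_enabled(query_level("-O1"), query_level("-O2")) on each target would show whether -fgcse is reported as enabled at -O2 on both.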

On AARCH64 -fgcse results in a huge compile time increase due to the additional
memory usage causing massive swapping. ARM compile time increased by 14%, but
AARCH64 compile time increased by 400%. When there is enough RAM to avoid
swapping, -fgcse looks ok (2Gb on odroid-c2).

Tested using: gcc version 8.0.0 20170806 (experimental) (GCC) on
Raspberry PI 3 1Gb RAM (both armv7l and aarch64).

For ARM:

Optimization Level: -O1 -falign-functions
Time=1:20.76 Mem=320040 PageFaults=0
Optimization Level: -O1 -falign-jumps
Time=1:21.10 Mem=319940 PageFaults=0
Optimization Level: -O1 -falign-labels
Time=1:21.00 Mem=320028 PageFaults=0
Optimization Level: -O1 -falign-loops
Time=1:20.62 Mem=320028 PageFaults=0
Optimization Level: -O1 -fcaller-saves
Time=1:20.45 Mem=319884 PageFaults=0
Optimization Level: -O1 -fcode-hoisting
Time=1:22.01 Mem=320832 PageFaults=0
Optimization Level: -O1 -fcrossjumping
Time=1:21.28 Mem=320164 PageFaults=0
Optimization Level: -O1 -fcse-follow-jumps
Time=1:20.47 Mem=320000 PageFaults=0
Optimization Level: -O1 -fdevirtualize
Time=1:42.07 Mem=320032 PageFaults=0
Optimization Level: -O1 -fdevirtualize-speculatively
Time=1:20.44 Mem=320008 PageFaults=0
Optimization Level: -O1 -fexpensive-optimizations
Time=1:22.92 Mem=321752 PageFaults=0
Optimization Level: -O1 -fgcse
Time=1:34.12 Mem=556640 PageFaults=0                     <================
Optimization Level: -O1 -fhoist-adjacent-loads
Time=1:20.45 Mem=319940 PageFaults=0
Optimization Level: -O1 -findirect-inlining
Time=1:21.31 Mem=320020 PageFaults=0
Optimization Level: -O1 -finline-small-functions
Time=1:32.36 Mem=319992 PageFaults=0
Optimization Level: -O1 -fipa-bit-cp
Time=1:21.13 Mem=320008 PageFaults=0
Optimization Level: -O1 -fipa-cp
Time=1:19.94 Mem=322140 PageFaults=0
Optimization Level: -O1 -fipa-icf
Time=1:21.50 Mem=319940 PageFaults=0
Optimization Level: -O1 -fipa-icf-functions
Time=1:20.93 Mem=320060 PageFaults=0
Optimization Level: -O1 -fipa-icf-variables
Time=1:20.48 Mem=320044 PageFaults=0
Optimization Level: -O1 -fipa-ra
Time=1:20.58 Mem=320284 PageFaults=0
Optimization Level: -O1 -fipa-sra
Time=1:12.69 Mem=310648 PageFaults=0
Optimization Level: -O1 -fipa-vrp
Time=1:20.45 Mem=319836 PageFaults=0
Optimization Level: -O1 -fisolate-erroneous-paths-dereference
Time=1:20.61 Mem=320024 PageFaults=0
Optimization Level: -O1 -flra-remat
Time=1:20.56 Mem=319944 PageFaults=0
Optimization Level: -O1 -foptimize-sibling-calls
Time=1:20.69 Mem=320012 PageFaults=0
Optimization Level: -O1 -foptimize-strlen
Time=1:21.10 Mem=320024 PageFaults=0
Optimization Level: -O1 -fpartial-inlining
Time=1:21.19 Mem=319888 PageFaults=0
Optimization Level: -O1 -fpeephole2
Time=1:20.75 Mem=319888 PageFaults=0
Optimization Level: -O1 -freorder-functions
Time=1:20.63 Mem=319884 PageFaults=0
Optimization Level: -O1 -frerun-cse-after-loop
Time=1:21.96 Mem=320984 PageFaults=0
Optimization Level: -O1 -fschedule-insns2
Time=1:24.68 Mem=343916 PageFaults=0
Optimization Level: -O1 -fschedule-insns
Time=1:52.77 Mem=324696 PageFaults=0
Optimization Level: -O1 -fstore-merging
Time=1:20.47 Mem=320208 PageFaults=0
Optimization Level: -O1 -fstrict-aliasing
Time=1:20.86 Mem=319880 PageFaults=0
Optimization Level: -O1 -fthread-jumps
Time=1:20.31 Mem=319900 PageFaults=0
Optimization Level: -O1 -ftree-pre
Time=1:21.38 Mem=320696 PageFaults=0
Optimization Level: -O1 -ftree-switch-conversion
Time=1:20.51 Mem=320004 PageFaults=0
Optimization Level: -O1 -ftree-tail-merge
Time=1:21.13 Mem=320040 PageFaults=0
Optimization Level: -O1 -ftree-vrp
Time=1:21.01 Mem=323032 PageFaults=0

For AARCH64:

Optimization Level: -O1 -falign-functions
Time=2:22.49 Mem=393844 PageFaults=150
Optimization Level: -O1 -falign-jumps
Time=2:20.70 Mem=393952 PageFaults=0
Optimization Level: -O1 -falign-labels
Time=2:21.09 Mem=393880 PageFaults=0
Optimization Level: -O1 -falign-loops
Time=2:20.68 Mem=393956 PageFaults=0
Optimization Level: -O1 -fcaller-saves
Time=2:20.98 Mem=393968 PageFaults=0
Optimization Level: -O1 -fcode-hoisting
Time=2:22.60 Mem=395656 PageFaults=0
Optimization Level: -O1 -fcrossjumping
Time=2:21.69 Mem=393956 PageFaults=0
Optimization Level: -O1 -fcse-follow-jumps
Time=2:21.12 Mem=393968 PageFaults=0
Optimization Level: -O1 -fdevirtualize
Time=2:58.68 Mem=393412 PageFaults=0
Optimization Level: -O1 -fdevirtualize-speculatively
Time=2:20.83 Mem=393968 PageFaults=0
Optimization Level: -O1 -fexpensive-optimizations
Time=2:23.44 Mem=394100 PageFaults=0
Optimization Level: -O1 -fgcse
Time=9:41.73 Mem=784488 PageFaults=9534                 <====================
Optimization Level: -O1 -fhoist-adjacent-loads
Time=2:21.55 Mem=393716 PageFaults=14
Optimization Level: -O1 -findirect-inlining
Time=2:29.22 Mem=393856 PageFaults=0
Optimization Level: -O1 -finline-small-functions
Time=2:45.25 Mem=393252 PageFaults=0
Optimization Level: -O1 -fipa-bit-cp
Time=2:21.23 Mem=393800 PageFaults=0
Optimization Level: -O1 -fipa-cp
Time=2:20.63 Mem=390500 PageFaults=0
Optimization Level: -O1 -fipa-icf
Time=2:21.53 Mem=393796 PageFaults=0
Optimization Level: -O1 -fipa-icf-functions
Time=2:21.97 Mem=393892 PageFaults=0
Optimization Level: -O1 -fipa-icf-variables
Time=2:23.48 Mem=393804 PageFaults=0
Optimization Level: -O1 -fipa-ra
Time=2:21.18 Mem=394920 PageFaults=0
Optimization Level: -O1 -fipa-sra
Time=2:07.81 Mem=378276 PageFaults=0
Optimization Level: -O1 -fipa-vrp
Time=2:21.15 Mem=393812 PageFaults=0
Optimization Level: -O1 -fisolate-erroneous-paths-dereference
Time=2:21.33 Mem=393812 PageFaults=0
Optimization Level: -O1 -flra-remat
Time=2:20.67 Mem=393800 PageFaults=0
Optimization Level: -O1 -foptimize-sibling-calls
Time=2:21.33 Mem=393704 PageFaults=0
Optimization Level: -O1 -foptimize-strlen
Time=2:29.38 Mem=393716 PageFaults=0
Optimization Level: -O1 -fpartial-inlining
Time=2:20.68 Mem=393868 PageFaults=0
Optimization Level: -O1 -fpeephole2
Time=2:26.58 Mem=409308 PageFaults=4
Optimization Level: -O1 -freorder-functions
Time=2:21.28 Mem=393824 PageFaults=0
Optimization Level: -O1 -frerun-cse-after-loop
Time=2:22.67 Mem=395444 PageFaults=0
Optimization Level: -O1 -fschedule-insns2
Time=2:26.74 Mem=409168 PageFaults=0
Optimization Level: -O1 -fschedule-insns
Time=3:29.72 Mem=395320 PageFaults=0
Optimization Level: -O1 -fstore-merging
Time=2:20.61 Mem=393968 PageFaults=0
Optimization Level: -O1 -fstrict-aliasing
Time=2:21.11 Mem=393860 PageFaults=0
Optimization Level: -O1 -fthread-jumps
Time=2:20.56 Mem=393784 PageFaults=0
Optimization Level: -O1 -ftree-pre
Time=2:23.19 Mem=395344 PageFaults=0
Optimization Level: -O1 -ftree-switch-conversion
Time=2:29.38 Mem=393728 PageFaults=0
Optimization Level: -O1 -ftree-tail-merge
Time=2:20.86 Mem=393836 PageFaults=0
Optimization Level: -O1 -ftree-vrp
Time=2:22.31 Mem=389336 PageFaults=0

Comparing -O1, -O1 -fgcse and -O2

ARM (1Gb RAM):

gcc -O1 -c testmap.cpp
Time=2:25.26 Mem=318976 PageFaults=0

gcc -O1 -fgcse -c testmap.cpp
Time=2:42.04 Mem=554872 PageFaults=123

gcc -O2 -c testmap.cpp
Time=2:31.89 Mem=262828 PageFaults=11   <== Is this really running all -O2 opts?

AARCH64 (1Gb RAM):

gcc -O1 -c testmap.cpp
Time=3:10.68 Mem=397248 PageFaults=3

gcc -O1 -fgcse -c testmap.cpp
Time=5:15.26 Mem=771276 PageFaults=166

gcc -O2 -c testmap.cpp
Time=11:06.06 Mem=820744 PageFaults=4266


AARCH64 (2Gb RAM, odroid-c2, different clock speed etc):

gcc -O1 -c ../testmap.cpp
Time=1:47.58 Mem=394896 PageFaults=0

gcc -O1 -fgcse -c ../testmap.cpp
Time=3:10.06 Mem=765460 PageFaults=0

gcc -O2 -c ../testmap.cpp
Time=3:05.06 Mem=906624 PageFaults=0
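
To compare these runs, the measurement lines can be parsed and expressed as ratios against the -O1 baseline. This is a minimal sketch that assumes the "Time=m:ss.ss Mem=KB PageFaults=n" layout used above, with Mem taken to be peak resident set size in KB (the comment does not say so explicitly). parse_measurement and ratio are hypothetical names.

```python
import re

# Elapsed time is [hours:]minutes:seconds, followed by memory and fault counts.
LINE = re.compile(
    r"Time=(?:(\d+):)?(\d+):(\d+(?:\.\d+)?)\s+Mem=(\d+)\s+PageFaults=(\d+)")


def parse_measurement(line):
    """Turn one 'Time=... Mem=... PageFaults=...' line into a dict of numbers."""
    match = LINE.search(line)
    if match is None:
        raise ValueError(f"not a measurement line: {line!r}")
    hours, minutes, seconds, mem, faults = match.groups()
    elapsed = int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)
    return {"seconds": elapsed, "mem_kb": int(mem), "page_faults": int(faults)}


def ratio(baseline, other, key):
    """How many times larger `other` is than `baseline` for the given field."""
    return other[key] / baseline[key]


# ARM (1Gb RAM) figures copied from the comparison above.
arm_o1 = parse_measurement("Time=2:25.26 Mem=318976 PageFaults=0")
arm_gcse = parse_measurement("Time=2:42.04 Mem=554872 PageFaults=123")
print(f"ARM -O1 -fgcse vs -O1: time x{ratio(arm_o1, arm_gcse, 'seconds'):.2f}, "
      f"memory x{ratio(arm_o1, arm_gcse, 'mem_kb'):.2f}")
```

The same two calls can be repeated for the AARCH64 lines to compare the two targets on equal terms.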
