[rewritten/remeasured as per suggestion by Andy Kleen]
Hello,
I've measured cache misses of the 4.0.1 and 4.1.0 C++ compilers with
oprofile on an amd64 box while compiling the MICO sources, and found
the following:
0) compiler options used were:
-I../include -Wall -D_REENTRANT -D_GNU_SOURCE -DPIC -fPIC -c
1) the most expensive function seems to be comptypes, at least from the
data L2 refill point of view (~17% of refills in 4.0.1, ~16% in 4.1.0)
2) comptypes is also the most CPU-intensive function, since it takes the
largest share of time (~4.5-4.8% of CPU_CLK_UNHALTED samples)
3) some other functions that are expensive in data L2 refills seem to be:
push_to_top_level (~6%), compparms (~4%),
htab_find_slot_with_hash (~3%), ggc_alloc_stat (~3%)
4) for 4.0.1 a data L2 refill happens about every 774 clock cycles
(CPU_CLK_UNHALTED * 100 / DATA_CACHE_REFILLS_FROM_SYSTEM; the factor 100
accounts for the different sampling counts, 100000 vs. 1000)
5) for 4.1.0 a data L2 refill happens about every 765 clock cycles
6) 4.1.0 is a _bit_ faster than 4.0.1 (5892854 vs. 5937586
CPU_CLK_UNHALTED samples, i.e. about 0.75% fewer)
7) tables were produced after three cycles of "make; find . -name '*.o'
-exec rm \{} \;"
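For anyone who wants to re-check the numbers in 4) and 5), here is a minimal Python sketch of the calculation. It only uses the cc1plus sample counts and the sampling counts (100000 and 1000) quoted in this mail; the function and variable names are made up for illustration.

```python
# Sample counts taken from the cc1plus summary rows in this mail.
SAMPLES = {
    "4.0.1": {"CPU_CLK_UNHALTED": 5937586,
              "DATA_CACHE_REFILLS_FROM_SYSTEM": 767082},
    "4.1.0": {"CPU_CLK_UNHALTED": 5892854,
              "DATA_CACHE_REFILLS_FROM_SYSTEM": 769938},
}

# Sampling intervals (the "count" values reported by oprofile).
CLK_COUNT = 100000
REFILL_COUNT = 1000


def cycles_per_refill(clk_samples, refill_samples,
                      clk_count=CLK_COUNT, refill_count=REFILL_COUNT):
    """Estimate unhalted cycles per data cache refill from system.

    Each sample stands for `count` events, so the event totals are
    samples * count; their ratio is the average distance in cycles.
    """
    return (clk_samples * clk_count) / (refill_samples * refill_count)


for version, s in SAMPLES.items():
    ratio = cycles_per_refill(s["CPU_CLK_UNHALTED"],
                              s["DATA_CACHE_REFILLS_FROM_SYSTEM"])
    print(f"GCC {version}: one refill per ~{ratio:.0f} cycles")
```

With the two sampling counts used here the scaling reduces to the factor 100 in the formula above.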
I don't know whether ICACHE_MISSES is that important, since I think it
measures L1 instruction cache misses rather than L2 misses. Please
correct me if I'm wrong.
The first few lines of the produced tables are below. For each compiler,
one table covers the overall cc1plus run and one is the per-symbol listing.
Please let me know whether you find data like this useful, in which case
I will keep providing it from time to time, or whether it is completely
useless, in which case I will try to help somewhere else.
Thanks!
Karel
GCC 4.0.1 20050514 (prerelease):
silence:~$ ~/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu/bin/c++ -v
Using built-in specs.
Target: amd64-linux-gnu
Configured with: ../gcc-4_0-branch/configure
--prefix=/home/karel/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu
--enable-shared --enable-threads --enable-languages=c++ --disable-checking
--enable-__cxa_atexit --disable-multilib --enable-libstdcxx-allocator=mt
amd64-linux-gnu
Thread model: posix
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of
0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system)
with a unit mask of 0x1f (All cache states) count 1000
CPU_CLK_UNHALT...|ICACHE_MISSES:...|DATA_CACHE_REF...|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
  5937586 100.000   4068766 100.000    767082 100.000 cc1plus
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of
0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system)
with a unit mask of 0x1f (All cache states) count 1000
samples  %        samples  %        samples  %        symbol name
282129   4.7516   84062    2.0660   133054   17.3455  comptypes
222187   3.7420   35072    0.8620   14406    1.8780   lookup_fnfields_1
189661   3.1942   99075    2.4350   22870    2.9814   ggc_alloc_stat
163945   2.7611   10867    0.2671   1238     0.1614   dfs_walk_all
129072   2.1738   6189     0.1521   1649     0.2150   record_reg_classes
115945   1.9527   11575    0.2845   6508     0.8484   walk_tree
104466   1.7594   34266    0.8422   1044     0.1361   find_reloads
78529    1.3226   11466    0.2818   4045     0.5273   splay_tree_splay_helper
71485    1.2039   1881     0.0462   1164     0.1517   _cpp_lex_direct
66814    1.1253   52100    1.2805   23340    3.0427   htab_find_slot_with_hash
66042    1.1123   16046    0.3944   5365     0.6994   lookup_field_1
64969    1.0942   16433    0.4039   19151    2.4966   ht_lookup_with_hash
63059    1.0620   29488    0.7247   20545    2.6783   tsubst
60314    1.0158   124283   3.0546   1902     0.2480   grokdeclarator
59543    1.0028   5354     0.1316   3547     0.4624   cp_walk_subtrees
58087    0.9783   518      0.0127   398      0.0519   _cpp_clean_line
57753    0.9727   372      0.0091   63       0.0082   dfs_find_final_overrider_pre
50981    0.8586   3283     0.0807   47105    6.1408   push_to_top_level
GCC 4.1.0 20050514 (experimental):
silence:~$ ~/usr/local/gcc-main-20050514/bin/c++ -v
Using built-in specs.
Target: amd64-unknown-linux-gnu
Configured with: ../gcc-main/configure
--prefix=/home/karel/usr/local/gcc-main-20050514 --enable-shared
--enable-threads --enable-languages=c++ --disable-checking
--enable-__cxa_atexit --disable-multilib amd64-unknown-linux-gnu
Thread model: posix
gcc version 4.1.0 20050514 (experimental)
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of
0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system)
with a unit mask of 0x1f (All cache states) count 1000
CPU_CLK_UNHALT...|ICACHE_MISSES:...|DATA_CACHE_REF...|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
  5892854 100.000   3907118 100.000    769938 100.000 cc1plus
CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of
0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system)
with a unit mask of 0x1f (All cache states) count 1000
samples  %        samples  %        samples  %        symbol name
264029   4.4805   61866    1.5834   119923   15.5757  comptypes
209962   3.5630   35886    0.9185   15013    1.9499   lookup_fnfields_1
204992   3.4787   87966    2.2514   23110    3.0015   ggc_alloc_stat
168846   2.8653   17736    0.4539   1303     0.1692   dfs_walk_all
124715   2.1164   5806     0.1486   1771     0.2300   record_reg_classes
123015   2.0875   13427    0.3437   7191     0.9340   walk_tree
97145    1.6485   40692    1.0415   1079     0.1401   find_reloads
81300    1.3796   802      0.0205   631      0.0820   _cpp_lex_direct
74550    1.2651   9374     0.2399   3920     0.5091   splay_tree_splay_helper
69103    1.1727   1888     0.0483   31028    4.0299   compparms
67387    1.1435   14429    0.3693   5538     0.7193   lookup_field_1
67245    1.1411   27061    0.6926   21805    2.8320   tsubst
63820    1.0830   25820    0.6608   23317    3.0284   htab_find_slot_with_hash
62961    1.0684   5905     0.1511   18892    2.4537   ht_lookup_with_hash
61731    1.0476   143774   3.6798   1811     0.2352   grokdeclarator
61177    1.0382   6439     0.1648   3442     0.4470   cp_walk_subtrees
57836    0.9815   1432     0.0367   138      0.0179   dfs_find_final_overrider_pre
57303    0.9724   335      0.0086   445      0.0578   _cpp_clean_line
50819    0.8624   2938     0.0752   48274    6.2699   push_to_top_level
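The symbol tables above are sorted by CPU_CLK_UNHALTED samples, so the data refill hot spots are scattered through them. Here is a rough Python sketch that re-sorts such rows by the data refill percentage. It assumes the three-event layout shown above (samples and % for each event, then the symbol name) and rejoins a row whose symbol name got wrapped onto the next line; the helper names and dictionary keys are hypothetical, and the sample rows are copied from the 4.0.1 table.

```python
SAMPLE = """\
samples % samples % samples % symbol name
282129 4.7516 84062 2.0660 133054 17.3455 comptypes
189661 3.1942 99075 2.4350 22870 2.9814 ggc_alloc_stat
57753 0.9727 372 0.0091 63 0.0082
dfs_find_final_overrider_pre
50981 0.8586 3283 0.0807 47105 6.1408 push_to_top_level
"""


def _is_number(token):
    """Return True if the token parses as a number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_symbol_rows(text):
    """Parse 'samples % samples % samples % symbol' rows into dicts."""
    rows = []
    pending = None  # six numeric fields still waiting for a symbol name
    for line in text.splitlines():
        fields = line.split()
        if pending is not None and len(fields) == 1:
            fields = pending + fields  # rejoin a wrapped row
        pending = None
        if len(fields) == 6 and all(_is_number(f) for f in fields):
            pending = fields
            continue
        if len(fields) != 7 or not all(_is_number(f) for f in fields[:6]):
            continue  # header or unrelated line
        rows.append({
            "clk_samples": int(fields[0]), "clk_pct": float(fields[1]),
            "icache_samples": int(fields[2]), "icache_pct": float(fields[3]),
            "refill_samples": int(fields[4]), "refill_pct": float(fields[5]),
            "symbol": fields[6],
        })
    return rows


rows = parse_symbol_rows(SAMPLE)
ranked = sorted(rows, key=lambda r: r["refill_pct"], reverse=True)
for r in ranked:
    print(f'{r["refill_pct"]:8.4f}  {r["symbol"]}')
```

Sorting on "icache_pct" instead gives the same kind of ranking for the instruction cache misses.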
--
Karel Gardas [EMAIL PROTECTED]
ObjectSecurity Ltd. http://www.objectsecurity.com