------- Comment #12 from abel at gcc dot gnu dot org 2008-11-25 14:28 -------

I have cut the testcase down somewhat, so that the call has two ARG3's instead of the ten coming from ARG4. With this smaller testcase, I see that most of the time is taken by register renaming (cross compiler to spu-elf, compiled with -O2):

 scheduling            :   0.66 ( 2%) usr   0.03 (30%) sys   0.69 ( 2%) wall   19208 kB (32%) ggc
 integrated RA         :   4.55 (11%) usr   0.00 ( 0%) sys   4.53 (11%) wall     829 kB ( 1%) ggc
 reload                :   2.57 ( 6%) usr   0.01 (10%) sys   2.58 ( 6%) wall   11996 kB (20%) ggc
 reload CSE regs       :   0.23 ( 1%) usr   0.00 ( 0%) sys   0.22 ( 1%) wall    2940 kB ( 5%) ggc
 peephole 2            :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rename registers      :  32.21 (76%) usr   0.01 (10%) sys  32.22 (75%) wall     993 kB ( 2%) ggc
 scheduling 2          :   0.58 ( 1%) usr   0.03 (30%) sys   0.61 ( 1%) wall    5375 kB ( 9%) ggc
 machine dep reorg     :   0.59 ( 1%) usr   0.01 (10%) sys   0.60 ( 1%) wall    5569 kB ( 9%) ggc
 final                 :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  42.59             0.10            42.71              59919 kB

With -O2 -fno-rename-registers, I get:

 scheduling            :   0.66 ( 6%) usr   0.04 (36%) sys   0.70 ( 7%) wall   19208 kB (33%) ggc
 integrated RA         :   4.56 (45%) usr   0.00 ( 0%) sys   4.57 (44%) wall     829 kB ( 1%) ggc
 reload                :   2.58 (25%) usr   0.00 ( 0%) sys   2.59 (25%) wall   11996 kB (21%) ggc
 reload CSE regs       :   0.23 ( 2%) usr   0.00 ( 0%) sys   0.24 ( 2%) wall    2940 kB ( 5%) ggc
 thread pro- & epilogue:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      22 kB ( 0%) ggc
 peephole 2            :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 rename registers      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :   0.49 ( 5%) usr   0.04 (36%) sys   0.52 ( 5%) wall    4949 kB ( 9%) ggc
 machine dep reorg     :   0.50 ( 5%) usr   0.02 (18%) sys   0.51 ( 5%) wall    5055 kB ( 9%) ggc
 reorder blocks        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 final                 :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  10.21             0.11            10.35              57732 kB

-frename-registers is enabled by default on spu, so no wonder this is not seen on other targets.
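
As a rough cross-check of the two reports above, the per-pass usr times can be subtracted to see how much of the overall difference the rename registers pass accounts for. This is only a sketch: the numbers are copied by hand from the reports, and the names `with_rename`, `without_rename` and `usr_deltas` are made up for illustration.

```python
# usr seconds per pass, copied by hand from the two -ftime-report excerpts.
with_rename = {
    "scheduling": 0.66, "integrated RA": 4.55, "reload": 2.57,
    "reload CSE regs": 0.23, "peephole 2": 0.01, "rename registers": 32.21,
    "scheduling 2": 0.58, "machine dep reorg": 0.59, "final": 0.04,
}
without_rename = {
    "scheduling": 0.66, "integrated RA": 4.56, "reload": 2.58,
    "reload CSE regs": 0.23, "thread pro- & epilogue": 0.01,
    "peephole 2": 0.01, "rename registers": 0.04, "scheduling 2": 0.49,
    "machine dep reorg": 0.50, "reorder blocks": 0.01, "final": 0.05,
}
# TOTAL usr lines of the two reports (-O2 and -O2 -fno-rename-registers).
TOTAL_WITH = 42.59
TOTAL_WITHOUT = 10.21


def usr_deltas(a, b):
    """Per-pass usr time difference a - b; a pass missing from a report counts as 0."""
    return {p: round(a.get(p, 0.0) - b.get(p, 0.0), 2) for p in sorted(set(a) | set(b))}


deltas = usr_deltas(with_rename, without_rename)
total_delta = round(TOTAL_WITH - TOTAL_WITHOUT, 2)
share = deltas["rename registers"] / total_delta

# Print passes from largest slowdown to largest speedup.
for name, d in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {d:+6.2f}s")
print(f"rename registers share of the total usr difference: {share:.1%}")
```

With these figures the rename registers pass alone covers nearly all of the usr time difference between the two runs; the remaining passes differ by about a tenth of a second or less each.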

oprofile shows me this:

Samples   %        linenr info           image name  app name  symbol name
-------------------------------------------------------------------------------
362678    29.6888  rtlanal.c:1412        cc1         cc1       note_stores
  362678  100.000  rtlanal.c:1412        cc1         cc1       note_stores [self]
-------------------------------------------------------------------------------
304520    24.9280  regrename.c:1941      cc1         cc1       rest_of_handle_regrename
  304520  99.8727  regrename.c:1941      cc1         cc1       rest_of_handle_regrename [self]
  201      0.0659  bitmap.c:630          cc1         cc1       bitmap_set_bit
  99       0.0325  df-scan.c:1217        cc1         cc1       df_insn_rescan
  39       0.0128  df-problems.c:107     cc1         cc1       df_grow_bb_info
  24       0.0079  (no location information) cc1     cc1       bitmap_clear_bit
  17       0.0056  df-scan.c:573         cc1         cc1       df_grow_reg_info
  8        0.0026  emit-rtl.c:1131       cc1         cc1       max_reg_num
-------------------------------------------------------------------------------
164550    13.4701  regrename.c:120       cc1         cc1       clear_dead_regs
  164550  100.000  regrename.c:120       cc1         cc1       clear_dead_regs [self]
-------------------------------------------------------------------------------
  6441    100.000  ira-color.c:1044      cc1         cc1       allocno_spill_priority_compare
59894      4.9029  ira-color.c:1044      cc1         cc1       allocno_spill_priority_compare
  59894   86.6547  ira-color.c:1044      cc1         cc1       allocno_spill_priority_compare [self]
  6441     9.3188  ira-color.c:1044      cc1         cc1       allocno_spill_priority_compare
  1148     1.6609  splay-tree.c:348      cc1         cc1       splay_tree_remove
  928      1.3426  splay-tree.c:139      cc1         cc1       splay_tree_splay
  460      0.6655  ira-color.c:1083      cc1         cc1       splay_tree_free
  247      0.3574  alloc-pool.c:325      cc1         cc1       pool_free

I don't have enough information to understand where note_stores calls come from, and I stopped wondering for now.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31850