------- Comment #5 from jakub at gcc dot gnu dot org 2010-04-06 10:07 -------
callgrind --inclusive=yes says on this:

438,822,838,411  /usr/src/gcc/obj/gcc/../../gcc/var-tracking.c:variable_tracking_main
429,638,024,208  /usr/src/gcc/obj/libiberty/../../libiberty/hashtab.c:htab_traverse_noresize [/usr/src/gcc/obj/gcc/cc1plus]
428,305,658,767  /usr/src/gcc/obj/libiberty/../../libiberty/hashtab.c:htab_traverse [/usr/src/gcc/obj/gcc/cc1plus]
417,506,443,433  ../../gcc/var-tracking.c:vt_find_locations [/usr/src/gcc/obj/gcc/cc1plus]
389,486,317,740  ../../gcc/var-tracking.c:dataflow_set_merge [/usr/src/gcc/obj/gcc/cc1plus]
389,242,104,098  /usr/src/gcc/obj/gcc/../../gcc/var-tracking.c:variable_merge_over_cur
389,242,104,098  ../../gcc/var-tracking.c:variable_merge_over_cur [/usr/src/gcc/obj/gcc/cc1plus]
345,572,722,020  ../../gcc/var-tracking.c:intersect_loc_chains [/usr/src/gcc/obj/gcc/cc1plus]
292,844,158,243  ../../gcc/var-tracking.c:find_loc_in_1pdv [/usr/src/gcc/obj/gcc/cc1plus]
117,800,020,698  /usr/src/gcc/obj/gcc/../../gcc/rtl.c:rtx_equal_p
117,800,020,698  ../../gcc/rtl.c:rtx_equal_p [/usr/src/gcc/obj/gcc/cc1plus]
 78,786,470,332  /usr/src/gcc/obj/libiberty/../../libiberty/hashtab.c:htab_find_with_hash [/usr/src/gcc/obj/gcc/cc1plus]
So, as a micro-optimization, what could work is to use the spare bits in location_chain_def (30 bits on 32-bit hosts, 32 + 30 bits on 64-bit hosts) for something we could compare quickly in place of rtx_equal_p or loc_cmp. Say, use the two or three topmost of those 30 bits for the code (0 for REG, 1 for MEM, 2 for VALUE, 3 for anything else) and, for each code, encode something in the remaining bits (e.g. REGNO for a REG; for a MEM, 2 bits of REG/MEM/VALUE/other classifying its address, then the rest of the bits; a saturating VALUE uid for VALUEs; something else for the other codes). Then we'd call rtx_equal_p or loc_cmp only if the two keys are equal. Or, better, find a way to do less work during dataflow merges.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43632