https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83628
            Bug ID: 83628
           Summary: performance regression when accessing arrays on alpha
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mikulas at artax dot karlin.mff.cuni.cz
  Target Milestone: ---

The alpha architecture has instructions s4add, s8add, s4sub and s8sub. These
instructions multiply the first argument by 4 or 8 (a left shift by 2 or 3
bits) and add or subtract the second argument.

GCC versions 6 and 7 no longer combine the shift and the addition into one of
these instructions. They always emit the instruction with a zero second
argument and then emit a separate add or sub instruction. GCC 5 and earlier
use these instructions correctly.

These instructions are used to access arrays, so this bug slows down any code
that indexes arrays.

Example:

$ cat index.c
int get_int(int *p, long idx)
{
        return p[idx];
}

long long get_long_long(long long *p, long idx)
{
        return p[idx];
}
$ alpha-linux-gnu-gcc-6 -c -O2 index.c
$ alpha-linux-gnu-objdump -d index.o

0000000000000000 <get_int>:
   0:   51 14 20 42     s4addq  a1,0,a1
   4:   11 04 11 42     addq    a0,a1,a1
   8:   00 00 11 a0     ldl     v0,0(a1)
   c:   01 80 fa 6b     ret

0000000000000010 <get_long_long>:
  10:   51 16 20 42     s8addq  a1,0,a1
  14:   11 04 11 42     addq    a0,a1,a1
  18:   00 00 11 a4     ldq     v0,0(a1)
  1c:   01 80 fa 6b     ret

$ cat s4add.c
unsigned long s4a(unsigned long a, unsigned long b)
{
        return a + b * 4;
}

unsigned long s8a(unsigned long a, unsigned long b)
{
        return a + b * 8;
}
$ alpha-linux-gnu-gcc-6 -c -O2 s4add.c
$ alpha-linux-gnu-objdump -d s4add.o

0000000000000000 <s4a>:
   0:   51 14 20 42     s4addq  a1,0,a1
   4:   00 04 30 42     addq    a1,a0,v0
   8:   01 80 fa 6b     ret
   c:   00 00 fe 2f     unop

0000000000000010 <s8a>:
  10:   51 16 20 42     s8addq  a1,0,a1
  14:   00 04 30 42     addq    a1,a0,v0
  18:   01 80 fa 6b     ret
  1c:   00 00 fe 2f     unop

With gcc 5 and earlier, optimal code is generated:

$ alpha-linux-gnu-gcc-5 -c -O2 index.c
$ alpha-linux-gnu-objdump -d index.o

0000000000000000 <get_int>:
   0:   51 04 30 42     s4addq  a1,a0,a1
   4:   00 00 11 a0     ldl     v0,0(a1)
   8:   01 80 fa 6b     ret
   c:   00 00 fe 2f     unop

0000000000000010 <get_long_long>:
  10:   51 06 30 42     s8addq  a1,a0,a1
  14:   00 00 11 a4     ldq     v0,0(a1)
  18:   01 80 fa 6b     ret
  1c:   00 00 fe 2f     unop

$ alpha-linux-gnu-gcc-5 -c -O2 s4add.c
$ alpha-linux-gnu-objdump -d s4add.o

0000000000000000 <s4a>:
   0:   40 04 30 42     s4addq  a1,a0,v0
   4:   01 80 fa 6b     ret
   8:   1f 04 ff 47     nop
   c:   00 00 fe 2f     unop

0000000000000010 <s8a>:
  10:   40 06 30 42     s8addq  a1,a0,v0
  14:   01 80 fa 6b     ret
  18:   1f 04 ff 47     nop
  1c:   00 00 fe 2f     unop

I bisected the problem; it is caused by this commit:

commit fabf26080cb4cc3fecd30d409ec9c63f0ec42eff
Author: vekumar <vekumar@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu May 7 10:47:54 2015 +0000

    2015-05-07  Venkataramanan Kumar  <venkataramanan.ku...@amd.com>

            * combine.c (make_compound_operation): Remove checks for PLUS/MINUS
            rtx type.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@222874 138bc75d-0d04-0410-961f-82ee72b054a4

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-05-07  Venkataramanan Kumar  <venkataramanan.ku...@amd.com>
+
+       * combine.c (make_compound_operation): Remove checks for PLUS/MINUS
+       rtx type.
+
 2015-05-07  Richard Biener  <rguent...@suse.de>
 
        PR tree-optimization/66002
diff --git a/gcc/combine.c b/gcc/combine.c
index c04146ae645..9e3eb030a63 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7723,9 +7723,8 @@ extract_left_shift (rtx x, int count)
    We try, as much as possible, to re-use rtl expressions to save memory.
 
    IN_CODE says what kind of expression we are processing.  Normally, it is
-   SET.  In a memory address (inside a MEM, PLUS or minus, the latter two
-   being kludges), it is MEM.  When processing the arguments of a comparison
-   or a COMPARE against zero, it is COMPARE.  */
+   SET.  In a memory address it is MEM.  When processing the arguments of
+   a comparison or a COMPARE against zero, it is COMPARE.  */
 
 rtx
 make_compound_operation (rtx x, enum rtx_code in_code)
@@ -7745,8 +7744,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
      but once inside, go back to our default of SET.  */
 
   next_code = (code == MEM ? MEM
-              : ((code == PLUS || code == MINUS)
-                 && SCALAR_INT_MODE_P (mode)) ? MEM
               : ((code == COMPARE || COMPARISON_P (x))
                  && XEXP (x, 1) == const0_rtx) ? COMPARE
               : in_code == COMPARE ? SET : in_code);
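
For readers unfamiliar with the alpha scaled-add instructions, here is a
minimal C sketch of the arithmetic involved. It is only a model of the
instruction semantics, not GCC code: the function names (model_s4addq,
model_s8addq, addr_one_insn, addr_two_insn, check_sequences_agree) and the
base address 0x1000 are hypothetical and chosen for illustration. It shows
that the one-instruction form emitted by gcc 5 and the two-instruction form
emitted by gcc 6 compute the same address, so the extra addq is pure overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "s4addq ra,rb,rc": rc = ra * 4 + rb (64-bit). */
uint64_t model_s4addq(uint64_t ra, uint64_t rb)
{
        return (ra << 2) + rb;
}

/* Hypothetical model of "s8addq ra,rb,rc": rc = ra * 8 + rb (64-bit). */
uint64_t model_s8addq(uint64_t ra, uint64_t rb)
{
        return (ra << 3) + rb;
}

/* Address of a 4-byte element as in the gcc 5 output: s4addq idx,base,addr */
uint64_t addr_one_insn(uint64_t base, uint64_t idx)
{
        return model_s4addq(idx, base);
}

/* Same address as in the gcc 6 output:
   s4addq idx,0,tmp followed by addq base,tmp,addr */
uint64_t addr_two_insn(uint64_t base, uint64_t idx)
{
        uint64_t tmp = model_s4addq(idx, 0);
        return base + tmp;
}

/* Sanity check: both sequences yield the same address for small indices. */
void check_sequences_agree(void)
{
        for (uint64_t idx = 0; idx < 16; idx++)
                assert(addr_one_insn(0x1000, idx) == addr_two_insn(0x1000, idx));
}
```

Calling check_sequences_agree() from a test program should pass silently; the
two forms differ only in instruction count, not in the result.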