https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
Bug ID: 98673
Summary: pass fre4 inhibits pass dom3 from creating much more
optimized code
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rjiejie at me dot com
Target Milestone: ---
Created attachment 49962
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49962&action=edit
bug test file
a) Compiler options:
cc1 -mabi=lp64d -march=rv64gc -O2 -S
b) Hot loop in function t_run_test (v10.2.0):
j .L30
.L39:
mv a4,a3
.L30:
ld a2,8(a5)
addi a3,a4,1
slli t3,a4,3
ble a2,a1,.L28
ld t5,0(a5)
bge a1,t5,.L50
.L28:
addi a5,a5,8
bne a3,a0,.L39    # hot loop: branches back to .L39
Better code from version 8.4.0 with the same compiler options:
==============================================================
.L30:
ld t1,8(a4)
slli a7,a5,3
ble t1,a3,.L28
ld t4,0(a4)
bge a3,t4,.L50
.L28:
addi a5,a5,1
addi a4,a4,8
bne a5,t3,.L30    # hot loop: branches back to .L30
GCC 10.2.0 emits one more instruction in the hot loop than GCC 8.4.0: the
extra 'mv a4,a3' at .L39.
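For readers without the attachment, the loop shape implied by the two listings
can be sketched in C. This is a reconstruction under assumptions, not the
attached test file: the function name find_interval, its parameter names, and
the 64-bit long element type are hypothetical, chosen only to match the 8-byte
loads and the two compares (ble, bge) in the assembly.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the hot loop; NOT the attached test file.
   Returns the first i with table[i] <= value < table[i + 1], or -1. */
long find_interval(const long *table, long num_entries, long value)
{
    for (long i = 0; i < num_entries - 1; i++) {
        /* Mirrors "ble a2,a1,.L28": skip when table[i + 1] <= value. */
        if (table[i + 1] > value) {
            /* Mirrors "bge a1,t5,.L50": leave the loop when value >= table[i]. */
            if (value >= table[i])
                return i;
        }
    }
    return -1; /* no enclosing interval found */
}
```

In this sketch the only per-iteration state is the index i and the pointer into
the table, which matches the 8.4.0 listing: one addi for the index and one for
the pointer, with no register copy.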
Analysis of the GCC passes in v10.2.0:
======================================
Before pass fre4:
-----------------
<bb 8> [local count: 82176881]:
engLoad.11_20 = engLoad;
loadValue.13_26 = loadValue;
_410 = (unsigned long) numXEntries.17_218;
_409 = _410 + 18446744073709551615;
_408 = (long int) _409;
... ...
<bb 12> [local count: 986782143]:
i1_174 = i1_6 + 1;
if (i1_174 != _408)
goto <bb 9>; [94.50%]
else
goto <bb 13>; [5.50%]
<bb 13> [local count: 54273018]:
# i1_420 = PHI <i1_174(12)>
_433 = (long unsigned int) i1_420;
_434 = _433 + 1;
_435 = _434 * 8;
_436 = i1_420 + 1;
_440 = _435 - 8;
_442 = engLoad.11_20 + _440;
goto <bb 15>; [100.00%]
After pass fre4:
----------------
<bb 8> [local count: 82176881]:
engLoad.11_20 = engLoad;
loadValue.13_26 = loadValue;
_410 = (unsigned long) numXEntries.17_218;
_409 = _410 + 18446744073709551615;
... ...
<bb 12> [local count: 986782143]:
i1_174 = i1_6 + 1;
if (i1_174 != _213)
goto <bb 9>; [94.50%]
else
goto <bb 13>; [5.50%]
<bb 13> [local count: 54273018]:
_433 = (long unsigned int) i1_174;
_434 = _433 + 1;
_435 = _434 * 8;
_436 = i1_174 + 1;
_440 = _435 - 8;
_442 = engLoad.11_20 + _440;
goto <bb 15>; [100.00%]
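The arithmetic in <bb 8> and <bb 13> of the dumps above is easier to follow
when written out in C. The sketch below is only an illustration: the helper
names bound_from_entries and exit_offset are hypothetical, and it assumes
64-bit integers, as on the lp64d ABI used here. It shows that adding
18446744073709551615 is a wrapping subtraction of 1, and that
_440 = (i + 1) * 8 - 8 is simply the byte offset i * 8.

```c
#include <stdint.h>

/* Illustration only; both helper names are hypothetical. */

/* <bb 8>: _410 = (unsigned long) n;
           _409 = _410 + 18446744073709551615;
           _408 = (long int) _409;              */
int64_t bound_from_entries(int64_t n)
{
    uint64_t u = (uint64_t)n;
    uint64_t m = u + UINT64_C(18446744073709551615); /* wraps: same as u - 1 */
    return (int64_t)m;
}

/* <bb 13>: byte offset added to engLoad after the loop exits. */
uint64_t exit_offset(int64_t i)
{
    uint64_t u = (uint64_t)i; /* _433 */
    uint64_t a = u + 1;       /* _434 */
    uint64_t b = a * 8;       /* _435 */
    return b - 8;             /* _440, equal to u * 8 */
}
```

For example, bound_from_entries(5) yields 4 and exit_offset(3) yields 24, so
_442 is the address of the element at index i1_174 in an array of 8-byte
entries.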
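As the dumps show, pass fre4 replaces _408 with _213 in the exit test of
<bb 12> and deletes the now-dead conversion (dump message: 'Removing dead stmt
_408 = (long int) _409;'). It also removes the PHI node in <bb 13>, so <bb 13>
uses i1_174 directly instead of i1_420.
Pass dom3 then cannot optimize '_433 = (long unsigned int) i1_174;' in <bb 13>.
Because <bb 13> now uses the same SSA name i1_174 as <bb 12>, a conflict occurs
in pass expand when it processes the coalesced SSA/PHI nodes, and the edge is
then split.
Any help would be appreciated :)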