https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69891
Uroš Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2016-02-21
CC| |ubizjak at gmail dot com
Component|target |rtl-optimization
Ever confirmed|0 |1
--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Zdenek Sojka from comment #0)
> Reproduces with x86_64 compiler -m32 as well.
(-mno-sse has to be added when using an x86_64 compiler with -m32.)
This is an RTL aliasing issue.
We start with the following _optimized tree dump:
<bb 2>:
_2 = BIT_FIELD_REF <v32u32_1, 32, 0>;
...
_9 = _2 | 7;
BIT_FIELD_REF <v32u32_1, 32, 0> = _9;
...
v32u32_1 = { 0, 0, 0, 0, 0, 0, 0, 0 };
...
_19 = BIT_FIELD_REF <v32u32_1, 32, 0>;
...
_27 = _19 + _22;
...
which gets expanded to:
;; BIT_FIELD_REF <v32u32_1, 32, 0> = _9;
(insn 7 6 8 (parallel [
(set (reg:SI 121)
(ior:SI (reg:SI 87 [ _2 ])
(const_int 7 [0x7])))
(clobber (reg:CC 17 flags))
]) pr69891.c:19 -1
(nil))
(insn 8 7 0 (set (mem/j/c:SI (plus:SI (reg/f:SI 81 virtual-incoming-args)
(const_int 64 [0x40])) [2 v32u32_1+0 S4 A256])
(reg:SI 121)) pr69891.c:19 -1
(nil))
...
(insn 13 12 0 (set (reg:SI 119 [ _117 ])
(reg:SI 125)) pr69891.c:25 -1
(nil))
;; v32u32_1 = { 0, 0, 0, 0, 0, 0, 0, 0 };
(insn 14 13 15 (parallel [
(set (reg:SI 127)
(plus:SI (reg/f:SI 81 virtual-incoming-args)
(const_int 64 [0x40])))
(clobber (reg:CC 17 flags))
]) pr69891.c:31 -1
(nil))
(insn 15 14 16 (set (reg:SI 128)
(const_int 32 [0x20])) pr69891.c:31 -1
(nil))
(insn 16 15 17 (parallel [
(set (reg/f:SI 7 sp)
(plus:SI (reg/f:SI 7 sp)
(const_int -20 [0xffffffffffffffec])))
(clobber (reg:CC 17 flags))
]) pr69891.c:31 -1
(expr_list:REG_ARGS_SIZE (const_int 20 [0x14])
(nil)))
(insn 17 16 18 (set (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [2 S4 A32])
(reg:SI 128)) pr69891.c:31 -1
(expr_list:REG_ARGS_SIZE (const_int 24 [0x18])
(nil)))
(insn 18 17 19 (set (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [2 S4 A32])
(const_int 0 [0])) pr69891.c:31 -1
(expr_list:REG_ARGS_SIZE (const_int 28 [0x1c])
(nil)))
(insn 19 18 20 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [4 S4 A32])
(reg:SI 127)) pr69891.c:31 -1
(expr_list:REG_ARGS_SIZE (const_int 32 [0x20])
(nil)))
(call_insn 20 19 21 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:SI ("memset") [flags 0x41] <function_decl 0x7f5734764e00 memset>) [0 memset S1 A8])
(const_int 32 [0x20]))) pr69891.c:31 -1
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil))
(nil))
...
(insn 170 169 171 (set (reg:SI 202)
(mem/j/c:SI (plus:SI (reg/f:SI 81 virtual-incoming-args)
(const_int 64 [0x40])) [2 v32u32_1+0 S4 A256])) pr69891.c:37 -1
(nil))
(insn 171 170 172 (set (reg:SI 203)
(mem/j/c:SI (plus:SI (reg/f:SI 81 virtual-incoming-args)
(const_int 120 [0x78])) [3 v32u64_1+24 S4 A64])) pr69891.c:37 -1
(nil))
(insn 172 171 173 (parallel [
(set (reg:SI 201)
(plus:SI (reg:SI 202)
(reg:SI 203)))
(clobber (reg:CC 17 flags))
]) pr69891.c:37 -1
(expr_list:REG_EQUAL (plus:SI (mem/j/c:SI (plus:SI (reg/f:SI 81 virtual-incoming-args)
(const_int 64 [0x40])) [2 v32u32_1+0 S4 A256])
(mem/j/c:SI (plus:SI (reg/f:SI 81 virtual-incoming-args)
(const_int 120 [0x78])) [3 v32u64_1+24 S4 A64]))
(nil)))
However, the DSE1 pass propagates r121 (aka r207) from (insn 7) all the way to
(insn 170), without considering the aliasing memset call in (insn 20).
5: r87:SI=[argp:SI+0x40]
6: {r89:HI=-r87:SI#0;clobber flags:CC;}
REG_UNUSED flags:CC
7: {r121:SI=r87:SI|0x7;clobber flags:CC;}
REG_DEAD r87:SI
REG_UNUSED flags:CC
186: r207:SI=r121:SI
8: [argp:SI+0x40]=r121:SI
...
14: {r127:SI=argp:SI+0x40;clobber flags:CC;}
REG_UNUSED flags:CC
16: {sp:SI=sp:SI-0x14;clobber flags:CC;}
REG_UNUSED flags:CC
REG_ARGS_SIZE 0x14
17: [--sp:SI]=0x20
REG_ARGS_SIZE 0x18
18: [--sp:SI]=0
REG_ARGS_SIZE 0x1c
19: [--sp:SI]=r127:SI
REG_DEAD r127:SI
REG_ARGS_SIZE 0x20
20: ax:SI=call [`memset'] argc:0x20
REG_UNUSED ax:SI
REG_EH_REGION 0
...
170: r202:SI=r207:SI
REG_DEAD r207:SI
171: r203:SI=[argp:SI+0x78]
172: {r201:SI=r202:SI+r203:SI;clobber flags:CC;}
REG_DEAD r203:SI
REG_DEAD r202:SI
REG_UNUSED flags:CC
REG_EQUAL [argp:SI+0x40]+[argp:SI+0x78]
Confirmed as an RTL optimization issue.
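For illustration only, the store / memset / load pattern discussed above can
be sketched at the source level roughly as follows. This is not the testcase
from comment #0: the type vec8u32 and the functions store_clear_load,
stale_forwarding and check_store_clear_load are hypothetical names, and
passing the aggregate by value is only an assumption meant to mirror the
virtual-incoming-args addressing seen in the dumps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 256-bit vector v32u32_1 in the dumps. */
typedef struct { uint32_t lane[8]; } vec8u32;

/* Store to lane 0, clear the whole object with memset, then reload lane 0.
   The reload has to observe the memset, so the result is 0 + addend. */
uint32_t store_clear_load(vec8u32 v, uint32_t addend)
{
    v.lane[0] |= 7;             /* roughly insn 7 and insn 8 */
    memset(&v, 0, sizeof v);    /* roughly call_insn 20 */
    return v.lane[0] + addend;  /* roughly insn 170 to insn 172 */
}

/* What the code effectively computes when the value stored before the
   memset is forwarded to the later load (the r207 copy in the DSE1 dump). */
uint32_t stale_forwarding(vec8u32 v, uint32_t addend)
{
    uint32_t forwarded = v.lane[0] | 7;
    return forwarded + addend;
}

/* Small self-check: correct code returns the addend unchanged, while the
   stale forwarding would return (1 | 7) + 5 for the input below. */
void check_store_clear_load(void)
{
    vec8u32 v = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
    assert(store_clear_load(v, 5u) == 5u);
    assert(stale_forwarding(v, 5u) == 12u);
}
```

Calling check_store_clear_load() from a small main is enough to exercise the
sketch; whether this reduced form actually triggers the DSE1 problem on an
affected compiler has not been verified here.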