On Sat, Jan 11, 2014 at 2:12 AM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
> Tejas Belagod <tbela...@arm.com> writes:
>> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64,
>> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA
>> cannot seem to handle subregs like
>>
>> (subreg:DI (reg:TF hard_reg) 8)
>>
>> on hard registers where the subreg byte offset is unaligned to a hard
>> register boundary (16 for AArch64). It seems to quietly ignore the 8 and
>> resolves this to an incorrect hard register during reload.
>>
>> When I compile this test with -O3,
>>
>> long double
>> cl (long double x, long double y)
>> {
>>   return __builtin_copysignl (x, y);
>> }
>>
>> cs.c.213r.ira:
>>
>> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>>         (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>>      (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>>         (nil)))
>> (insn 33 26 35 2 (set (reg:TF 93)
>>         (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>      (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>>         (nil)))
>> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>>         (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>>      (nil))
>> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>>         (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>>      (expr_list:REG_DEAD (reg:TF 93)
>>         (nil)))
>> ....
>>
>> cs.c.214r.reload
>>
>> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>>         (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>>      (nil))
>> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>>         (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>      (nil))
>> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>>         (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>>      (nil))
>> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>>         (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>>      (nil))
>> .....
>>
>> You can see the changes to insn 26 before and after reload - the
>> SUBREG_BYTE offset of 8 seems to have been translated to v1 instead of
>> v1.d[1] by get_hard_regno ().
>>
>> What's interesting here is that the SUBREG_BYTE that is generated for
>>
>> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>>
>> isn't aligned to a hard register boundary on SIMD regs where
>> UNITS_PER_VREG for AArch64 is 16. Therefore when this subreg is
>> resolved, it resolves to v1 instead of v1.d[1]. Is this something going
>> wrong in LRA or is this a more fundamental problem with generating
>> subregs of hard regs with unaligned subreg byte offsets? The same subreg
>> on a pseudo works OK because in insn 33, the TF mode is allocated
>> integer registers and all is well.
>
> I think this is the same problem that was being discussed for x86
> after your no-op vec-select patch:
>
> http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html
>
> and long following thread.
>
> I'd still like to solve this in a target-independent way rather than add
> an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
> it...
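To make the arithmetic in the quoted report concrete, here is a minimal sketch of how a subreg byte offset splits into "whole registers skipped" and "bytes left over". It is a simplified model, not GCC code: the struct and function names are hypothetical, and the only inputs taken from the thread are the offset 8, the 16-byte UNITS_PER_VREG, and the 8-byte general registers visible in the dumps.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC's implementation: where a subreg byte
   offset lands when the inner value occupies consecutive hard
   registers of reg_size bytes each.  */
struct subreg_loc
{
  unsigned regno_offset;  /* whole hard registers skipped */
  unsigned leftover;      /* bytes into that hard register */
};

struct subreg_loc
split_subreg_byte (unsigned subreg_byte, unsigned reg_size)
{
  struct subreg_loc loc;
  loc.regno_offset = subreg_byte / reg_size;
  loc.leftover = subreg_byte % reg_size;
  return loc;
}

/* A plain (reg:M N) can only name a whole hard register, so the
   subreg can be folded into one only if no bytes are left over.  */
bool
offset_is_representable (unsigned subreg_byte, unsigned reg_size)
{
  return split_subreg_byte (subreg_byte, reg_size).leftover == 0;
}
```

With reg_size 8 (TF held in x0/x1, as in insn 35), offset 8 gives regno_offset 1 and no leftover, which matches the x1 seen after reload. With reg_size 16 (TF held in v1, as in insn 26), the same offset gives regno_offset 0 and a leftover of 8; silently dropping that leftover yields plain v1, which is the symptom described above.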
How about this patch?

http://gcc.gnu.org/git/?p=gcc.git;a=patch;h=23023006b946e06b6fd93786585f2f8cd4837956

I tested it on Linux/x86-64 without any regressions.

-- 
H.J.