https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62662
Bug ID: 62662
Summary: [4.9/5 Regression] Miscompilation of Qt on s390x
Product: gcc
Version: 4.9.1
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
CC: krebbel at gcc dot gnu.org, uweigand at gcc dot gnu.org
Target: s390x-linux

struct A { int a; void b (); };
int c;
struct B { struct C { A d; }; static C e; };
struct D { B::C *d; D () : d (&B::e) { d->d.b (); } };
struct F { F (int *); int *f; D g; };
void A::b ()
{
  volatile int x;
  __asm__("" : "=&d" (c), "=&d" (x), "=m" (*&a) : "a" (&a), "dm" (0));
}
F::F (int *p1) : f (p1) {}

compiled with -m64 -O2 -fvisibility=hidden -fPIC -march=z9-109 -mtune=z10
results in wrong-code:

        stg     %r15,120(%r15)
        stg     %r3,0(%r2)
.LCFI3:
        lay     %r15,-168(%r15)
.LCFI4:
        larl    %r4,c
        larl    %r1,_ZN1B1eE@GOTENT
        lhi     %r5,0
        lg      %r1,0(%r1)
        stg     %r1,8(%r2)
        st      %r2,0(%r4)
        st      %r3,164(%r15)
        lg      %r4,280(%r15)
        lg      %r15,288(%r15)
.LCFI5:
        br      %r4

The %r14 return register is not saved to the return register stack slot in the
prologue, but is restored from it in the epilogue and jumped to.

From a quick look at this, this seems to be because of a bad interaction between
the
      /* Fetch return address from stack before load multiple,
         this will do good for scheduling.  */
code in s390_emit_epilogue and s390_optimize_prologue.

If cfun_frame_layout.save_return_addr_p is false, and the first register needing
save during s390_emit_epilogue is below 13 and the last above 14, then
s390_emit_epilogue attempts to optimize and loads the return register into
typically %r4 before doing the load multiple insn. But s390_optimize_prologue
doesn't take this into account: it will remove the store multiple insn in the
prologue if at that point only %r15 needs saving, and also replace the load
multiple, but keep loading from the return register stack slot into %r4 and
return to %r4.
So, either s390_optimize_prologue needs to adjust also the return insn (and remove the load of the return address from stack slot if DSE can't handle it?), or we need to arrange in this case that s390_optimize_prologue doesn't optimize the return address store away.
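
To make the mismatch concrete, here is a small stand-alone model of the two decisions described above. It is only a sketch under stated assumptions, not code from s390.c: the struct and function names (EpilogueView, FinalSaveRange, returns_through_stale_slot, and so on) are made up for illustration, and the conditions are taken from the wording of this report rather than from the actual sources.

```cpp
#include <cassert>

// Register numbers as used in the report: %r13, and %r14 (return address).
constexpr int kBaseRegnum = 13;
constexpr int kReturnRegnum = 14;

// Hypothetical snapshot of what s390_emit_epilogue sees when it runs.
struct EpilogueView {
    bool save_return_addr_p;  // stands in for cfun_frame_layout.save_return_addr_p
    int first_restore_gpr;    // first GPR it expects to restore
    int last_restore_gpr;     // last GPR it expects to restore
};

// Hypothetical final save range left after s390_optimize_prologue has run.
struct FinalSaveRange {
    int first_save_gpr;
    int last_save_gpr;
};

// The problematic case as worded in the report: save_return_addr_p is false,
// yet the restore range spans %r14, so the epilogue loads the return address
// from its stack slot into a scratch register (typically %r4).
bool epilogue_prefetches_return_addr(const EpilogueView &v) {
    return !v.save_return_addr_p
        && v.first_restore_gpr < kBaseRegnum
        && v.last_restore_gpr > kReturnRegnum;
}

// The return address slot only holds a valid value if the store that remains
// in the prologue still covers %r14.
bool return_slot_is_written(const FinalSaveRange &r) {
    return r.first_save_gpr <= kReturnRegnum && kReturnRegnum <= r.last_save_gpr;
}

// Wrong code results when the prefetch survives but the store does not.
bool returns_through_stale_slot(const EpilogueView &v, const FinalSaveRange &r) {
    return epilogue_prefetches_return_addr(v) && !return_slot_is_written(r);
}
```

In this model the test case above corresponds to something like returns_through_stale_slot(EpilogueView{false, 6, 15}, FinalSaveRange{15, 15}): the epilogue was emitted for a range spanning %r14, but only the store of %r15 is left in the prologue. The first restore register 6 is an assumed value chosen for the example. Either fix suggested above amounts to making this predicate impossible, by dropping the prefetch or by keeping %r14 inside the final save range.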