So, Jan, what do you think will be the best solution for stage 1?

Thanks,
Igor
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Vladimir Makarov
> Sent: Monday, October 21, 2013 6:52 AM
> To: Jan Hubicka; Zamyatin, Igor; gcc-patches@gcc.gnu.org
> Subject: Re: Honnor ix86_accumulate_outgoing_args again
>
> On 13-10-19 4:30 PM, Jan Hubicka wrote:
> >> Jan,
> >>
> >> Does this seem reasonable to you?
> > Oops, sorry, I missed your email. (I was travelling and I am finishing
> > a paper now.)
> >> Thanks,
> >> Igor
> >>
> >>> -----Original Message-----
> >>> From: Zamyatin, Igor
> >>> Sent: Tuesday, October 15, 2013 3:48 PM
> >>> To: Jan Hubicka
> >>> Subject: RE: Honnor ix86_accumulate_outgoing_args again
> >>>
> >>> Jan,
> >>>
> >>> Now we have the following prologue in, say, the phi0 routine in equake:
> >>>
> >>> 0x804aa90  push    %ebp
> >>> 0x804aa91  mov     %esp,%ebp
> >>> 0x804aa93  sub     $0x18,%esp
> >>> 0x804aa96  vmovsd  0x80ef7a8,%xmm0
> >>> 0x804aa9e  vmovsd  0x8(%ebp),%xmm1
> >>> 0x804aaa3  vcomisd %xmm1,%xmm0   <-- we see a big stall somewhere here
> >>>                                      or 1-2 instructions above
> >>>
> >>> While earlier it was:
> >>>
> >>> 0x804abd0  sub     $0x2c,%esp
> >>> 0x804abd3  vmovsd  0x30(%esp),%xmm1
> >>> 0x804abd9  vmovsd  0x80efcc8,%xmm0
> >>> 0x804abe1  vcomisd %xmm1,%xmm0
> > Thanks for the analysis! It is a different benchmark than for Bulldozer,
> > but apparently the same case. Again, we used to eliminate the frame
> > pointer here, but IRA now doesn't. Do you see the same regression with
> > -fno-omit-frame-pointer -maccumulate-outgoing-args?
> >
> > I suppose this is a conflict between the push instruction, which is
> > handled by the stack engine, and the initialization of EBP, which isn't.
> > That would explain why Bulldozer doesn't seem to care about this
> > particular benchmark (its stack engine seems to have quite a different
> > design).
> >
> > This is a bit of a sad situation: accumulate-outgoing-args is expensive
> > code-size-wise, and it seems we don't really need esp with
> > -mno-accumulate-outgoing-args.
> > The non-accumulation code path was mistakenly disabled for too long ;(
> >
> > Vladimir, how much effort do you think it will take to fix the frame
> > pointer elimination here?
> My guess is a week. The problem is that I am busy and having some problems
> with two small projects right now which I'd like to include in gcc-4.9.
>
> But I think this can still be fixed in stage 2, as it is a PR.
>
> > I can imagine it is quite a tricky case. If so, I would suggest adding
> > m_CORE_ALL to X86_TUNE_ACCUMULATE_OUTGOING_ARGS with a comment
> > explaining the problem and mentioning the regression on equake on Core
> > and mgrid on Bulldozer, and opening an enhancement request for this...
> >
> > If direct ESP use and push/pop instructions are causing such noticeable
> > issues, I also wonder if we can't "shrink wrap" this into the red zone
> > in the 64-bit compilation. It seems that even with
> > -maccumulate-outgoing-args, pushing the frame allocation as late as
> > possible in the function would be a good idea, so that it is not close
> > to the push/pop/call/ret.
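The stack offsets in the two phi0 prologues quoted above can be checked with a little arithmetic. This is a minimal sketch, assuming the 32-bit (i386) System V convention where `call` pushes a 4-byte return address and the first stack argument sits directly above it; the helper names are hypothetical and exist only for this illustration.

```python
# Hypothetical helpers (illustration only): offset of the first stack
# argument of an i386 function after its prologue, assuming the 32-bit
# System V convention where `call` pushes a 4-byte return address.
RET_ADDR_SIZE = 4   # pushed by `call`
SAVED_EBP_SIZE = 4  # pushed by `push %ebp`


def arg_offset_from_ebp() -> int:
    """Frame-pointer prologue (push %ebp; mov %esp,%ebp).

    EBP points at the saved EBP, so the argument lies above the saved EBP
    and the return address, regardless of the later `sub $N,%esp`.
    """
    return SAVED_EBP_SIZE + RET_ADDR_SIZE


def arg_offset_from_esp(frame_size: int) -> int:
    """Frame-pointer-less prologue (only `sub $frame_size,%esp`).

    The argument lies above the locally allocated frame and the return address.
    """
    return frame_size + RET_ADDR_SIZE


# New prologue: vmovsd 0x8(%ebp),%xmm1
print(f"with frame pointer:    {arg_offset_from_ebp():#x}(%ebp)")
# Old prologue: sub $0x2c,%esp; vmovsd 0x30(%esp),%xmm1
print(f"without frame pointer: {arg_offset_from_esp(0x2c):#x}(%esp)")
```

Both results match the listings: the old code reaches the argument relative to ESP past its 0x2c-byte frame, while the new code reaches it relative to EBP, which must first be set up by `mov %esp,%ebp` right after the `push`. That is the dependency Jan suspects interacts badly with the stack engine.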