Ping? I see that Jim has clarified Andrew's comments.
Thanks,
Kugan

On 13 October 2017 at 08:48, Jim Wilson <wil...@tuliptree.org> wrote:
> On Fri, 2017-09-22 at 14:11 -0700, Andrew Pinski wrote:
>> On Fri, Sep 22, 2017 at 11:39 AM, Jim Wilson <jim.wil...@linaro.org> wrote:
>> >
>> > On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski <pins...@gmail.com> wrote:
>> > >
>> > > Two overall comments:
>> > > * What about splitting register_offset into two different elements,
>> > > one for non-128-bit modes and one for 128-bit (and larger; OI, etc.)
>> > > modes, so you get better address generation right away for the SIMD
>> > > load cases rather than having LRA/reload reload the address into a
>> > > register?
>> > I'm not sure changing the register_offset cost would make a difference,
>> > since costs are usually used during optimization, not during address
>> > generation. This is something I didn't think to try, though. I can
>> > take a look at this.
>> It is taken into account when fwprop propagates the addition into the
>> MEM (at the tree level it is always a_1 = POINTER_PLUS_EXPR;
>> MEM_REF(a_1)). IV-OPTS will produce much better code if the
>> address_cost is correct.
>>
>> It looks like no other pass (combine, etc.) takes it into account
>> except for postreload CSE, but maybe they should.
>
> I tried increasing the cost of register_offset. This got rid of the
> reg+reg addressing mode in the middle of the main loop for lmbench
> stream copy, but did not eliminate it after the main loop.
>
> The tree-optimized dump has
>   _52 = a_15 + _51;
>   _53 = c_17 + _51;
>   _54 = *_52;
>   *_53 = _54;
> and the RTL expand dump has
> (insn 64 63 65 10 (set (reg:DF 96 [ _54 ])
>         (mem:DF (plus:DI (reg/v/f:DI 78 [ a ])
>                 (reg:DI 93 [ _51 ])) [3 *_52+0 S8 A64])) "stream.c":223 -1
>      (nil))
> (insn 65 64 66 10 (set (mem:DF (plus:DI (reg/v/f:DI 79 [ c ])
>                 (reg:DI 93 [ _51 ])) [3 *_53+0 S8 A64])
>         (reg:DF 96 [ _54 ])) "stream.c":223 -1
>      (nil))
>
> That may be fixable, but there is a bigger problem: increasing the
> cost of register_offset affects both loads and stores. On Falkor, only
> quad-word stores are inefficient with a reg+reg address. Quad-word
> loads with a reg+reg address are faster than the equivalent add/ldr,
> so disabling reg+reg addresses for quad-word loads would hurt
> performance.
>
> Since the address cost hooks make no distinction between loads and
> stores, I see no way to get the result I need by using address costs.
> I can only get it by modifying the md file.
>
>> > I did try writing a patch to modify predicates to disallow reg offset
>> > for 128-bit modes, and that got complicated, as I had to split apart a
>> > number of patterns in the aarch64-simd.md file that accept both VD and
>> > VQ modes. I ended up with a patch 3-4 times as big as the one I
>> > submitted, without any additional performance improvement, so it
>> > wasn't worth the trouble.
>> >
>> > >
>> > > * Maybe add a testcase to the testsuite to show this change.
>> > Yes, I can add a testcase.
>> >
>> > >
>> > > One extra comment:
>> > > * Should we change the generic tuning to avoid reg+reg for 128-bit
>> > > modes?
>> > Are there other targets with a similar problem? I only know that it
>> > is a problem for Falkor. It might be a loss for some targets, as it
>> > replaces one instruction with two.
>> Well, that is why I was suggesting the address cost model change.
>> The cost model change might actually provide better code in the
>> first place and still allow reasonable generic code to be produced.
>
> The patch I posted only affects Falkor. It doesn't change generic
> code, and I don't know of any reason why we need to change generic
> code here.
>
> The Falkor core has out-of-order execution and multiple function units,
> so there isn't any noticeable performance gain from trying to fix this
> earlier. Fixing this with an md file change gives optimal performance
> for the testcases I've looked at.
>
> Since I'm no longer at Linaro, I expect that someone else will take
> over this patch submission. I will create a bug report to document the
> issue, to make it easier to track and hand off to someone else.
>
> Jim
>
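For reference, the GIMPLE and RTL dumps quoted above come from an element-wise copy loop. The sketch below is a minimal, hypothetical stand-in for the lmbench stream copy kernel (the name `stream_copy` is made up; this is not the actual stream.c source). Both accesses share one index, so the compiler can keep a single byte offset in a register (the `_51` in the dumps) and address `a[j]` and `c[j]` as base register plus offset register, which is the reg+reg form under discussion.

```c
#include <stddef.h>

/* Hypothetical stand-in for lmbench's stream copy kernel (assumed shape,
   not the real stream.c).  Load and store use the same index j, so one
   byte offset (j * sizeof (double)) can serve as the index register for
   both the load from a and the store to c.  */
void stream_copy(double *restrict c, const double *restrict a, size_t n)
{
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];  /* load a[j], store c[j] */
}
```

The scalar `DF` accesses in the quoted dump correspond to this plain form; whether the compiler actually picks a reg+reg address, or add/ldr, is decided by the target's address costs and the md-file patterns discussed in the thread.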