https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90204
--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 26 Apr 2019, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90204
>
> --- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to rguent...@suse.de from comment #10)
> > On Thu, 25 Apr 2019, crazylht at gmail dot com wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90204
> > >
> > > --- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
> > > Also what's better between aligned load/store of smaller size VS
> > > unaligned load/store of bigger size?
> > >
> > > aligned load/store of smaller size:
> > >
> > >         movq    %rdx, (%rdi)
> > >         movq    -56(%rsp), %rdx
> > >         movq    %rdx, 8(%rdi)
> > >         movq    -48(%rsp), %rdx
> > >         movq    %rdx, 16(%rdi)
> > >         movq    -40(%rsp), %rdx
> > >         movq    %rdx, 24(%rdi)
> > >         vmovq   %xmm0, 32(%rax)
> > >         movq    -24(%rsp), %rdx
> > >         movq    %rdx, 40(%rdi)
> > >         movq    -16(%rsp), %rdx
> > >         movq    %rdx, 48(%rdi)
> > >         movq    -8(%rsp), %rdx
> > >         movq    %rdx, 56(%rdi)
> > >
> > > unaligned load/store of bigger size:
> > >
> > >         vmovups %xmm2, (%rdi)
> > >         vmovups %xmm3, 16(%rdi)
> > >         vmovups %xmm4, 32(%rdi)
> > >         vmovups %xmm5, 48(%rdi)
> >
> > bigger stores are almost always a win while bigger loads have
> > the possibility to run into store-to-load forwarding issues
> > (and bigger stores eventually mitigate them).  Based on
> > CPU tuning we'd also eventually end up with mov[lh]ps splitting
> > unaligned loads/stores.
>
> From
> https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-optimization-reference-manual
>
> 14.6.3 Prefer Aligned Stores Over Aligned Loads
>
> Unaligned stores are likely to cause greater performance degradation than
> unaligned loads, since there is a very high penalty on stores to a split
> cache-line that crosses pages. This penalty is estimated at 150 cycles.
> Loads that cross a page boundary are executed at retirement.

That's something to keep in mind when peeling for alignment, but as a
general rule for straight-line code the chance of an unaligned store
hitting a page boundary is small, while hitting a store-to-load
forwarding (STLF) failure is more likely.
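
To put a rough number on "small", here is a minimal sketch in C that counts how many start offsets make an access of a given size straddle a block boundary. It is not taken from the bug report. The helper names `crosses_boundary` and `count_crossing_offsets` are hypothetical, and the 4096-byte page and 64-byte cache-line sizes are assumptions about a typical x86-64 system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an access of `size` bytes starting at `addr` touches two
   different `boundary`-sized blocks (assumed here: 64-byte cache lines
   or 4096-byte pages), 0 otherwise. */
int crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(size > 0 && boundary > 0);
    return addr / boundary != (addr + size - 1) / boundary;
}

/* Counts how many of the `boundary` possible start offsets inside one
   block make a `size`-byte access spill into the following block. */
size_t count_crossing_offsets(size_t size, size_t boundary)
{
    size_t n = 0;
    for (size_t off = 0; off < boundary; off++)
        n += (size_t)crosses_boundary(off, size, boundary);
    return n;
}
```

Under the further assumption that start offsets are uniformly distributed, `count_crossing_offsets(16, 4096)` yields 15, so about 15/4096 (roughly 0.37%) of 16-byte unaligned stores would cross a page, whereas `count_crossing_offsets(16, 64)` also yields 15, so about 15/64 (roughly 23%) would split a cache line. Real address distributions are rarely uniform, so these are only order-of-magnitude estimates.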