[Bug tree-optimization/88760] GCC unrolling is suboptimal

rguenther at suse dot de Tue, 15 Jan 2019 03:20:22 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760


--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 15 Jan 2019, ktkachov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
> 
> --- Comment #8 from ktkachov at gcc dot gnu.org ---
> btw, it looks like ICC vectorises this as well as unrolling it:
> ..B1.14:                        
>         movl      (%rcx,%rbx,4), %r15d                          
>         vmovsd    (%rdi,%r15,8), %xmm2                          
>         movl      4(%rcx,%rbx,4), %r15d                         
>         vmovhpd   (%rdi,%r15,8), %xmm2, %xmm3                   
>         movl      8(%rcx,%rbx,4), %r15d                         
>         vfmadd231pd (%r10,%rbx,8), %xmm3, %xmm0                 
>         vmovsd    (%rdi,%r15,8), %xmm4                          
>         movl      12(%rcx,%rbx,4), %r15d                        
>         vmovhpd   (%rdi,%r15,8), %xmm4, %xmm5                   
>         movl      16(%rcx,%rbx,4), %r15d                        
>         vfmadd231pd 16(%r10,%rbx,8), %xmm5, %xmm1               
>         vmovsd    (%rdi,%r15,8), %xmm6                          
>         movl      20(%rcx,%rbx,4), %r15d                        
>         vmovhpd   (%rdi,%r15,8), %xmm6, %xmm7                   
>         movl      24(%rcx,%rbx,4), %r15d                        
>         vfmadd231pd 32(%r10,%rbx,8), %xmm7, %xmm0               
>         vmovsd    (%rdi,%r15,8), %xmm8                          
>         movl      28(%rcx,%rbx,4), %r15d                        
>         vmovhpd   (%rdi,%r15,8), %xmm8, %xmm9                   
>         vfmadd231pd 48(%r10,%rbx,8), %xmm9, %xmm1               
>         addq      $8, %rbx                                      
>         cmpq      %r14, %rbx                                    
>         jb        ..B1.14 
> 
> Is that something GCC could reasonably do?

Yes, GCC could choose a larger vectorization factor.
The resulting longer epilogue could then be vectorized
again with the same vector size.
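
To make the shape concrete, here is a rough C sketch of what a larger
vectorization factor plus a vectorized epilogue would look like if written
by hand. The source loop is not quoted in this thread, so the kernel below
(an indexed dot product) is only inferred from the ICC assembly above, and
the names dot_scalar, dot_blocked, val, src and idx are made up for
illustration. The main loop handles 8 elements per iteration as four 2-lane
multiply-adds on two accumulators, and the epilogue reuses the same 2-lane
width before a scalar tail. It assumes 32-bit unsigned indices and that
reassociating the floating-point sum is acceptable (a compiler would need
something like -ffast-math to do this on its own).

```c
#include <stddef.h>

/* Scalar reference: indexed (gather) dot product.
   Assumed kernel, inferred from the quoted assembly. */
double dot_scalar(const double *val, const double *src,
                  const unsigned *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += val[i] * src[idx[i]];
    return sum;
}

/* Hand-blocked version: 2-lane "vectors" modelled as 2-element arrays. */
double dot_blocked(const double *val, const double *src,
                   const unsigned *idx, size_t n)
{
    double acc0[2] = { 0.0, 0.0 };  /* first accumulator, 2 lanes */
    double acc1[2] = { 0.0, 0.0 };  /* second accumulator, 2 lanes */
    size_t i = 0;

    /* Main loop: 8 elements per iteration, alternating accumulators. */
    for (; i + 8 <= n; i += 8) {
        for (size_t l = 0; l < 2; l++) {
            acc0[l] += val[i + l]     * src[idx[i + l]];
            acc1[l] += val[i + 2 + l] * src[idx[i + 2 + l]];
            acc0[l] += val[i + 4 + l] * src[idx[i + 4 + l]];
            acc1[l] += val[i + 6 + l] * src[idx[i + 6 + l]];
        }
    }

    /* Epilogue: up to 7 elements remain; process them with the
       same 2-lane vector size, one vector per iteration. */
    for (; i + 2 <= n; i += 2) {
        acc0[0] += val[i]     * src[idx[i]];
        acc0[1] += val[i + 1] * src[idx[i + 1]];
    }

    /* Reduce the lanes, then handle the last element, if any. */
    double sum = (acc0[0] + acc1[0]) + (acc0[1] + acc1[1]);
    for (; i < n; i++)
        sum += val[i] * src[idx[i]];
    return sum;
}
```

With n = 11, for example, the main loop runs once (8 elements), the
epilogue runs once (2 elements), and one element is left for the scalar
tail. The two versions can differ in the last bits for general inputs
because the summation order changes.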
