https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> OK, so the "easier" way to allow aligned sub-vector inserts produces for
> 
> typedef unsigned char v16qi __attribute__((vector_size(16)));
> v16qi load (const void *p)
> {
>   v16qi r;
>   __builtin_memcpy (&r, p, 8);
>   return r;
> }
> 
> load (const void * p)
> {
>   v16qi r;
>   long unsigned int _3;
>   v16qi _5;
>   vector(8) unsigned char _7;
> 
>   <bb 2> :
>   _3 = MEM[(char * {ref-all})p_2(D)];
>   _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3);
>   r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>;
>   _5 = r_9;
>   return _5;
> 
> and unfortunately (as I feared)
> 
> load:
> .LFB0:
>         .cfi_startproc
>         movq    (%rdi), %rax
>         pxor    %xmm1, %xmm1
>         movaps  %xmm1, -24(%rsp)
>         movq    %rax, -24(%rsp)
>         movdqa  -24(%rsp), %xmm0
>         ret

So we're now at this state.  From here we can either add simplifications
or canonicalizations on SSA, or make middle-end changes to
BIT_INSERT_EXPR expansion, possibly by extending vec_set
in a similar way to how vec_init was extended.  Note vec_set can end up as

(subreg:N
  (vec_select
    (vec_concat:V2I
      (subreg:VI into:N)
      (vec_duplicate:VI (subreg:I to_insert:M)))
    (... )))

when a proper (vector) integer mode exists to cover the insertion
and when a proper 2xwide vector mode exists for the concat.
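
For illustration (not from the PR), vec_set is the expander that a plain
single-element store into a GNU vector goes through when the target
provides the pattern:

typedef unsigned char v16qi __attribute__((vector_size(16)));

/* Hypothetical example: stores one element; expands via the
   target's vec_set pattern where available, otherwise through
   memory.  */
v16qi set_byte (v16qi v, unsigned char x)
{
  v[3] = x;
  return v;
}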
You could argue that
GIMPLE should also use permutes for inserts (but then not use
CONSTRUCTOR for the splat).  That is, I think both GIMPLE and RTL
could use some streamlining here (for the RTL parts that's always
difficult because you have to adjust many targets).  RTL
definitely misses a vec_perm operation to consolidate vec_select
and vec_merge.
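
To sketch what the permute-for-insert form would look like at the source
level (the function name and mask are mine, not from the PR): the 8-byte
insert can be written as a two-input shuffle, which is the shape a
VEC_PERM_EXPR based canonicalization of the BIT_INSERT_EXPR would take.

typedef unsigned char v16qi __attribute__((vector_size(16)));

/* Insert the low 8 bytes of src into the low half of dst via a
   two-input permute: mask indices 0-15 select from dst, 16-31
   from src.  */
v16qi insert_low8 (v16qi dst, v16qi src)
{
  const v16qi mask = { 16, 17, 18, 19, 20, 21, 22, 23,
                        8,  9, 10, 11, 12, 13, 14, 15 };
  return __builtin_shuffle (dst, src, mask);
}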

I'm not going to work on that part at the moment.
