https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> OK, so the "easier" way to allow aligned sub-vector inserts produces for
>
> typedef unsigned char v16qi __attribute__((vector_size(16)));
> v16qi load (const void *p)
> {
>   v16qi r;
>   __builtin_memcpy (&r, p, 8);
>   return r;
> }
>
> load (const void * p)
> {
>   v16qi r;
>   long unsigned int _3;
>   v16qi _5;
>   vector(8) unsigned char _7;
>
>   <bb 2> :
>   _3 = MEM[(char * {ref-all})p_2(D)];
>   _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3);
>   r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>;
>   _5 = r_9;
>   return _5;
>
> and unfortunately (as I feared)
>
> load:
> .LFB0:
>         .cfi_startproc
>         movq    (%rdi), %rax
>         pxor    %xmm1, %xmm1
>         movaps  %xmm1, -24(%rsp)
>         movq    %rax, -24(%rsp)
>         movdqa  -24(%rsp), %xmm0
>         ret

So we're now at this state. This is where either simplifications or
canonicalizations on SSA can be made, middle-end changes to BIT_INSERT_EXPR
expansion, possibly via extending vec_set in a similar way vec_init was.

Note vec_set can end up as

  (subreg:N (vec_select (vec_concat:V2I (subreg:VI into:N)
      (vec_duplicate:VI (subreg:I to_insert:M)) (... )))

when a proper (vector) integer mode exists to cover the insertion and when
a proper 2xwide vector mode exists for the concat.

You could argue that GIMPLE should also use permutes for inserts (but then
not use CONSTRUCTOR for the splat). That is, I think both GIMPLE and RTL
could use some streamlining here (for the RTL parts that's always difficult
because you have to adjust many targets).

RTL is definitely missing a vec_perm operation to consolidate vec_select and
vec_merge. I'm not going to work on that part for the moment.
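
For illustration only, here is a source-level sketch of the "insert as a
permute" idea mentioned above, written with GCC's generic vector extensions.
It is not what the middle-end emits, and it says nothing about the code any
target generates. The function names insert_memcpy and insert_permute are
made up for this example, and unlike the testcase in the quoted comment the
destination vector is passed in, so that its upper half is defined.

```c
#include <string.h>

typedef unsigned char v16qi __attribute__((vector_size(16)));

/* Shape of the original testcase: overwrite the low 8 bytes of a
   16-byte vector through memcpy.  */
v16qi insert_memcpy (v16qi into, const void *p)
{
  __builtin_memcpy (&into, p, 8);
  return into;
}

/* The same insert expressed as a two-input permute.  The 8 bytes are
   first widened into a full vector (upper half zero, but never
   selected), then __builtin_shuffle picks lanes 0-7 from the widened
   value and lanes 8-15 from the original vector.  Selector values
   0-15 refer to the first operand, 16-31 to the second.  */
v16qi insert_permute (v16qi into, const void *p)
{
  v16qi wide = { 0 };
  __builtin_memcpy (&wide, p, 8);
  const v16qi sel = { 16, 17, 18, 19, 20, 21, 22, 23,
                       8,  9, 10, 11, 12, 13, 14, 15 };
  return __builtin_shuffle (into, wide, sel);
}
```

Both functions are meant to return the same vector for the same inputs; the
second one only makes the lane selection explicit, which is the form a
vec_perm-style representation would work on.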