Richard Biener <rguent...@suse.de> writes:
> The following makes SSA rewrite (update-address-taken) recognize
> sets of aligned sub-vectors in aligned position
> (v2qi into v16qi, but esp. v8qi into v16qi).  It uses the
> BIT_INSERT_EXPR support for this, enabling that for vector
> typed values.  This makes us turn for example
>
> typedef unsigned char v16qi __attribute__((vector_size(16)));
> v16qi load (const void *p)
> {
>   v16qi r;
>   __builtin_memcpy (&r, p, 8);
>   return r;
> }
>
> into the following
>
> load (const void * p)
> {
>   v16qi r;
>   long unsigned int _3;
>   v16qi _5;
>   vector(8) unsigned char _7;
>
>   <bb 2> :
>   _3 = MEM[(char * {ref-all})p_2(D)];
>   _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3);
>   r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>;
>   _5 = r_9;
>   return _5;
>
> this isn't yet nicely expanded since the BIT_INSERT_EXPR
> expansion simply goes through store_bit_field and there's
> no vector-mode vec_set.
>
> Similar as to the single-element insert SSA rewrite already
> handles the transform is conditional on the involved
> vector types having non-BLKmode.  This is somewhat bad
> since the transform is supposed to enable SSA optimizations
> by rewriting memory vectors into SSA form.  Since splitting
> of larger generic vectors happens very much later only
> this pessimizes their use.  But the BIT_INSERT_EXPR
> expansion doesn't cope with BLKmode entities (source or
> destination).
>
> Extending BIT_INSERT_EXPR this way seems natural given
> the support of CONSTRUCTORs with smaller vectors.
> BIT_FIELD_REF isn't particularly restricted so can be
> used to extract sub-vectors as well.
>
> Code generation is as bad as before (RTL expansion eventually
> spills) but SSA optimizations are enabled on less trivial
> testcases.
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> Comments?
>
> Richard.
>
> 2019-05-14 Richard Biener <rguent...@suse.de>
>
>     PR tree-optimization/90424
>     * tree-ssa.c (non_rewritable_lvalue_p): Handle inserts from
>     aligned subvectors.
>     (execute_update_addresses_taken): Likewise.
>     * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>
>     * g++.target/i386/pr90424-1.C: New testcase.
>     * g++.target/i386/pr90424-2.C: Likewise.
>
> Index: gcc/tree-ssa.c
> ===================================================================
> --- gcc/tree-ssa.c (revision 271155)
> +++ gcc/tree-ssa.c (working copy)
> @@ -1521,14 +1521,28 @@ non_rewritable_lvalue_p (tree lhs)
>        if (DECL_P (decl)
>           && VECTOR_TYPE_P (TREE_TYPE (decl))
>           && TYPE_MODE (TREE_TYPE (decl)) != BLKmode
> -         && operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
> -                             TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))), 0)
> +         && multiple_of_p (sizetype,
> +                           TYPE_SIZE_UNIT (TREE_TYPE (decl)),
> +                           TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
>           && known_ge (mem_ref_offset (lhs), 0)
>           && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
>                        mem_ref_offset (lhs))
>           && multiple_of_p (sizetype, TREE_OPERAND (lhs, 1),
>                             TYPE_SIZE_UNIT (TREE_TYPE (lhs))))
> -       return false;
> +       {
> +         if (operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (lhs)),
> +                              TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (decl))),
> +                              0))
> +           return false;
> +         /* For sub-vector inserts the insert vector mode has to be
> +            supported.  */
> +         tree vtype = build_vector_type
> +             (TREE_TYPE (TREE_TYPE (decl)),
> +              tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs)))
> +              / tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (decl)))));

AFAICT nothing guarantees tree_fits_uhwi_p for the lhs, so this isn't
poly-int clean.  Is there a guarantee that lhs is a multiple of the
element size even for integers?  Or are we just relying on a vector
of 0 elements being rejected?

Maybe something like:

  tree elt_type = TREE_TYPE (TREE_TYPE (decl));
  unsigned int elt_bits = tree_to_uhwi (TYPE_SIZE (elt_type));
  poly_uint64 lhs_bits, nelts;
  if (poly_int_tree_p (TYPE_SIZE (TREE_TYPE (lhs)), &lhs_bits)
      && multiple_p (lhs_bits, elt_bits, &nelts))
    {
      tree vtype = build_vector_type (elt_type, nelts);

?

Thanks,
Richard
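
To make the divisibility concern concrete, here is a minimal standalone sketch, not GCC code.  It models the sizes as plain 64-bit integers rather than poly_uint64, and exact_multiple is a hypothetical helper standing in for a three-argument multiple_p-style check.  The point is only that a bare division silently truncates, whereas the checked form rejects an lhs whose size is not a whole number of elements.

```cpp
#include <cstdint>

// Hypothetical helper (not a GCC function): plain-integer analogue of a
// "is VALUE an exact multiple of FACTOR, and if so what is the quotient"
// check.  Returns false and leaves *quotient untouched when FACTOR is
// zero or the division would leave a remainder.
bool exact_multiple(std::uint64_t value, std::uint64_t factor,
                    std::uint64_t *quotient)
{
  if (factor == 0 || value % factor != 0)
    return false;
  *quotient = value / factor;
  return true;
}
```

With 8-bit elements, a 64-bit lhs yields 8 elements, while a 12-bit lhs is rejected instead of being truncated to a one-element vector.  A zero-sized lhs still passes with a quotient of 0, so that case would need its own check if it can occur.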