On 6 May 2016 at 17:20, Richard Biener <[email protected]> wrote:
> On Wed, 4 May 2016, Prathamesh Kulkarni wrote:
>
>> On 23 February 2016 at 21:49, Prathamesh Kulkarni
>> <[email protected]> wrote:
>> > On 23 February 2016 at 17:31, Richard Biener <[email protected]> wrote:
>> >> On Tue, 23 Feb 2016, Prathamesh Kulkarni wrote:
>> >>
>> >>> On 22 February 2016 at 17:36, Richard Biener <[email protected]> wrote:
>> >>> > On Mon, 22 Feb 2016, Prathamesh Kulkarni wrote:
>> >>> >
>> >>> >> Hi Richard,
>> >>> >> As discussed in private mail, this version of the patch attempts to
>> >>> >> increase the alignment of a global struct decl if it contains an
>> >>> >> array field whose offset is a multiple of the alignment of the
>> >>> >> vector type corresponding to its scalar type, and recursively
>> >>> >> checks for nested structs.
>> >>> >> eg:
>> >>> >> static struct
>> >>> >> {
>> >>> >>   int a, b, c, d;
>> >>> >>   int k[4];
>> >>> >>   float f[10];
>> >>> >> };
>> >>> >> k is a candidate array since its offset is 16 and the alignment of
>> >>> >> "vector (4) int" is 8.
>> >>> >> Similarly for f.
>> >>> >>
>> >>> >> I haven't been able to create a test-case where there are
>> >>> >> multiple candidate arrays and the vector alignments of the arrays
>> >>> >> are different.
>> >>> >> I suppose in this case we will have to increase the alignment
>> >>> >> of the struct by the max alignment?
>> >>> >> eg:
>> >>> >> static struct
>> >>> >> {
>> >>> >>   <fields>
>> >>> >>   T1 k[S1];
>> >>> >>   <fields>
>> >>> >>   T2 f[S2];
>> >>> >>   <fields>
>> >>> >> };
>> >>> >>
>> >>> >> If V1 is the vector type corresponding to T1, and V2 the vector
>> >>> >> type corresponding to T2,
>> >>> >> offset (k) % align(V1) == 0 and offset (f) % align(V2) == 0,
>> >>> >> and align (V1) > align(V2), then we will increase the alignment of
>> >>> >> the struct to align(V1).
>> >>> >>
>> >>> >> Testing showed a FAIL for g++.dg/torture/pr31863.C due to program
>> >>> >> timeout.
>> >>> >> Initially it appeared to me that it went into an infinite loop.
>> >>> >> However, on second thought I think it's probably not an infinite
>> >>> >> loop, but rather taking an (extraordinarily) large amount of time
>> >>> >> to compile the test-case with the patch.
>> >>> >> The test-case builds quickly for only 2 instantiations of ClassSpec
>> >>> >> (ClassSpec<Key, A001, 1>,
>> >>> >> ClassSpec<Key, A002, 2>).
>> >>> >> Building with 22 instantiations (up to ClassSpec<Key, A0023, 22>)
>> >>> >> takes up to ~1m to compile, with:
>> >>> >> 23 instantiations: ~2m
>> >>> >> 24 instantiations: ~5m
>> >>> >> For 30 instantiations I terminated cc1plus after 13m (by SIGKILL).
>> >>> >>
>> >>> >> I guess it shouldn't go into an infinite loop because:
>> >>> >> a) structs cannot have circular references.
>> >>> >> b) it works for a lower number of instantiations.
>> >>> >> However, I have no sound evidence that it cannot be an infinite loop.
>> >>> >> I don't understand why a decl node is getting visited more than once
>> >>> >> for that test-case.
>> >>> >>
>> >>> >> Using a hash_map to store alignments of decls, so that a decl node
>> >>> >> gets visited only once, prevents the issue.
>> >>> >
>> >>> > Maybe aliases.  Try not walking vnode->alias == true vars.
>> >>> Hi,
>> >>> I have a hypothesis why the decl node gets visited multiple times.
>> >>>
>> >>> Consider the test-case:
>> >>> template <typename T, unsigned N>
>> >>> struct X
>> >>> {
>> >>>   T a;
>> >>>   virtual int foo() { return N; }
>> >>> };
>> >>>
>> >>> typedef struct X<int, 1> x_1;
>> >>> typedef struct X<int, 2> x_2;
>> >>>
>> >>> static x_1 obj1 __attribute__((used));
>> >>> static x_2 obj2 __attribute__((used));
>> >>>
>> >>> Two additional structs are created by the C++ FE; c++filt shows:
>> >>> _ZTI1XIiLj1EE -> typeinfo for X<int, 1u>
>> >>> _ZTI1XIiLj2EE -> typeinfo for X<int, 2u>
>> >>>
>> >>> Both of these structs have only one field D.2991, and it appears it's
>> >>> *shared* between them:
>> >>> struct D.2991;
>> >>> const void * D.2980;
>> >>> const char * D.2981;
>> >>>
>> >>> Hence the decl node D.2991 and its fields (D.2980, D.2981) get visited
>> >>> twice:
>> >>> once when walking _ZTI1XIiLj1EE and a 2nd time when walking _ZTI1XIiLj2EE.
>> >>>
>> >>> Dump of walking over the global structs for the above test-case:
>> >>> http://pastebin.com/R5SABW0c
>> >>>
>> >>> So it appears to me that a DAG (interior node == struct decl,
>> >>> leaf == non-struct field, edge from node1 -> node2 if node2 is a field
>> >>> of node1) is getting created when the struct decl is a type-info object.
>> >>>
>> >>> I am not really clear on how we should proceed:
>> >>> a) Keep using a hash_map to store alignments so that every decl gets
>> >>> visited only once.
>> >>> b) Skip walking artificial record decls.
>> >>> I am not sure if skipping all artificial struct decls would be a good
>> >>> idea, but I don't think it's possible to identify whether a struct-decl
>> >>> is a typeinfo struct in the middle-end?
>> >>
>> >> You shouldn't end up walking those when walking the type of
>> >> global decls.  That is, don't walk typeinfo decls - yes, practically
>> >> that means just not walking DECL_ARTIFICIAL things.
>> > Hi,
>> > I have done the changes in the patch (attached) and cross-tested
>> > on arm*-*-* and aarch64*-*-* without failures.
>> > Is it OK for stage-1?
>> Hi,
>> Is the attached patch OK for trunk?
>> Bootstrapped and tested on aarch64-linux-gnu, ppc64le-linux-gnu.
>> Cross-tested on arm*-*-* and aarch64*-*-*.
>
> You can't simply use
>
> +      offset = int_byte_position (field);
>
> as it can be placed at a variable offset which will make int_byte_position
> ICE.  Note it also returns a truncated byte position (bit position
> stripped) which may be undesirable here.  I think you want to use
> bit_position and if and only if DECL_FIELD_OFFSET and
> DECL_FIELD_BIT_OFFSET are INTEGER_CST.
Oops, I didn't realize offsets could be variable.  Will that be the case
only for a VLA member inside a struct?
>
> Your observation about the expensiveness of the walk still stands I guess
> and eventually you should at least cache the
> get_vec_alignment_for_record_decl cases.  Please make those workers
> _type rather than _decl helpers.
Done.
>
> You seem to simply get at the maximum vectorized field/array element
> alignment possible for all arrays - you could restrict that to
> arrays with at least vector size (as followup).
Um, sorry, I didn't understand this part.
>
> +  /* Skip artificial decls like typeinfo decls or if
> +     record is packed.  */
> +  if (DECL_ARTIFICIAL (record_decl) || TYPE_PACKED (type))
> +    return 0;
>
> I think we should honor DECL_USER_ALIGN as well and not mess with those
> decls.
Done.
>
> Given the patch now does quite some extra work it might make sense
> to split the symtab part out of the vect_can_force_dr_alignment_p
> predicate and call that early.
In the patch I call symtab_node::can_increase_alignment_p early.
I tried moving that to its callers - vect_compute_data_ref_alignment
and increase_alignment::execute, however that failed some tests in vect,
and hence I didn't add the following hunk in the patch.
Did I miss some check?
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7652e21..2c1acee 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -795,7 +795,10 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
        && TREE_CODE (TREE_OPERAND (base, 0)) == ADDR_EXPR)
     base = TREE_OPERAND (TREE_OPERAND (base, 0), 0);
-  if (!vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype)))
+  if (!(TREE_CODE (base) == VAR_DECL
+        && decl_in_symtab_p (base)
+        && symtab_node::get (base)->can_increase_alignment_p ()
+        && vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype))))
     {
       if (dump_enabled_p ())
         {
Thanks,
Prathamesh
>
> Richard.
diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c
new file mode 100644
index 0000000..7010a52
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target section_anchors } */
+/* { dg-require-effective-target vect_int } */
+
+#define N 32
+
+/* Increase alignment of a struct if an array's offset is a multiple of the
+   alignment of the vector type corresponding to its scalar type.
+   For the below test-case:
+   offsetof(e) == 8 bytes.
+   i) For arm: let x = alignment of vector type corresponding to int,
+      x == 8 bytes.
+      Since offsetof(e) % x == 0, set DECL_ALIGN(a, b, c) to x.
+   ii) For aarch64, ppc: x == 16 bytes.
+       Since offsetof(e) % x != 0, don't increase alignment of a, b, c.
+*/
+
+static struct A {
+  int p1, p2;
+  int e[N];
+} a, b, c;
+
+int foo(void)
+{
+  for (int i = 0; i < N; i++)
+    a.e[i] = b.e[i] + c.e[i];
+
+  return a.e[0];
+}
+
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target aarch64*-*-* } } } */
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target powerpc64*-*-* } } } */
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 3 "increase_alignment" { target arm*-*-* } } } */
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 2b25b45..c94bbcb 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -794,38 +794,129 @@ make_pass_slp_vectorize (gcc::context *ctxt)
    This should involve global alignment analysis and in the future also
    array padding.  */
+static unsigned get_vec_alignment_for_type (tree);
+static hash_map<tree, unsigned> *type_align_map;
+
+/* Return the alignment of the vector type corresponding to the array's
+   scalar type, or 0 if no vector type exists.  */
+static unsigned
+get_vec_alignment_for_array_type (tree type)
+{
+  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
+
+  tree vectype = get_vectype_for_scalar_type (strip_array_types (type));
+  return (vectype) ? TYPE_ALIGN (vectype) : 0;
+}
+
+/* Return the alignment of the field having the maximum alignment of the
+   vector type corresponding to its scalar type.  For now, we only consider
+   fields whose offset is a multiple of their vector alignment.
+   Return 0 if no suitable field is found.  */
+static unsigned
+get_vec_alignment_for_record_type (tree type)
+{
+  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
+  unsigned max_align = 0, alignment;
+  HOST_WIDE_INT offset;
+  tree offset_tree;
+
+  if (TYPE_PACKED (type))
+    return 0;
+
+  for (tree field = first_field (type);
+       field != NULL_TREE;
+       field = DECL_CHAIN (field))
+    {
+      /* Skip if not a FIELD_DECL or if it has a variable offset.  */
+      if (TREE_CODE (field) != FIELD_DECL
+          || TREE_CODE (DECL_FIELD_OFFSET (field)) != INTEGER_CST
+          || TREE_CODE (DECL_FIELD_BIT_OFFSET (field)) != INTEGER_CST
+          || DECL_USER_ALIGN (field)
+          || DECL_ARTIFICIAL (field))
+        continue;
+
+      offset_tree = bit_position (field);
+      /* FIXME: is this check necessary, since we check for a variable
+         offset above?  */
+      if (TREE_CODE (offset_tree) != INTEGER_CST)
+        continue;
+
+      offset = tree_to_uhwi (offset_tree);
+      alignment = get_vec_alignment_for_type (TREE_TYPE (field));
+
+      if (alignment
+          && (offset % alignment == 0)
+          && (alignment > max_align))
+        max_align = alignment;
+    }
+
+  return max_align;
+}
+
+/* Return the alignment of the vector type corresponding to the type's
+   scalar type, or 0 if it doesn't exist or if the vector alignment is
+   less than the type's alignment.  */
+static unsigned
+get_vec_alignment_for_type (tree type)
+{
+  if (type == NULL_TREE)
+    return 0;
+
+  gcc_assert (TYPE_P (type));
+
+  unsigned *slot = type_align_map->get (type);
+  if (slot)
+    return *slot;
+
+  static unsigned alignment = 0;
+  switch (TREE_CODE (type))
+    {
+    case ARRAY_TYPE:
+      alignment = get_vec_alignment_for_array_type (type);
+      break;
+    case RECORD_TYPE:
+      alignment = get_vec_alignment_for_record_type (type);
+      break;
+    default:
+      alignment = 0;
+      break;
+    }
+
+  unsigned ret = (alignment > TYPE_ALIGN (type)) ? alignment : 0;
+  type_align_map->put (type, ret);
+  return ret;
+}
+
+/* Entry point to the increase_alignment pass.  */
 static unsigned int
 increase_alignment (void)
 {
   varpool_node *vnode;
   vect_location = UNKNOWN_LOCATION;
+  type_align_map = new hash_map<tree, unsigned>;
   /* Increase the alignment of all global arrays for vectorization.  */
   FOR_EACH_DEFINED_VARIABLE (vnode)
     {
-      tree vectype, decl = vnode->decl;
-      tree t;
+      tree decl = vnode->decl;
       unsigned int alignment;
-      t = TREE_TYPE (decl);
-      if (TREE_CODE (t) != ARRAY_TYPE)
-        continue;
-      vectype = get_vectype_for_scalar_type (strip_array_types (t));
-      if (!vectype)
-        continue;
-      alignment = TYPE_ALIGN (vectype);
-      if (DECL_ALIGN (decl) >= alignment)
-        continue;
-
-      if (vect_can_force_dr_alignment_p (decl, alignment))
+      if ((decl_in_symtab_p (decl)
+           && !symtab_node::get (decl)->can_increase_alignment_p ())
+          || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl))
+        continue;
+
+      alignment = get_vec_alignment_for_type (TREE_TYPE (decl));
+      if (alignment && vect_can_force_dr_alignment_p (decl, alignment))
         {
-          vnode->increase_alignment (TYPE_ALIGN (vectype));
+          vnode->increase_alignment (alignment);
           dump_printf (MSG_NOTE, "Increasing alignment of decl: ");
           dump_generic_expr (MSG_NOTE, TDF_SLIM, decl);
           dump_printf (MSG_NOTE, "\n");
         }
     }
+
+  delete type_align_map;
   return 0;
 }
ChangeLog