On Mon, 25 Nov 2024, Martin Uecker wrote:
>
> Hi Richard,
>
> here is another version. This now just ignores the size for all trailing
> arrays which I think is the right thing to do. It also modifies the lto
> hashing which also seems to work (but needs more testing and I haven't
> added tests to the patch yet).
>
> I also added back the missing comparison for the field offset.
>
> Instead of checking for flexible arrays members, for the exceptions
> I now simply use RECORD_OR_UNION_TYPE_P. The alternative would be to
> add the bit, but this would then be for any nested trailing array
> - not FAM.
>
> But I am not entirely sure adding the bit is worth it, as checking
> for structures seems to be good enough - maybe for "verify_type"
> we could then spend more time and check recursively for a trailing
> array.
>
> Before I continue with this, what do you think?
>
> Comments and tests still need some improvements.
>
> Martin
>
>
>
> Fix type compatibility for types with flexible array member
> [PR113688,PR114014,PR117724]
>
> verify_type checks the compatibility of TYPE_CANONICAL using
> gimple_canonical_types_compatible_p. But it is stricter than what the
> C standard requires and therefor inconsistent with how TYPE_CANONICAL is
> set
> in the C FE. Here, the logic is changed to ignore the array size when one
> is the last element of a structure or union. To not get errors because of
> an inconsistent number of members, zero-sized arrays are not ignored
> anymore
> when checking the fields of a struct (which is stricter than what was done
> before). Finally, exceptions are added that allow the TYPE_MODE of a type
> with an array as last member to differ from another compatible type.
>
> PR c/113688
> PR c/114014
> PR c/117724
>
> gcc/ChangeLog:
> * tree.cc (gimple_canonical_types_compatible_p): Revise
> logic for types with array as last member.
> (verify_type): Add exceptions.
>
> gcc/lto/ChangeLog:
> * lto-common.cc (hash_canonical_type): Add exceptions.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/pr113688.c: New test.
> * gcc.dg/pr114014.c: New test.
> * gcc.dg/pr117724.c: New test.
>
> diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
> index 86a309f92b4..f65a9d1c7b6 100644
> --- a/gcc/lto/lto-common.cc
> +++ b/gcc/lto/lto-common.cc
> @@ -254,7 +254,8 @@ hash_canonical_type (tree type)
> checked. */
> code = tree_code_for_canonical_type_merging (TREE_CODE (type));
> hstate.add_int (code);
> - hstate.add_int (TYPE_MODE (type));
> + if (!RECORD_OR_UNION_TYPE_P (type))
> + hstate.add_int (TYPE_MODE (type));
>
> /* Incorporate common features of numerical types. */
> if (INTEGRAL_TYPE_P (type)
> @@ -332,7 +333,11 @@ hash_canonical_type (tree type)
> && (! DECL_SIZE (f)
> || ! integer_zerop (DECL_SIZE (f))))
> {
> - iterative_hash_canonical_type (TREE_TYPE (f), hstate);
> + tree t = TREE_TYPE (f);
> + if (!TREE_CHAIN (f)
> + && TREE_CODE (t) == ARRAY_TYPE)
> + t = TREE_TYPE (t);
I think this is fine, but it warrants a comment on why we do this.
> + iterative_hash_canonical_type (t, hstate);
> nf++;
> }
>
> diff --git a/gcc/testsuite/gcc.dg/pr113688.c b/gcc/testsuite/gcc.dg/pr113688.c
> new file mode 100644
> index 00000000000..8dee8c86f1b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr113688.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g" } */
> +
> +struct S{int x,y[1];}*a;
> +int main(void){
> + struct S{int x,y[];};
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/pr114014.c b/gcc/testsuite/gcc.dg/pr114014.c
> new file mode 100644
> index 00000000000..ab783f4f85d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr114014.c
> @@ -0,0 +1,14 @@
> +/* PR c/114014
> + * { dg-do compile }
> + * { dg-options "-std=c23 -g" } */
> +
> +struct r {
> + int a;
> + char b[];
> +};
> +struct r {
> + int a;
> + char b[0];
> +}; /* { dg-error "redefinition" } */
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/pr117724.c b/gcc/testsuite/gcc.dg/pr117724.c
> new file mode 100644
> index 00000000000..d631daeb644
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr117724.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g" } */
> +
> +struct {
> + unsigned long len;
> + unsigned long size;
> + char data[];
> +}; /* { dg-warning "unnamed struct" } */
> +struct {
> + struct {
> + unsigned long len;
> + unsigned long size;
> + char data[6];
> + };
> +}; /* { dg-warning "unnamed struct" } */
> +
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 125f38b1cfa..30e3b4531a1 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -13907,8 +13907,12 @@ gimple_canonical_types_compatible_p (const_tree t1,
> const_tree t2,
> || TREE_CODE (t1) == NULLPTR_TYPE)
> return true;
>
> - /* Can't be the same type if they have different mode. */
> - if (TYPE_MODE (t1) != TYPE_MODE (t2))
> + /* Can't be compatible types if they have different mode. We allow
> + mismatching modes for structures or unions because they could end
> + with flexible array member. */
> + if (!RECORD_OR_UNION_TYPE_P (t1)
> + && !RECORD_OR_UNION_TYPE_P (t2)
> + && (TYPE_MODE (t1) != TYPE_MODE (t2)))
> return false;
I think this is OK. There's also the point that the mode of a
RECORD_OR_UNION_TYPE_P is really only used for RTL expansion purposes.
You might argue checking !RECORD_OR_UNION_TYPE_P (t1) is enough,
duplicating the check into the relevant switch cases (and omitting
from record/union type) might be clearer. The mode check was supposed
to be a cheap early test. I'd go with just checking
!RECORD_OR_UNION_TYPE_P on t1 I guess.
I'll note that for ARRAY_TYPE the mode is similarly "irrelevant".
> /* Non-aggregate types can be handled cheaply. */
> @@ -13959,7 +13963,7 @@ gimple_canonical_types_compatible_p (const_tree t1,
> const_tree t2,
> {
> case ARRAY_TYPE:
> /* Array types are the same if the element types are the same and
> - the number of elements are the same. */
> + minimum and maximum index are the same. */
> if (!gimple_canonical_types_compatible_p (TREE_TYPE (t1), TREE_TYPE
> (t2),
> trust_type_canonical)
> || TYPE_STRING_FLAG (t1) != TYPE_STRING_FLAG (t2)
> @@ -14053,23 +14057,35 @@ gimple_canonical_types_compatible_p (const_tree t1,
> const_tree t2,
> f1 || f2;
> f1 = TREE_CHAIN (f1), f2 = TREE_CHAIN (f2))
> {
> - /* Skip non-fields and zero-sized fields. */
> - while (f1 && (TREE_CODE (f1) != FIELD_DECL
> - || (DECL_SIZE (f1)
> - && integer_zerop (DECL_SIZE (f1)))))
> + /* Skip non-fields. We used to skip zero-sized fields, but not
> + allow them only at the end. */
I think this now makes
int i : 32;
int : 0;
int j;
and
int i;
int j;
incompatible? Did you remove this because of actual issues or
because you thought it shouldn't matter? The underlying idea
is of course that a zero-sized field can be ignored and we
want to conservatively treat structs which only differ in
added zero-size fields as compatible (with LTO, for cross-language
TBAA inter-operability). So the bitfield case might not be
the only relevant thing.
> + while (f1 && (TREE_CODE (f1) != FIELD_DECL))
> f1 = TREE_CHAIN (f1);
> - while (f2 && (TREE_CODE (f2) != FIELD_DECL
> - || (DECL_SIZE (f2)
> - && integer_zerop (DECL_SIZE (f2)))))
> + while (f2 && (TREE_CODE (f2) != FIELD_DECL))
> f2 = TREE_CHAIN (f2);
> if (!f1 || !f2)
> break;
So previously struct { int n; } and struct { int n; int data[]; } were
not compatible by this check. I guess also a trailing int : 0 made
structs incompatible.
I think we want to continue to ignore intermediate zero-size fields
but change the loops to
while (f1 && (TREE_CODE (f1) != FIELD_DECL
|| (DECL_SIZE (f1)
&& integer_zerop (DECL_SIZE (f1))
&& TREE_CHAIN (f1))))
so we preserve the last field even when zero-size? It's a bit odd
we treat incomplete (!DECL_SIZE) fields specially, but those should
only appear as last field, maybe || (TREE_CHAIN (f1) && integer_zerop
(DECL_SIZE (f1)) should work to avoid ICEing for those. But
eventually we get into this function for not laid out types where
the assumption might not hold ...
> +
> + tree t1 = TREE_TYPE (f1);
> + tree t2 = TREE_TYPE (f2);
> +
> + /* Special case for array members at the end. */
.. because?
> + if (TREE_CHAIN (f1) == NULL_TREE
> + && TREE_CHAIN (f2) == NULL_TREE
> + && TREE_CODE (t1) == ARRAY_TYPE
> + && TREE_CODE (t2) == ARRAY_TYPE
> + && TYPE_REVERSE_STORAGE_ORDER (t1) ==
> TYPE_REVERSE_STORAGE_ORDER (t2)
> + && TYPE_NONALIASED_COMPONENT (t1) == TYPE_NONALIASED_COMPONENT
> (t2)
recursing on ARRAY_TYPE would have compared TYPE_STRING_FLAG
> + && gimple_compare_field_offset (f1, f2)
> + && gimple_canonical_types_compatible_p
> + (TREE_TYPE (t1), TREE_TYPE (t2),
> + trust_type_canonical))
> + ;
> /* The fields must have the same name, offset and type. */
> - if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
> + else if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
any reason you elide DECL_NONADDRESSABLE_P comparing above? I think
this is semantically relevant (only used from Ada).
Given you basically only elide comparing of TYPE_DOMAIN I wonder if
we shouldn't add another parameter to gimple_canonical_types_compatible_p
indicating whether to disregard that? Or split out a helper function
to compare the array-type-but-not-domain bits to make those the same
between recursing or open-coding.
I'd suggest to do
/* The fields must have the same offset. */
if (DECL_NONADDRESSABLE_P (f1) != DECL_NONADDRESSABLE_P (f2)
|| !gimple_compare_field_offset (f1, f2))
return false;
upfront to isolate the sole difference on trailing ARRAY_TYPE
handling better.
I think the new mode handling makes sense in general, the LTO bits
should work fine. The zero-size field ignoring stuff may have
too many unrelated changes. So definitely better now.
Thanks,
Richard.
> || !gimple_compare_field_offset (f1, f2)
> || !gimple_canonical_types_compatible_p
> - (TREE_TYPE (f1), TREE_TYPE (f2),
> - trust_type_canonical))
> + (t1, t2, trust_type_canonical))
> return false;
> }
>
> @@ -14213,6 +14229,10 @@ verify_type (const_tree t)
> }
>
> if (COMPLETE_TYPE_P (t) && TYPE_CANONICAL (t)
> + /* We allow a mismatch for structure or union because of
> + flexible array members. */
> + && !RECORD_OR_UNION_TYPE_P (t)
> + && !RECORD_OR_UNION_TYPE_P (TYPE_CANONICAL (t))
> && TYPE_MODE (t) != TYPE_MODE (TYPE_CANONICAL (t)))
> {
> error ("%<TYPE_MODE%> of %<TYPE_CANONICAL%> is not compatible");
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)