With a few recent notes by others, I have identified that the comparison logic I used in interface.c (compare_components, gfc_compare_derived_types) was faulty in a few ways. I apologize in advance for the length of this message, but I believe it will help others (and myself) understand (recall) more thoroughly my internal implementation of structures, unions, and maps which may have not be obvious just by staring at hundreds of lines of code. Feel free to skip to the end where you'll find the small patch which reorganizes this section of code for clarity and correctness.
<tmi> The goal of the aforementioned sections of code is to allow gfc_compare_derived_types (and gfc_compare_union_types) to determine when two type definitions are equal. For STRUCTUREs, as with derived types with the SEQUENCE attribute, the type definitions can be considered the same if their declarations are identical. With derived types, this is easy enough. For STRUCTUREs however, we have several constructs which induce the creation of both structure type definitions and components with no user-defined names. I refer to these as anonymous structures and anonymous components. One of the checks originally performed by gfc_compare_derived_types is to compare the names of both type definitions and components. Clearly, this is insufficient for anonymous elements. Consider the following definition of a structure containing a union and an anonymous ("ad-hoc") structure definition: STRUCTURE /s/ integer x UNION MAP integer y END MAP END UNION STRUCTURE sub real r END STUCTURE END STRUCTURE Internally, this has to create four separate type definitions. One for the containing structure ("s"), one for the union, one for the map within the union, and one for the anonymous sub-structure (in the sub-structure, the type itself has no name - "sub" is the name of the component which has the anonymous type). The outer structure, the map, and the anonymous inner structure all get FL_STRUCT gfc_symbols and the union gets an FL_UNION gfc_symbol. Furthermore several gfc_components are created: s contains one for x, one for the union, and one for sub; the type definition of sub contains one for r; the union contains one for the map; and the map contains one for y. Since there is no way for the user to name some of these elements, some components and thier type definitions are "anonymous", and must be given unambiguous internal names. In the case of anonymous structure components, I have chosen to store the second letter of the type definition symbol as uppercase. As with regular derived type definitions, the first letter is also uppercase. In the case of the anonymous union and map components, the component names themselves start with a lowercase letter and the second letter is uppercase. These names are unambiguous because Fortran is case-insensitive and all names from user source are stored as lowercase. As a result, even when the above structure definition appears in multiple places identically in source code, the compiler will generate unique internal names for each union, map, and anonymous structure definition because it cannot know immediately upon parsing whether they match. In the case of derived types, components with different names are clearly distinct. But in the case of structures, to properly consider unions and maps we must go deeper and compare all the sub-components. The same structure above declared in multiple places should be considered equal, as with derived types with the SEQUENCE attribute. This is the reason we need to bypass the name check in gfc_compare_derived_types and compare_components. One issue fixed by this patch is PR 77327, where I greedily started checking properties of the components' derived type symbol, even before I knew it was a derived type. In the following code you can see the unprotected access to cmp1->ts.u.derived before confirming cmp1->ts.type: > compare_components (gfc_component *cmp1, gfc_component *cmp2, (snip) > d1 = cmp1->ts.u.derived; > d2 = cmp2->ts.u.derived; > if ( (d1 && (d1->attr.flavor == FL_STRUCT || d1->attr.flavor == FL_UNION) (snip) The other issue fixed is in the anonymous name checking logic at the top of gfc_compare_derived_types. The bug is exhibited by the attached testcase wherein extra errors/warnings are erroneously generated for identical structure definitions. As I described above, the first and second characters of anonymous structure types are uppercase, and only if the symbol is FL_UNION or FL_STRUCT. Unfortunately, for whatever reason, the current code checks the third character via derived2->name[2]: > gfc_compare_derived_types (gfc_symbol *derived1, gfc_symbol *derived2) (snip) > anonymous = (derived1->name[0] == derived2->name[0] > && derived1->name[1] && derived2->name[1] && derived2->name[2] > && derived1->name[1] == (char) TOUPPER (derived1->name[0]) > && derived2->name[2] == (char) TOUPPER (derived2->name[0])); (snip) With the patch, duplicate structure definitions containing anonymous elements will appropriately be resolved to use the same backend tree declarations. You can see the clarified conditions for n name comparison in the new functions in interface.c is_anonymous_component and is_anonymous_dt. This also helps abstract the condition, if one were to change the implementation of anonymous names in the future. </tmi> tl;dr: The attached patch fixes PR 77327 and another bug which is demonstrated in the included testcase. The replacement code is also easier to understand and more maintainable than the original. Bootstraps and regtests on x86_64-redhat-linux. Since it is a change to my previously submitted code I'll aim to commit to trunk over the weekend if there are no concerns by then. As a side note, I actually don't know how to exhibit PR 77327 in a DejaGNU testcase, since it is a normally-silent heap-use-after-free which was only found with a memory checker. I would appreciate any advice on this matter so I can include a testcase for it with this commit. --- Fritz Reese
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c index 17500c9..ffc36f5 100644 --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -387,26 +387,46 @@ gfc_match_end_interface (void) } +/* Return whether the component was defined anonymously. */ + +static bool +is_anonymous_component (gfc_component *cmp) +{ + /* Only UNION and MAP components are anonymous. In the case of a MAP, + the derived type symbol is FL_STRUCT and the component name looks like mM*. + This is the only case in which the second character of a component name is + uppercase. */ + return cmp->ts.type == BT_UNION + || (cmp->ts.type == BT_DERIVED + && cmp->ts.u.derived->attr.flavor == FL_STRUCT + && cmp->name[0] && cmp->name[1] && ISUPPER (cmp->name[1])); +} + + +/* Return whether the derived type was defined anonymously. */ + +static bool +is_anonymous_dt (gfc_symbol *derived) +{ + /* UNION and MAP types are always anonymous. Otherwise, only nested STRUCTURE + types can be anonymous. For anonymous MAP/STRUCTURE, we have FL_STRUCT + and the type name looks like XX*. This is the only case in which the + second character of a type name is uppercase. */ + return derived->attr.flavor == FL_UNION + || (derived->attr.flavor == FL_STRUCT + && derived->name[0] && derived->name[1] && ISUPPER (derived->name[1])); +} + + /* Compare components according to 4.4.2 of the Fortran standard. */ static int compare_components (gfc_component *cmp1, gfc_component *cmp2, gfc_symbol *derived1, gfc_symbol *derived2) { - gfc_symbol *d1, *d2; - bool anonymous = false; - - /* Unions, maps, and anonymous structures all have names like "[xX]X$\d+" - which should not be compared. */ - d1 = cmp1->ts.u.derived; - d2 = cmp2->ts.u.derived; - if ( (d1 && (d1->attr.flavor == FL_STRUCT || d1->attr.flavor == FL_UNION) - && ISUPPER (cmp1->name[1])) - || (d2 && (d2->attr.flavor == FL_STRUCT || d2->attr.flavor == FL_UNION) - && ISUPPER (cmp2->name[1]))) - anonymous = true; - - if (!anonymous && strcmp (cmp1->name, cmp2->name) != 0) + /* Compare names, but not for anonymous components such as UNION or MAP. */ + if (!is_anonymous_component (cmp1) && !is_anonymous_component (cmp1) + && strcmp (cmp1->name, cmp2->name) != 0) return 0; if (cmp1->attr.access != cmp2->attr.access) @@ -512,22 +532,12 @@ int gfc_compare_derived_types (gfc_symbol *derived1, gfc_symbol *derived2) { gfc_component *cmp1, *cmp2; - bool anonymous = false; if (derived1 == derived2) return 1; gcc_assert (derived1 && derived2); - /* MAP and anonymous STRUCTURE types have internal names of the form - mM* and sS* (we can get away this this because source names are converted - to lowerase). Compare anonymous type names specially because each - gets a unique name when it is declared. */ - anonymous = (derived1->name[0] == derived2->name[0] - && derived1->name[1] && derived2->name[1] && derived2->name[2] - && derived1->name[1] == (char) TOUPPER (derived1->name[0]) - && derived2->name[2] == (char) TOUPPER (derived2->name[0])); - /* Special case for comparing derived types across namespaces. If the true names and module names are the same and the module name is nonnull, then they are equal. */ @@ -541,7 +551,9 @@ gfc_compare_derived_types (gfc_symbol *derived1, gfc_symbol *derived2) because they can be anonymous; therefore two structures with different names may be equal. */ - if (strcmp (derived1->name, derived2->name) != 0 && !anonymous) + /* Compare names, but not for anonymous types such as UNION or MAP. */ + if (!is_anonymous_dt (derived1) && !is_anonymous_dt (derived2) + && strcmp (derived1->name, derived2->name) != 0) return 0; if (derived1->component_access == ACCESS_PRIVATE diff --git a/gcc/testsuite/gfortran.dg/dec_structure_13.f90 b/gcc/testsuite/gfortran.dg/dec_structure_13.f90 new file mode 100644 index 0000000..209c7da --- /dev/null +++ b/gcc/testsuite/gfortran.dg/dec_structure_13.f90 @@ -0,0 +1,79 @@ +! { dg-do compile } +! { dg-options "-fdec-structure" } +! +! Verify that the comparisons in gfc_compare_derived_types can correctly +! match nested anonymous subtypes. +! + +subroutine sub0 (u) + structure /t/ + structure sub + integer i + end structure + endstructure + record /t/ u + u.sub.i = 0 +end subroutine sub0 + +subroutine sub1 () + structure /t/ + structure sub + integer i + end structure + endstructure + record /t/ u + + interface + subroutine sub0 (u) ! regression: Interface mismatch.*Type mismatch + structure /t/ + structure sub + integer i + end structure + endstructure + record /t/ u + end subroutine + end interface + + call sub0(u) ! regression: Type mismatch in argument +end subroutine + +subroutine sub3(u) + structure /tu/ + union + map + integer i + end map + map + real r + end map + end union + end structure + record /tu/ u + u.r = 1.0 +end subroutine + +structure /t/ + structure sub + integer i + end structure +endstructure + +structure /tu/ + union + map + integer i + end map + map + real r + end map + end union +end structure + +record /t/ u +record /tu/ u2 + +call sub0(u) ! regression: Type mismatch in argument +call sub1() +call sub2(u2) + +end