On Fri, Sep 25, 2020 at 2:27 PM Erick Ochoa <erick.oc...@theobroma-systems.com> wrote: > > > > On 25/09/2020 13:30, Richard Biener wrote: > > On Fri, Sep 25, 2020 at 9:05 AM Erick Ochoa > > <erick.oc...@theobroma-systems.com> wrote: > >> > >> Hi, > >> > >> I am working on an alias analysis using the points-to information > >> generated during IPA-PTA. If we look at the varmap varinfo_t array in > >> gcc/tree-ssa-struct.c, most of the constraint variable info structs > >> contain a non-null decl field which points to a valid tree in gimple > >> (which is an SSA variable and a pointer). I am trying to find out a way > >> to obtain points-to information for pointer expressions. By this, the > >> concrete example I have in mind is answering the following question: > >> > >> What does `astruct->aptrfield` points to? > >> > >> Here I have a concrete example: > >> > >> > >> #include <stdlib.h> > >> > >> struct A { char *f1; struct A *f2;}; > >> > >> int __GIMPLE(startwith("ipa-pta")) > >> main (int argc, char * * argv) > >> { > >> struct A * p1; > >> char * pc; > >> int i; > >> int _27; > >> > >> i_15 = 1; > >> pc = malloc(100); // HEAP(1) > >> p1 = malloc (16); // HEAP(2) > >> p1->f1 = pc; > >> p1->f2 = p1; > >> _27 = (int) 0; > >> return _27; > >> } > >> > >> > >> Will give the following correct points-to information: > >> > >> HEAP(1) = { } > >> HEAP(2) = { HEAP(1) HEAP(2) } > >> pc_30 = { HEAP(1) } > >> p1_32 = { HEAP(2) } > >> > >> However, there does not seem to be information printed for neither: > >> > >> p1->f1 > >> p1->f2 > >> > >> which I would expect (or like) something like: > >> > >> p1_32->0 = { HEAP(1) } > >> p1_32->64 = { HEAP(2) } > >> > >> Looking more closely at the problem, I found that some varinfo_t have a > >> non-null "complex" field. Which has an array of "complex" constraints > >> used to handle offsets and dereferences in gimple. For this same gimple > >> code, we have the following complex constraints for the variable p1_32: > >> > >> main.clobber = p1_32 + 64 > >> *p1_32 = pc_30 > >> *p1_32 + 64 = p1_32 > > > > The issue is that allocated storage is not tracked field-sensitive since > > we do not know it's layout at the point of allocation (where we allocate > > the HEAP variable). There are some exceptions, see what we do > > for by-reference parameters in create_variable_info_for_1: > > > > if (vi->only_restrict_pointers > > && !type_contains_placeholder_p (TREE_TYPE (decl_type)) > > && handle_param > > && !bitmap_bit_p (handled_struct_type, > > TYPE_UID (TREE_TYPE (decl_type)))) > > { > > varinfo_t rvi; > > tree heapvar = build_fake_var_decl (TREE_TYPE (decl_type)); > > DECL_EXTERNAL (heapvar) = 1; > > if (var_can_have_subvars (heapvar)) > > bitmap_set_bit (handled_struct_type, > > TYPE_UID (TREE_TYPE (decl_type))); > > rvi = create_variable_info_for_1 (heapvar, "PARM_NOALIAS", true, > > true, handled_struct_type); > > if (var_can_have_subvars (heapvar)) > > bitmap_clear_bit (handled_struct_type, > > TYPE_UID (TREE_TYPE (decl_type))); > > rvi->is_restrict_var = 1; > > insert_vi_for_tree (heapvar, rvi); > > make_constraint_from (vi, rvi->id); > > make_param_constraints (rvi); > > > > where we create a heapvarwith a specific aggregate type. Generally > > make_heapvar (for the allocation case) allocates a variable without > > subfields: > > > > static varinfo_t > > make_heapvar (const char *name, bool add_id) > > { > > varinfo_t vi; > > tree heapvar; > > > > heapvar = build_fake_var_decl (ptr_type_node); > > DECL_EXTERNAL (heapvar) = 1; > > > > vi = new_var_info (heapvar, name, add_id); > > vi->is_heap_var = true; > > vi->is_unknown_size_var = true; > > vi->offset = 0; > > vi->fullsize = ~0; > > vi->size = ~0; > > vi->is_full_var = true; > > > > I've once had attempted to split (aka generate subfields) a variable > > on-demand during solving but that never worked well. > > > > So for specific cases like C++ new T we could create heapvars > > appropriately typed. But you have to double-check for correctness > > because of may_have_pointers and so on. > > Hi Richard, I am not 100% sure about the meaning of the > may_have_pointers field. In the source it says: > > 270 /* True if this field may contain pointers. */ > 271 unsigned int may_have_pointers : 1; > > Does that mean that (1) the field is a pointer type? (2) An integer > which may have hold the value of a pointer?
Yes. > Or that (3) there are > pointers pointing to that field? No, that property doesn't exist as flag. > You mention that for C++ new T heap vars are typed appropriately, but we > still need to check for correctness. If I understood correctly, this > means that may_have_pointers means that there might be pointers pointing > to that variable (?) Because the pointers could potentially be casted > and therefore the layout be interpreted in a different way. This is just > my interpretation of what you said, can you please confirm or clarify > that bit? No, C++ new T heap vars are the same as for malloc. I said it is probably easier to derive a good guess what type to use there. And yes, we'd have to still force all fields to have may_have_pointers to true because it's really just arbitrary sub-dividing of storage without any type info. > > Another clarification point: you mentioned that the heap variables' > layout needs to be known in order to do what I want. I think you are > right, but let me continue the example. So we at least need to know the size for most of the machinery to work. > These are the points-to sets we have: > > HEAP(1) = { } > HEAP(2) = { HEAP(1) HEAP(2) } > pc_30 = { HEAP(1) } > p1_32 = { HEAP(2) } > > These are the complex constraints of interest: > > *p1_32 = pc_30 > *p1_32 + 64 = p1_32 > > This is what we want: > > p1_32->0 = { HEAP(1) } > p1_32->64 = { HEAP(2) } > > If I were to continue working on say "parsing the constraint variables > to generate something like what I want", it would mean that I would have > to parse all possible complex constraint variables. Which would include > the following type of statements: > > *x = y + off_y > > x corresponds to a constraint that has a valid gimple SSA variable > y is a heap variable > > And therefore, since y does not have a layout known, in these cases *x > can point to whatever y points to right? Because even though off_y might > be known, since the layout of y is unknown this basically would mean > that we are forced to treat off_y as UNKNOWN_OFFSET. yes > So, the only cases which I would be able to correctly determine are > those in which there is no heap variable either in the lhs or rhs > (inclusive) of the complex constraint. Which I think implies that there > are *some* cases which can be statically determined, like the one > above... (There might be some complications which I haven't thought of > of course...) well, usually the fields are accessed via x->a and x->b so the offsets are constant and not unknown. > > Now this is just some random thought and maybe does not warrant a > concrete response below but I will be thinking about this during the > weekend: > > I'm wondering whether it would be possible to keep HEAP variables > without a concrete layout but still "unify" all aliases that point to > that heap variable to at least have some idea of what the layout looks > like. For example: > > struct A { char *f0; struct A* f1; }; > struct A *p0 = malloc(argc * sizeof(struct A)); // HEAP(0) > > We don't know the size of malloc... but if we find all pointers which > point to HEAP(0) we could say something like > > HEAP(0) is an array with size unknown and we can accesses the following: > 8 * n bytes for a struct A* type. No need to keep track of individual > array elements accessed, just knowing that if its a multiple of 8 bytes, > it will have a layout of struct A*. not sure what you're after, but the current points-to solver does not track offsets, it just has an approximation of offsets via splitting pointed-to variables into pieces. So if p points to { s@0+32 } (subfield of s with extent 0 to 32) we know it points _somewhere_ inside that region. If you then have a constraint like q = p + 4 then q will point to { s@0+32 s@32+32 } if there exists an adjacent field. A following r = q - 4 will then add another field left of s@0+32 (ok bad example, negative offset). So the field-sensitivity will bleed out quickly and once you see a UNKNOWN_OFFSET every subfield will be replaced by all subfields. > Or instead of doing an analysis to unify all alias sets, just perform a > speculative layout and disable it if something else which may point to > that location has a different type or performs some UNKNOWN_OFFSET > pointer arithmetic... (But I digress, I have to read a bit more on > pointer analyses to find out more...) > > Thanks (also, I know I digressed a little in this last point, but please > clarify the first two points! This has been very helpful) > > > > >> It seems to me that I can probably parse these complex constraints to > >> generate the answers which I want. Is this the way this is currently > >> being handled in GCC or is there some other standard mechanism for this? > > > > GCC is in the end only interested in points-to sets for SSA names > > which never have subfields. The missing subfields for aggregates > > simply make the points-to solution less precise. > > > > Richard. > > > >> Thanks!