Re: On IPA-PTA field sensitivity and pointer expressions

Erick Ochoa Fri, 25 Sep 2020 05:28:08 -0700



On 25/09/2020 13:30, Richard Biener wrote:

On Fri, Sep 25, 2020 at 9:05 AM Erick Ochoa
<erick.oc...@theobroma-systems.com> wrote:


Hi,

I am working on an alias analysis using the points-to information
generated during IPA-PTA. If we look at the varmap varinfo_t array in
gcc/tree-ssa-struct.c, most of the constraint variable info structs
contain a non-null decl field which points to a valid tree in gimple
(which is an SSA variable and a pointer). I am trying to find out a way
to obtain points-to information for pointer expressions. By this, the
concrete example I have in mind is answering the following question:

What does `astruct->aptrfield` points to?

Here I have a concrete example:


#include <stdlib.h>

struct A { char *f1; struct A *f2;};

int __GIMPLE(startwith("ipa-pta"))
main (int argc, char * * argv)
{
    struct A * p1;
    char * pc;
    int i;
    int _27;

    i_15 = 1;
    pc = malloc(100); // HEAP(1)
    p1 = malloc (16); // HEAP(2)
    p1->f1 = pc;
    p1->f2 = p1;
    _27 = (int) 0;
    return _27;
}


Will give the following correct points-to information:

HEAP(1) = { }
HEAP(2) = { HEAP(1) HEAP(2) }
pc_30 = { HEAP(1) }
p1_32 = { HEAP(2) }

However, there does not seem to be information printed for neither:

    p1->f1
    p1->f2

which I would expect (or like) something like:

    p1_32->0 = { HEAP(1) }
    p1_32->64 = { HEAP(2) }

Looking more closely at the problem, I found that some varinfo_t have a
non-null "complex" field. Which has an array of "complex" constraints
used to handle offsets and dereferences in gimple. For this same gimple
code, we have the following complex constraints for the variable p1_32:

main.clobber = p1_32 + 64
*p1_32 = pc_30
*p1_32 + 64 = p1_32


The issue is that allocated storage is not tracked field-sensitive since
we do not know it's layout at the point of allocation (where we allocate
the HEAP variable).  There are some exceptions, see what we do
for by-reference parameters in create_variable_info_for_1:

       if (vi->only_restrict_pointers
           && !type_contains_placeholder_p (TREE_TYPE (decl_type))
           && handle_param
           && !bitmap_bit_p (handled_struct_type,
                             TYPE_UID (TREE_TYPE (decl_type))))
         {
           varinfo_t rvi;
           tree heapvar = build_fake_var_decl (TREE_TYPE (decl_type));
           DECL_EXTERNAL (heapvar) = 1;
           if (var_can_have_subvars (heapvar))
             bitmap_set_bit (handled_struct_type,
                             TYPE_UID (TREE_TYPE (decl_type)));
           rvi = create_variable_info_for_1 (heapvar, "PARM_NOALIAS", true,
                                             true, handled_struct_type);
           if (var_can_have_subvars (heapvar))
             bitmap_clear_bit (handled_struct_type,
                               TYPE_UID (TREE_TYPE (decl_type)));
           rvi->is_restrict_var = 1;
           insert_vi_for_tree (heapvar, rvi);
           make_constraint_from (vi, rvi->id);
           make_param_constraints (rvi);

where we create a heapvarwith a specific aggregate type.  Generally
make_heapvar (for the allocation case) allocates a variable without
subfields:

static varinfo_t
make_heapvar (const char *name, bool add_id)
{
   varinfo_t vi;
   tree heapvar;

   heapvar = build_fake_var_decl (ptr_type_node);
   DECL_EXTERNAL (heapvar) = 1;

   vi = new_var_info (heapvar, name, add_id);
   vi->is_heap_var = true;
   vi->is_unknown_size_var = true;
   vi->offset = 0;
   vi->fullsize = ~0;
   vi->size = ~0;
   vi->is_full_var = true;

I've once had attempted to split (aka generate subfields) a variable
on-demand during solving but that never worked well.

So for specific cases like C++ new T we could create heapvars
appropriately typed.  But you have to double-check for correctness
because of may_have_pointers and so on.

Hi Richard, I am not 100% sure about the meaning of themay_have_pointers field. In the source it says:


  270   /* True if this field may contain pointers.  */
  271   unsigned int may_have_pointers : 1;

Does that mean that (1) the field is a pointer type? (2) An integerwhich may have hold the value of a pointer? Or that (3) there arepointers pointing to that field?

You mention that for C++ new T heap vars are typed appropriately, but westill need to check for correctness. If I understood correctly, thismeans that may_have_pointers means that there might be pointers pointingto that variable (?) Because the pointers could potentially be castedand therefore the layout be interpreted in a different way. This is justmy interpretation of what you said, can you please confirm or clarifythat bit?

Another clarification point: you mentioned that the heap variables'layout needs to be known in order to do what I want. I think you areright, but let me continue the example.


These are the points-to sets we have:

  HEAP(1) = { }
  HEAP(2) = { HEAP(1) HEAP(2) }
  pc_30 = { HEAP(1) }
  p1_32 = { HEAP(2) }

These are the complex constraints of interest:

  *p1_32 = pc_30
  *p1_32 + 64 = p1_32

This is what we want:

   p1_32->0 = { HEAP(1) }
   p1_32->64 = { HEAP(2) }

If I were to continue working on say "parsing the constraint variablesto generate something like what I want", it would mean that I would haveto parse all possible complex constraint variables. Which would includethe following type of statements:


 *x = y + off_y

x corresponds to a constraint that has a valid gimple SSA variable
y is a heap variable

And therefore, since y does not have a layout known, in these cases *xcan point to whatever y points to right? Because even though off_y mightbe known, since the layout of y is unknown this basically would meanthat we are forced to treat off_y as UNKNOWN_OFFSET.

So, the only cases which I would be able to correctly determine arethose in which there is no heap variable either in the lhs or rhs(inclusive) of the complex constraint. Which I think implies that thereare *some* cases which can be statically determined, like the oneabove... (There might be some complications which I haven't thought ofof course...)

Now this is just some random thought and maybe does not warrant aconcrete response below but I will be thinking about this during theweekend:

I'm wondering whether it would be possible to keep HEAP variableswithout a concrete layout but still "unify" all aliases that point tothat heap variable to at least have some idea of what the layout lookslike. For example:


struct A { char *f0; struct A* f1; };
struct A *p0 = malloc(argc * sizeof(struct A)); // HEAP(0)

We don't know the size of malloc... but if we find all pointers whichpoint to HEAP(0) we could say something like


HEAP(0) is an array with size unknown and we can accesses the following:

8 * n bytes for a struct A* type. No need to keep track of individualarray elements accessed, just knowing that if its a multiple of 8 bytes,it will have a layout of struct A*.

Or instead of doing an analysis to unify all alias sets, just perform aspeculative layout and disable it if something else which may point tothat location has a different type or performs some UNKNOWN_OFFSETpointer arithmetic... (But I digress, I have to read a bit more onpointer analyses to find out more...)

Thanks (also, I know I digressed a little in this last point, but pleaseclarify the first two points! This has been very helpful)

It seems to me that I can probably parse these complex constraints to
generate the answers which I want. Is this the way this is currently
being handled in GCC or is there some other standard mechanism for this?


GCC is in the end only interested in points-to sets for SSA names
which never have subfields.  The missing subfields for aggregates
simply make the points-to solution less precise.

Richard.

Thanks!

Re: On IPA-PTA field sensitivity and pointer expressions

Reply via email to