On Fri, Sep 25, 2020 at 9:05 AM Erick Ochoa
<erick.oc...@theobroma-systems.com> wrote:
>
> Hi,
>
> I am working on an alias analysis using the points-to information
> generated during IPA-PTA. If we look at the varmap varinfo_t array in
> gcc/tree-ssa-structalias.c, most of the constraint variable info structs
> contain a non-null decl field that points to a valid tree in gimple
> (an SSA variable of pointer type). I am trying to find a way to obtain
> points-to information for pointer expressions. Concretely, I would like
> to answer questions such as:
>
> What does `astruct->aptrfield` point to?
>
> Here I have a concrete example:
>
>
> #include <stdlib.h>
>
> struct A { char *f1; struct A *f2;};
>
> int __GIMPLE(startwith("ipa-pta"))
> main (int argc, char * * argv)
> {
>    struct A * p1;
>    char * pc;
>    int i;
>    int _27;
>
>    i_15 = 1;
>    pc = malloc(100); // HEAP(1)
>    p1 = malloc (16); // HEAP(2)
>    p1->f1 = pc;
>    p1->f2 = p1;
>    _27 = (int) 0;
>    return _27;
> }
>
>
> This gives the following (correct) points-to information:
>
> HEAP(1) = { }
> HEAP(2) = { HEAP(1) HEAP(2) }
> pc_30 = { HEAP(1) }
> p1_32 = { HEAP(2) }
>
> However, no information is printed for either of:
>
>    p1->f1
>    p1->f2
>
> for which I would expect (or like) something like:
>
>    p1_32->0 = { HEAP(1) }
>    p1_32->64 = { HEAP(2) }
>
> Looking more closely at the problem, I found that some varinfo_t have a
> non-null "complex" field, which holds an array of "complex" constraints
> used to handle offsets and dereferences in gimple. For this same gimple
> code, we have the following complex constraints for the variable p1_32:
>
> main.clobber = p1_32 + 64
> *p1_32 = pc_30
> *p1_32 + 64 = p1_32

The issue is that allocated storage is not tracked field-sensitively, since
we do not know its layout at the point of allocation (where we allocate
the HEAP variable).  There are some exceptions; see what we do
for by-reference parameters in create_variable_info_for_1:

      if (vi->only_restrict_pointers
          && !type_contains_placeholder_p (TREE_TYPE (decl_type))
          && handle_param
          && !bitmap_bit_p (handled_struct_type,
                            TYPE_UID (TREE_TYPE (decl_type))))
        {
          varinfo_t rvi;
          tree heapvar = build_fake_var_decl (TREE_TYPE (decl_type));
          DECL_EXTERNAL (heapvar) = 1;
          if (var_can_have_subvars (heapvar))
            bitmap_set_bit (handled_struct_type,
                            TYPE_UID (TREE_TYPE (decl_type)));
          rvi = create_variable_info_for_1 (heapvar, "PARM_NOALIAS", true,
                                            true, handled_struct_type);
          if (var_can_have_subvars (heapvar))
            bitmap_clear_bit (handled_struct_type,
                              TYPE_UID (TREE_TYPE (decl_type)));
          rvi->is_restrict_var = 1;
          insert_vi_for_tree (heapvar, rvi);
          make_constraint_from (vi, rvi->id);
          make_param_constraints (rvi);

where we create a heapvar with a specific aggregate type.  In general,
make_heapvar (used for the allocation case) allocates a variable without
subfields:

static varinfo_t
make_heapvar (const char *name, bool add_id)
{
  varinfo_t vi;
  tree heapvar;

  heapvar = build_fake_var_decl (ptr_type_node);
  DECL_EXTERNAL (heapvar) = 1;

  vi = new_var_info (heapvar, name, add_id);
  vi->is_heap_var = true;
  vi->is_unknown_size_var = true;
  vi->offset = 0;
  vi->fullsize = ~0;
  vi->size = ~0;
  vi->is_full_var = true;

I once attempted to split a variable on demand during solving (i.e. to
generate subfields), but that never worked well.

So for specific cases like C++ new T, we could create appropriately typed
heapvars.  But you would have to double-check correctness because of
may_have_pointers and so on.

> It seems to me that I can probably parse these complex constraints to
> generate the answers I want. Is this how GCC currently handles this,
> or is there some other standard mechanism for it?

In the end, GCC is only interested in points-to sets for SSA names,
which never have subfields.  The missing subfields for aggregates
simply make the points-to solution less precise.

Richard.

> Thanks!
