Re: [PATCH] doc: clarify the situation with pointer arithmetic

Uecker, Martin Tue, 28 Jan 2020 04:06:54 -0800

On Tuesday, 28.01.2020, at 10:20 +0300, Alexander Monakov wrote:
> On Tue, 28 Jan 2020, Uecker, Martin wrote:
> 
> > > (*) this also shows the level of "obfuscation" needed to fool compilers
> > > to lose provenance knowledge is hard to predict.
> > 
> > Well, this is exactly the problem we want to address by defining
> > a clear way to do this. Casting to an integer would be the way
> > to state: "consider the pointer as escaped and forget the 
> > provenance"  and casting an integer to a  pointer would
> > mean "this pointer may point to all objects whose pointer has
> > escaped". This would give the programmer explicit control about
> > this aspect and make most existing code using pointer-to-integer
> > casts well-defined. At the same time, this should be simple
> > to add to existing points-to analysis.
> 
> Can you explain why you make it required for the compiler to treat the
> points-to set unnecessarily broader than it could prove? In the Matlab
> example, there's a simple chain of computations that the compiler is
> following to prove that the pointer resulting from the final cast is
> derived from exactly one other pointer (no other pointers have
> participated in the computations).
> 
> Or, in other words:
> 
> is there an example where a programmer can distinguish between the
> requirement you explain above vs. the fine-grained interpretation
> that GIMPLE aims to implement (at least as I understand it), which is:
> 
>   when the program creates a pointer by means of non-pointer computations
>   (casts, representation access, etc), the resulting pointer may point to:
> 
>     * any object whose address could have participated in the computation
>       (which is at worst the entire set of "exposed" objects up to that
>        program point, but can be much narrower if the compiler can see
>        the entire chain of computations)
> 
>     * any object which is not "exposed" but could have a known address, e.g.
>       because it is placed at a specific address during linking


Unfortunately, it is not that simple. It is not trivial to
define the set of objects whose "address could have participated
in the computation".

uintptr_t a = ...;            /* some random number */
uintptr_t b = (uintptr_t)&y;  /* the address of y, as an integer */
if (a == b) {
  int *p = (int*)a;
}

Did '&y' participate in the computation?

What if you output an integer using I/O and read it back in?

What if you copy an integer using control flow?

There are many similar questions.
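
To make the last two questions concrete, here is a minimal sketch. The
function names are made up for illustration, and uintptr_t is assumed
to be available. Both functions return the same number they were given,
but no arithmetic connects the result to the input:

```c
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: round-trip an integer through text, as if it
   had been written out with I/O and read back in. */
uintptr_t copy_via_text(uintptr_t v)
{
    char buf[32];
    uintptr_t out = 0;

    snprintf(buf, sizeof buf, "%" PRIuPTR, v);
    sscanf(buf, "%" SCNuPTR, &out);
    return out;
}

/* Hypothetical helper: copy an integer bit by bit. The result depends
   on v only through the branches taken, never through a data flow. */
uintptr_t copy_via_control_flow(uintptr_t v)
{
    uintptr_t out = 0;

    for (unsigned i = 0; i < sizeof v * CHAR_BIT; i++) {
        if (v & ((uintptr_t)1 << i))
            out |= (uintptr_t)1 << i;
    }
    return out;
}
```

If v was obtained from (uintptr_t)&y, any rule that tracks provenance
through integers has to say whether the returned value still refers
to y.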

If we want to make this part of the standard, we need to formulate
rules for all integer operations about how the provenance flows.


There are several problems with this. A compiler needs to be able
to compute the complete points-to set. If it might miss an object
that may legitimately be accessed, it has to be conservative about
aliasing, and the analysis becomes useless.

So we always have a trade-off: either we make the rules simpler
and restrict the tracking, or we specify more complicated rules that
allow sophisticated tracking. In the latter case, all compilers that
want to do this optimization have to implement these complicated
rules or fall back to being conservative.


Finally, all integer operations would have a potential hidden
second meaning when applied to addresses of objects, which
makes programs much harder to reason about.

Assume you have a function

int difference(int a, int b)
{
        return a - b;
}

And later you replace an expression 
'difference(a, a)' with '0' in the program.
This seems a trivial and logical thing to do, but if
this expression was applied to addresses, you might
have broken a provenance chain and introduced a bug.
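
As a sketch of this scenario: the version below widens the parameters
to uintptr_t so they can hold addresses, and address_before and
address_after are hypothetical names. It assumes, purely for the sake
of argument, a rule under which provenance flows through integer
arithmetic:

```c
#include <stdint.h>

/* The function from above, with uintptr_t instead of int (assumption). */
uintptr_t difference(uintptr_t a, uintptr_t b)
{
    return a - b;
}

/* Before the cleanup: the address of y takes part in the computation. */
uintptr_t address_before(int *x, int *y)
{
    uintptr_t ax = (uintptr_t)x;
    uintptr_t ay = (uintptr_t)y;

    return ax + difference(ay, ay);
}

/* After replacing difference(ay, ay) with 0: y no longer appears. */
uintptr_t address_after(int *x, int *y)
{
    uintptr_t ax = (uintptr_t)x;

    (void)y;  /* unused after the cleanup */
    return ax + 0;
}
```

Both functions return the same number. Under the assumed tracking rule,
however, a pointer cast from the first result could be associated with
both x and y, while one cast from the second could only be associated
with x.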


In my opinion, integers should stay integers with simple
logical properties. 

Regards,
Martin
