[clang] [clang][dataflow] Treat comma operator correctly in `getResultObjectLocation()`. (PR #78427)

via cfe-commits Fri, 19 Jan 2024 00:22:02 -0800

martinboehme wrote:

> This change itself looks good to me, but I start to doubt whether we actually 
> need `getResultObjectLocation` the way it is currently implemented. One of 
> the red flags for me seeing some `getResultObjectLocation` calls in 
> `UncheckedOptionalAccessModel.cpp`. I think most (all?) of those calls are 
> redundant, the type of the expression already guarantees that we have the 
> right location. Moreover, I think the propagation should happen internally in 
> the built-in transfers, so authors of user code like 
> `UncheckedOptionalAccessModel` should never need to think about these 
> implementation details.
> 
> I think it might make sense to review all of the calls and after we removed 
> the redundant ones, we might want to reconsider if it makes sense to restrict 
> the use of this function somewhat.
> 
> Let me know if I am missing something.


The missing context here, I think, is that the current implementation of 
`getResultObjectLocation()` is deficient and I want to fix it (hopefully soon). 
This fix should be transparent to the existing callers of 
`getResultObjectLocation()` (but it does mean that we currently call 
`getResultObjectLocation()` in places where it looks as if we could do 
something simpler).

Let me expand. Currently, `getResultObjectLocation()` propagates the location 
from the "original record constructor" up towards any prvalue ancestors of the 
original record constructor. The location for the "original record constructor" 
is simply pulled out of thin air by the transfer function for that node.

But this is wrong. For example, consider the following code:

```cxx
  A a = some_condition()? A(1) : A(2, 3);
```

When either the `A(1)` or `A(2, 3)` constructor are called, the `this` pointer 
that is passed to them should be `&a`. But our current implementation doesn't 
do this. When visiting `A(1)` and `A(2, 3)`, it independently creates storage 
locations for them ("out of thin air") and tries to propagate them upwards 
(towards the root). This poses an unsolvable problem when processing the 
conditional operator; the "result object location" for the conditional operator 
should be identical to the result object location for both branches, but this 
is currently impossible to achieve. Instead, we currently don't really model 
the conditional operator at all -- we just invent a fresh location for it, and 
hence we claim that the conditional operator is an "original record 
constructor". See also [this 
comment](https://github.com/llvm/llvm-project/blob/c6a6547798ca641b985456997cdf986bb99b0707/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp#L751).

What we should instead be doing is propagating the result object location 
_downwards_ from the result object to any prvalues that might initialize it. 
This is described in a FIXME 
[here](https://github.com/llvm/llvm-project/blob/c6a6547798ca641b985456997cdf986bb99b0707/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h#L332).
 (By the way, this is what CodeGen does too -- not surprisingly.)

The current implementation is also terrible in other ways. We do actually get 
the following right:

```cxx
  A a = A(1);
```

If you query `getResultObjectLocation()` for `A(1)`, you'll find it gives you 
the same location that you get if we call `Environment::getStorageLocation()` 
for `a`. But the way we arrive here is a terrible hack: When processing the 
`VarDecl` for `a`, we retrieve the `RecordValue` for its initializer, look at 
its `RecordValue::getLoc()`, and "steal" this location as the location to use 
for the storage of the variable. But we can't actually always do this stealing; 
for example, when processing a member initializer, the location for the member 
variable has already been determined, and so we need to [perform a 
copy](https://github.com/llvm/llvm-project/blob/3b54337be54107a5f196bb98dfc858b10ec35b8a/clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp#L419)
 that should not actually be happening.

A nice side effect of fixing `getResultObjectLocation()` to do the right thing 
is that it will make `RecordValue::getLoc()` obsolete -- and because this is 
the last remaining reason for `RecordValue` to exist at all, this will allow us 
to eliminate `RecordValue` entirely, something I've been working towards for a 
while (see for example [this 
discussion](https://reviews.llvm.org/D155204#inline-1503204)).

Getting back to your original point: Yes, many of the existing 
`getResultObjectLocation()` calls are actually redundant -- they could be 
replaced with code that retrieves the `RecordValue`, then obtains the location 
from `RecordValue::getLoc()`. But the `getResultObjectLocation()` calls 
future-proof the code -- they will allow the code to continue to work when we 
change `getResultObjectLocation()` to propagate locations from result objects 
down towards the "original record constructors", rather than upwards as we do 
today.

This has been pretty lengthy, but I hope it gives more context.

https://github.com/llvm/llvm-project/pull/78427
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][dataflow] Treat comma operator correctly in `getResultObjectLocation()`. (PR #78427)

Reply via email to