[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

Balazs Benics via cfe-commits Mon, 06 Nov 2023 02:56:26 -0800

steakhal wrote:

> I think every time we need to iterate over all member of an equivalence 
> class, we might do something wrong. The point of the equivalence class would 
> be to make sure those elements are equivalent. One way to avoid iteration 
> would be to always use the representative of the equivalence class. E.g., 
> each time we record a new constraint to a member of the class, we add this 
> information to the representative element. Every time we do a query, we first 
> look up the representative element which already should have all the info 
> from the class and use it instead of the original symbol.
> 
> Would something like this work or am I missing something?


I had to think about it for a little while, and here are my thoughts:
In an example like here:
```c++
void gh_62215(int x, int y, int z) {
  if (x != y) return; // x == y
  if (z <= x) return; // z > x
  if (z >= y) return; // z < y
  clang_analyzer_warnIfReached(); // no-warning: This should be dead code.
  (void)(x + y + z); // keep the constraints alive.
}
```
Here, after the first if, we would have an eqclass of `{x,y}`; with a single 
constraint `x != y: [0,0]`.
After the second if, we would have notionally 3 eqclasses: `{x,y}`, and two 
trivial ones `x` and `y`.
We would also have 3 constraints: `x != y: [0,0]`, `z <= x: [0,0]`, `z <= y: 
[0,0]`.

Note that the keys in the constraints map (symbols) can be 'arbitrarily' 
complex and refer to already dead symbols.
In this example, I keep these symbols artificially alive to make the case 
slightly simpler to discuss.

So, if I understand you correctly, at the 3rd if statement, we should 
canonicalize the symbol we are constraining by walking every sub-symbol and 
substituting it with its equivalent counterpart (if any), by basically with its 
eqclass' representative symbol.
In this example, it would mean that instead of adding a constraint `z >= y: 
[0,0]`, we would instead `z >= x: [0,0]` (assuming that `x` is the 
representative symbol of eqclass `{x,y}`.
I believe this canonicalization could work (and would solve the other 
demonstrative limitation I added in this PR), but would also incur overhead.
And from a design point of view, an eqclass would need to keep the 
representative symbol alive, because otherwise, we can't substitute a symbol 
with the representative symbol.
(Note that for this reason, we don't actually have a representative symbol, but 
rather a unique integral value as an ID for the eqclass - which happens to be 
equal to the `SymbolRef` pointer value of the representative symbol if that's 
still alive.

Here is an example where the representative symbol `x` of the eqclass `{x,y}` 
would die, but remain the ID of the eqclass:
```c++
  // same example, but with:
  (void)(y + z); // X is not mentioned!, thus reclaimed after the 2nd if 
statement.
```

https://github.com/llvm/llvm-project/pull/71284
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

Reply via email to