steakhal created this revision.
steakhal added reviewers: xazax.hun, NoQ, martong, ASDenysPetrov, vabridgers, 
Szelethus.
Herald added subscribers: manas, dkrupp, donat.nagy, mikhail.ramalho, 
a.sidorin, rnkovacs, szepet, baloghadamsoftware.
Herald added a project: All.
steakhal requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

One might think that by merging the equivalence classes (eqclasses) of `Sym1` 
and `Sym2` symbols we would end up with a `State` in which the eqclass of 
`Sym1` and `Sym2` should now be the same. Surprisingly, it's not //always// 
true.

Such an example triggered the assertion enforcing this //unjustified// 
invariant in https://github.com/llvm/llvm-project/issues/58677.

  unsigned a, b;
  #define assert(cond) if (!(cond)) return
  
  void f(unsigned c) {
      /*(1)*/ assert(c == b);
      /*(2)*/ assert((c | a) != a);
      /*(3)*/ assert(a);
      // a = 0  =>  c | 0 != 0  =>  c != 0  =>  b != 0
  }

I believe, that this assertion hold for reasonable cases - where both 
`MemberSym` and `SimplifiedMemberSym` refer to live symbols.
It can only fail if `SimplifiedMemberSym` refers to an already dead symbol. See 
the reasoning at the very end.
In this context, I don't know any way of determining if a symbol is alive/dead, 
so I cannot refine the assertion in that way.
So, I'm proposing to drop this unjustified assertion.

---

Let me elaborate on why I think the assertion is wrong in its current shape.

Here is a quick reminder about equivalence classes in CSA.
We have 4 mappings:

1. `ClassMap`: map, associating `Symbols` with an `EquivalenceClass`.
2. `ClassMembers`: map, associating `EquivalenceClasses` with its members - 
basically an enumeration of the `Symbols` which are known to be equal.
3. `ConstraintRange`: map, associating `EquivalenceClasses` with the range 
constraint which should hold for all the members of the eqclass.
4. `DisequalityMap`: I'm omitting this, as it's irrelevant for our purposes now.

Whenever we encounter/assume that two `SymbolRefs` are equal, we update the 
`ClassMap` so that now both `SymbolRefs` are referring to the same eqclass. 
This operation is known as `merge` or `union`.
Each eqclass is uniquely identified by the `representative` symbol, but it 
could have been just a unique number or anything else - the point is that an 
eqclass is identified by something unique.
Initially, all Symbols form - by itself - a trivial eqclass, as there are no 
other Symbols to which it is assumed to be equal. A trivial eqclass has 
//notionally// exactly one member, the representative symbol.
I'm emphasizing that //notionally// because for such cases we don't store an 
entry in the `ClassMap` nor in the `ClassMembers` map, because it could be 
deduced.

By `merging` two eqclasses, we essentially do what you would think it does. An 
important thing to highlight is that the representative symbol of the resulting 
eqclass will be the same as one of the two eqclasses.
This operation should be commutative, so that `merge(eq1,eq2)` and 
`merge(eq2,eq1)` should result in the same eqclass - except for the 
representative symbol, which is just a unique identifier of the class, a name 
if you will. Using the representative symbol of `eq1` or `eq2` should have no 
visible effect on the analysis overall.

When merging `eq1` into `eq2`, we take each of the `ClassMembers` of `eq1` and 
add them to the `ClassMembers` of `eq2` while we also redirect all the `Symbol` 
members of `eq1` to map to the `eq2` eqclass in the `ClassMap`. This way all 
members of `eq1` will refer to the correct eqclass.
After these, `eq1` key is unreachable in the `ClassMembers`, hence we can drop 
it.

---

Let's get back to the example.
Note that when I refer to symbols `a`, `b`, `c`, I'm referring to the 
`SymbolRegionValue{VarRegion{.}}` - the value of that variable.

After `(1)`, we will have a constraint `c == b : [1,1]` and an eqclass `c,b` 
with the `c` representative symbol.
After `(2)`, we will have an additional constraint `c|b != a : [1,1]` with the 
same eqclass. We will also have disequality info about that `c|a` is disequal 
to `a` - and the other way around.
However, after the full-expression, `c` will become dead. This kicks in the 
garbage collection, which transforms the state into this:

- We no longer have any constraints, because only `a` is alive, for which we 
don't have any constraints.
- We have a single (non-trivial) eqclass with a single element `b` and 
representative symbol `c`. (Dead symbols can be representative symbols.)
- We have the same disequality info as before.

At `(3)` within the false branch, `a` get perfectly constrained to zero. This 
kicks in the simplification, so we try to substitute (simplify) the variable in 
each SymExpr-tree. In our case, it means that the `c|a != a : [1,1]` constraint 
gets re-evaluated as `c|0 != 0 : [1,1]`, which is `c != 0 : [1,1]`.
Under the hood, it means that we will call `merge(c|a, c)`. where `c` is the 
result of `simplifyToSVal(State, MemberSym).getAsSymbol()` inside 
`EquivalenceClass::simplify()`.
Note that the result of `simplifyToSVal()` was a dead symbol. We shouldn't have 
acquired an already dead symbol. AFAIK, this is the only way we can get one at 
this point.
Since `c` is dead, we no longer have a mapping in `ClassMap` for it; hence when 
we try to `find()` the eqclass of `c`, it will report that it's a trivial 
eqclass with the representative symbol `c`.

After `merge(c|a, c)`, we will have a single (non-trivial) eqclass of `b, c|a` 
with the representative symbol `c|a` - because we merged the eqclass of `c` 
into the eqclass of `c|a`.

This means that `find(c|a)` will result in the eqclass with the representative 
symbol `c|a`. So, we ended up having different eqclasses for `find(c|a)` and 
`find(c)` after `merge(c|a, c)`, firing the assertion.

I believe, that this assertion hold for reasonable cases - where both 
`MemberSym` and `SimplifiedMemberSym` refer to live symbols.
`MemberSym` should be live in all cases here, because it is from `ClassMembers` 
which is pruned in `removeDeadBindings()` so these must be alive. In this 
context, I don't know any way of determining if a symbol is alive/dead, so I 
cannot refine the assertion in that way.
So, I'm proposing to drop this unjustified assertion.

I'd like to thank @martong for helping me to conclude the investigation.
It was really difficult to track down.

PS: I mentioned that `merge(eq1, eq2)` should be commutative. We actually 
exploit this for merging the smaller eqclass into the bigger one within 
`EquivalenceClass::merge()`.
Yea, for some reason, if you swap the operands, 3 tests break (only failures, 
no crashes) aside from the test which checks the state dump. But I believe, 
that is a different bug and orthogonal to this one. I just wanted to mention 
that.

- `Analysis/solver-sym-simplification-adjustment.c`
- `Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp`
- `Analysis/symbol-simplification-reassume.cpp`


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138037

Files:
  clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp


Index: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -2616,7 +2616,18 @@
       if (OldState == State)
         continue;
 
-      assert(find(State, MemberSym) == find(State, SimplifiedMemberSym));
+      // Be aware that `SimplifiedMemberSym` might refer to an already dead
+      // symbol. In that case, the eqclass of that might not be the same as the
+      // eqclass of `MemberSym`. This is because the dead symbols are not
+      // preserved in the `ClassMap`, hence
+      // `find(State, SimplifiedMemberSym)` will result in a trivial eqclass
+      // compared to the eqclass of `MemberSym`.
+      // These eqclasses should be the same if `SimplifiedMemberSym` is alive.
+      // --> assert(find(State, MemberSym) == find(State, SimplifiedMemberSym))
+      //
+      // Note that `MemberSym` must be alive here since that is from the
+      // `ClassMembers` where all the symbols are alive.
+
       // Remove the old and more complex symbol.
       State = find(State, MemberSym).removeMember(State, MemberSym);
 


Index: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -2616,7 +2616,18 @@
       if (OldState == State)
         continue;
 
-      assert(find(State, MemberSym) == find(State, SimplifiedMemberSym));
+      // Be aware that `SimplifiedMemberSym` might refer to an already dead
+      // symbol. In that case, the eqclass of that might not be the same as the
+      // eqclass of `MemberSym`. This is because the dead symbols are not
+      // preserved in the `ClassMap`, hence
+      // `find(State, SimplifiedMemberSym)` will result in a trivial eqclass
+      // compared to the eqclass of `MemberSym`.
+      // These eqclasses should be the same if `SimplifiedMemberSym` is alive.
+      // --> assert(find(State, MemberSym) == find(State, SimplifiedMemberSym))
+      //
+      // Note that `MemberSym` must be alive here since that is from the
+      // `ClassMembers` where all the symbols are alive.
+
       // Remove the old and more complex symbol.
       State = find(State, MemberSym).removeMember(State, MemberSym);
 
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to