NagyDonat wrote:

Thanks for all the valuable feedback!

> We should separate out `State` from `CallEvent` or have some layer that would 
> ensure that the State attached with a Call is always up to date.

@steakhal In an ideal world I agree with this, but I fear that both suggestions 
have difficulties:
- There are lots of methods that rely on the `State` within the `CallEvent` – 
even seemingly innocent methods like `SimpleFunctionCall::getDecl()` can depend 
on the state (when a function is called through a pointer). Before checking 
this I thought that it would be nice to separate out `State` from the 
`CallEvent`, but based on this I fear that this wouldn't be practical.
- Ensuring that the Call always refers to the most recent state is practically 
impossible, because the "most recent state" is only tracked informally: the 
developer knows which state variable is the most recent, but this is not 
represented in a machine-readable way. (In some functional languages there are 
so called "uniqueness types" or the [`State` 
monad](https://hackage.haskell.org/package/mtl-2.3.1/docs/Control-Monad-State-Lazy.html)
 which can express invariants like "always use the most recent state" but C++ 
doesn't provide a suitable abstraction for this goal.)

Based on this right now I feel that the best approach would be gradually 
improving the situation by smaller changes.

> > I will create a test with a new debug checker which modifies the state in 
> > its `PreCall` callback and validates that the modification is visible 
> > through the `CallEvent` objects received by the `EvalCall` and 
> > `PointerEscape` callbacks. Is this sufficient, or would you prefer a 
> > different kind of test?
> 
> I think the best way to go about this is a unittest with a custom bug report 
> visitor that prints a special trait along the path. Like a logger. And the 
> visitor would print these values alongside the ProgramPoint it is attached to.

I don't understand how could a bug reporter visitor help in this situation. The 
bug (which is fixed by this PR) is that checker callbacks receive `CallEvent` 
objects whose attached state is old (and differs from the state available 
through the `CheckerContext` which is also passed to the same callback). 
Testing this requires two  components:
- Checker callbacks that validate that the `CallEvent` and the `CheckerContext` 
have (references to) the same state.
- Checker callbacks that modify the state (at suitable points e.g. the PreCall 
callback) to ensure that states that _may_ be different are actually different. 
It wouldn't be too difficult to find "real" checkers that cause suitable state 
changes, but to make the test simple and self-contained I'm leaning towards a 
"synthetic" state change from a debug checker (the same one that also does the 
validation).
A bug report visitor doesn't help with either of these components, because it 
cannot observe the state attached to the `CallEvent` (it is just passed to the 
checker callback, but not persisted in the `ExplodedGraph` for later use by 
e.g. visitors) and it (obviously) cannot update the state.  

> From this perspective, yeah, it may be slightly useful to keep the original 
> pre-call state around. Say, if you're modeling a function `void foo(int 
> *arg)` that effectively does something like `if (*arg == 0) *arg = 1;`, and 
> your checker callback is sufficiently complex, then at some point you may 
> need to ask "Uhh can you please remind me what the original value was?" - and 
> then `CallEvent` comes in helpful. But I don't think this minor benefit 
> outweighs the confusion caused by calling that very specific program state 
> "the state". I think it's much easier to handle that by deliberately 
> remembering the original state in a local variable, and then pass that state 
> around. It's probably not a popular situation in the first place.

I didn't even think about this usecase of the "archived" `State` attached to 
the call event. I agree that this could be useful within checker code, and I 
also agree that this is probably a rare situation and should be handled by 
explicitly saving the original state in a local variable. (Note: this "state 
attached to call becomes obsolete within the checker" situation is distinct 
from the "checker callback receives a CallEvent with a state that is already 
obsolete at the beginning of the call" bug which is fixed by this PR.)

> First, yeah, like you said, we could have a static or dynamical typestate 
> that indicates whether `getReturnValue()` makes sense. (It only ever makes 
> sense in `checkPostCall`.)

I support this idea -- writing a dynamic check would be trivial, and even a 
static check is not too difficult.

> Then, it might be a good idea to simply make `CallEvent.getState()` 
> _private_. It wasn't our goal to provide convenient access to the entire 
> state. We just wanted a nice `getArgSVal()` and such. Maybe let's provide 
> only the actually-useful, non-confusing methods?

Making `CallEvent.getState()` private would be a very easy "harm prevention" 
change that would reduce the damage potential of this footgun. A quick search 
reveals that outside of `CallEvent.{cpp,h}` this method is only used twice, and 
in both of those locations it would be very simple to get the state from the 
readily available `CheckerContext` instead. (This is very fortunate -- I would 
have guessed that there is more of this...) In fact I'll soon create a trivial 
separate PR for this proposal, because it's a trivial change and I don't see 
any downside.

Unfortunately this is just "sweeping the problems under the rug" and not a 
complete solution -- even if we cannot directly access the obsolete old state 
object, the analyzer may still run into logically wrong behavior if methods 
like `getArgSVal` produce obsolete results from the old state. There are 
probably situations where the old state is not _too_ old and the values that 
are looked up from it happen to be correct (e.g. a PreCall callback can't 
rebind the value of an expression in the `Environment`), but reasoning about 
the partial correctness of old states is not a safe long-term solution.

> Alternatively, it may be a good idea to implement these methods in 
> `CheckerContext` itself, and then stop showing `CallEvent` to the user. So 
> instead of `CallEvent.getArgSVal()` you'd need to type 
> `CheckerContext.getCallArgSVal()`. Unfortunately this disconnects the user 
> from the entire polymorphism of the `CallEvent` class. There's actually a lot 
> of methods that would need to be reimplemented (like 
> `AnyCXXConstructorCall.getCXXThisVal()` or 
> `CXXAllocatorCall.getArraySizeVal()) and most of them wouldn't make sense 
> most of the time.

WDYT about a change that keeps the `CallEvent` type hierarchy but ensures that 
`CallEvent`s can only be constructed by the `CheckerContext`? I can envision a 
situation where:
- There are "empty"/"stateless"/"template" `CallEvent` objects that are passed 
around in the engine e.g. in `ExprEngine::evalCall`.
- The constructor of `CheckerContext` can (optionally) own a `CallEvent` (of 
this empty stateless kind).
- Checkers can use a new method called `CheckerContext::getCallEvent()` which 
combines the stateless call event with the state of the `CheckerContext` to 
return a "regular" `CallEvent`.

> I think the assertion I propose is the easiest way to test this patch. I.e., 
> the fact that the newly added assertion used to crash without the rest of the 
> patch but doesn't crash anymore, is a sufficient defense against regressions 
> in and of itself.

@haoNoQ Are you speaking about the assertion to validate that 
`CheckerContext.getState() == CallEvent.getState()`? I agree that this 
assertion enforces an invariant that should be held, but unfortunately I don't 
see a "natural place" for it.
- Placing this on the engine side of the engine/checker boundary (just before 
calling the callback) seems natural, but that's very close to the location 
where the `CheckerContext` and the (updated) `CallEvent` are created -- so we 
would just write a tautologically passing check that almost looks like `foo = 
bar; assert(foo == bar);`.
- Placing this assertion into every checker callback (that takes a `CallEvent`) 
would be too much "pollution" IMO.
- Finally, we could place this assertion in a debug checker that is designed 
for this purpose and activated in a test that ensures that its callbacks are 
triggered (with frequent state changes to reveal the use of obsolete states). 
This is essentially equivalent to the testing approach that I suggested.

https://github.com/llvm/llvm-project/pull/160707
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to