https://github.com/haoNoQ commented:
Not really, this is some ancient history even by my standards đŸ˜… But, yes, this has bummed me out a few times too. I think the whole point of the `CallEvent` was to provide user-friendly access to path-sensitive aspects of the call such as `CallEvent.getArgSVal()` and `CallEvent.getReturnValue()`. But yeah, the way these checker callbacks accept the state twice, is absolutely weird. I'm pretty sure there even used to be a place where it wasn't true that `CheckerContext.getState() == CallEvent.getState()`. (IIRC I fixed one but there could be more places like that. I didn't have the guts to add that as an assertion at the invocation sites of these callbacks.) Now, when it comes to the way the state mutates during the callback (such as making sure `getReturnValue()` produces the value you've just bound to the call expression), I don't think `CallEvent` can reasonably handle it, even with a layer of indirection that keeps the state up to date. Like, _state splits_ are a thing. If your `evalCall()` splits the path in two and binds two different return values, which one do you return? >From this perspective, yeah, it may be slightly useful to keep the original >pre-call state around. Say, if you're modeling a function `void foo(int *arg)` >that effectively does something like `if (*arg == 0) *arg = 1;`, and your >checker callback is sufficiently complex, then at some point you may need to >ask "Uhh can you please remind me what the original value was?" - and then >`CallEvent` comes in helpful. But I don't think this minor benefit outweighs >the confusion caused by calling that very specific program state "the state". >I think it's much easier to handle that by deliberately remembering the >original state in a local variable, and then pass that state around. It's >probably not a popular situation in the first place. (You're probably aware of a significantly more annoying version of the problem: the problem of reading the original value of `*arg` from inside `checkPostCall`. This comes up in the taint checker a lot. But that's much more expensive to provide uniformly.) So ultimately you'll need to be careful either way. But there's definitely a few ways we could eliminate the confusion. First, yeah, like you said, we could have a static or dynamical typestate that indicates whether `getReturnValue()` makes sense. (It only ever makes sense in `checkPostCall`.) Then, it might be a good idea to simply make `CallEvent.getState()` *private*. It wasn't our goal to provide convenient access to the entire state. We just wanted a nice `getArgSVal()` and such. Maybe let's provide only the actually-useful, non-confusing methods? Or we could rename the method to something more descriptive, like `getStateBeforeCall()` and `getStateAfterCall()` - and make those available/unavailable depending on the typestate of `CallEvent`. If we're so worried about the `*arg` situation we may also provide a `getSValBeforeCall(const MemRegion *)`. Alternatively, it may be a good idea to implement these methods in `CheckerContext` itself, and then stop showing `CallEvent` to the user. So instead of `CallEvent.getArgSVal()` you'd need to type `CheckerContext.getCallArgSVal()`. Unfortunately this disconnects the user from the entire polymorphism of the `CallEvent` class. There's actually a lot of methods that would need to be reimplemented (like `AnyCXXConstructorCall.getCXXThisVal()` or `CXXAllocatorCall.getArraySizeVal()) and most of them wouldn't make sense most of the time. Finally, if you're particularly interested in `getReturnValue()` updating in sync, it may be possible to do that if you somehow indicate your intent to never split the state in your callback. And that's valuable on its own because of all the accidental state splits that you can introduce by forgetting to chain your `addTransition()`s or `generateNonFatalErrorNode()`s. (The latter is particularly brutal because it really doesn't sound like it has anything to do with state splitting. Like, I want to emit two non-fatal errors, it's only natural that I invoke the generation of two non-fatal error nodes. With fatal errors it's actually a perfectly valid thing to do. But if later you decide that the errors aren't all that fatal, all of a sudden your entire checker breaks down. And good luck noticing that during testing.) So if there was a way to indicate the callback's intention explicitly, eg. `CheckerContext.setOutgoingNodeLimit(1)` (maybe even set it by default to `1`; most checkers need either that or `2`) then we'd avoid all the subtle bugs of that nature. Then, naturally, we'd also be able to have a `CheckerContext.setReturnValue()` method that's available only in `evalCall()` and only when the outgoing node limit is 1. And then you could look up that value as much as you want! (Though, of course, you'd still never want to set the return value more than once.) https://github.com/llvm/llvm-project/pull/160707 _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
