Charusso added inline comments.
================ Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:201-207 +void CERTStrChecker::evalGets(const CallEvent &Call, const CallContext &CallC, + CheckerContext &C) const { + unsigned DestPos = *CallC.DestinationPos; + const Expr *DestArg = Call.getArgExpr(DestPos)->IgnoreImpCasts(); + SVal DestV = Call.getArgSVal(DestPos); + + auto Report = getReport(*BT, Call, CallC, C); ---------------- NoQ wrote: > NoQ wrote: > > All right, so basically what you're saying is that literally every > > invocation of `gets()` deserves a warning. This means that for all > > practical purposes your checker //is// an AST-based checker, just > > implemented with path-sensitive callbacks. A path-sensitive checker emits > > warnings based on multiple events that happen sequentially along the path > > (use-after-free: "memory deallocated - that same memory used", division by > > zero: "value constrained to zero - something is being divided by that same > > value", etc.) but your checker emits the warning by looking at only one > > statement: "`gets()` is invoked". > > > > Do i understand correctly that your plan is to use path-sensitive analysis > > for fixits only? But you can't emit fixits for any truly path-sensitive > > warning anyway. Fixits must work correctly on all execution paths, so you > > cannot emit a correct fixit by looking at only one execution path. In order > > to emit fixits, you need to either resort to a pure AST check anyway ("this > > expression refers to an array of fixed size"), or maybe implement auxiliary > > data-flow analysis for a certain must-problem (eg., "the buffer argument > > may have exactly one possible value across all paths that reach `gets()`"). > > But in both cases the path-sensitive engine does literally nothing to help > > you; all the data that you'll need for your fixit will be available from > > the AST. > Like, i think this was an interesting investigation and i was genuinely > curious about how this turns out to be, but for now it seems that the problem > you're trying to solve cannot be solved this way. Path sensitive analysis is > fundamentally applicable to only 50% of the problems (to "may-problems" but > not to "must-problems"), and the problem you're trying to solve is in the > latter category. I believe you'll have to fall back to the relatively boring > task of adding fixits to `security.insecureAPI.gets`; but then, again, if you > manage to employ use-def chains for this problem, that might be quite an > inspiring start. Most of the time the given allocation to hold the arbitrary string happens in a local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`, `strncpy(dst)`, stuff... which pushes new data into that memory block, and then the cool developers write that down: `dst[42] = '\0'` which means all the reports should be thrown away in a path-sensitive manner on `dst`. Reallocations, re-bindings, non-AST stuff could handled very easily with the non-AST checker, like that one. Sometimes we are work with destination-array like `memcpy(Foo[Bar.Baz]->Qux, ...)` which could not really handled with just a simple AST-based checker. I could not say at the moment we could handle it with symbols, but we have a much larger scope of information by symbols. Most of the time because of the Analyzer is much smarter than the Tidy we could emit fix-its with the help of flow-sensitiveness very easily. I would create huge white-lists what we want to fix-it, and what we could not, but at some point if we model the symbols better, we can. Other than that easy false-positive suppression and tiny flow-sensitive rebinding stuff, we could be sure what is going on by each string-manipulation. The `gets()` is a toy example where at most a `grep -rn 'gets('` could do better analysis than us. The real world looks like that: ``` 1 encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1) / RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH; 2 cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH); 3 memcpy(cryptvector, secret, strlen(secret)); ... 4 for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) { 5 memcpy(cryptvector + strlen(secret), md5trailer, RADIUS_VECTOR_LENGTH); ... ``` - from `postgresql/src/backend/libpq/auth.c` At `3` we would emit a warning, because the null-termination left by the wrong size of the string, but at `5` we see that, it left, because at that offset the string continues, and dunno, on `6` when we model every flow-sensitive information, the string left non-terminated. Of course each of that stuff is local and AST-based (with huge overhead of rebindings and impossible false-positive suppression), but when you have two of it, that is when the fun begins. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69813/new/ https://reviews.llvm.org/D69813 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits