steakhal added a comment. Looks even better. Only minor concerns remained, mostly about style and suggestions of llvm utilities.
================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:129-130 /// Given a pointer/reference argument, return the value it refers to. -std::optional<SVal> getPointeeOf(const CheckerContext &C, SVal Arg) { +std::optional<SVal> getPointeeOf(ASTContext &ASTCtx, + const ProgramStateRef State, SVal Arg) { if (auto LValue = Arg.getAs<Loc>()) ---------------- BTW I don't know but `State->getStateManager().getContext()` can give you an `ASTContext`. And we tend to not put `const` to variable declarations. See [[ https://releases.llvm.org/4.0.0/tools/clang/tools/extra/docs/clang-tidy/checks/readability-avoid-const-params-in-decls.html | readability-avoid-const-params-in-decls ]] In other places we tend to refer to `ASTContext` by the `ACtx` I think. We also prefer const refs over mutable refs. Is the mutable ref justified for this case? ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:159 +/// when the return value, or the outgoing parameters are tainted. +const NoteTag *taintOriginTrackerTag(CheckerContext &C, + std::vector<SymbolRef> TaintedSymbols, ---------------- My bad. In LLVM style we use `UpperCamelCase` for variable names. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:173-175 + if (TaintedSymbols.empty()) { + return "Taint originated here"; + } ---------------- Generally, in LLVM style we don't put braces to single block statements unless it would hurt readability, which I don't think applies here. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:176-180 + for (auto Sym : llvm::enumerate(TaintedSymbols)) { + LLVM_DEBUG(llvm::dbgs() << "Taint Propagated from argument " + << TaintedArgs.at(Sym.index()) + 1 << "\n"); + BR.markInteresting(Sym.value()); + } ---------------- I was also bad with this recommendation. I think we can now use structured bindings to get the index and value right there, like: `for (auto [Idx, Sym] : llvm::enumerate(TaintedSymbols))` [[ https://github.com/llvm/llvm-project/blob/main/llvm/docs/ProgrammersManual.rst#enumerate | See ]] ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:203 + int nofTaintedArgs = 0; + for (auto Sym : llvm::enumerate(TaintedSymbols)) { + if (BR.isInteresting(Sym.value())) { ---------------- Same here about structured bindings. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:209-214 + if (nofTaintedArgs == 0) + Out << "Taint propagated to argument " + << TaintedArgs.at(Sym.index()) + 1; + else + Out << ", " << TaintedArgs.at(Sym.index()) + 1; + nofTaintedArgs++; ---------------- I'd recommend using `llvm::interleaveComma()` in such cases. You can probably get rid of `nofTaintedArgs` as well - by using this function. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:211 + Out << "Taint propagated to argument " + << TaintedArgs.at(Sym.index()) + 1; + else ---------------- For clang diagnostics we usually use ordinary suffixes like `{st,nd,rd,th}`. It would be nice to align with the rest of the clang diagnostics on this. It would require a bit of work on the wording though, I admit. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:213 + else + Out << ", " << TaintedArgs.at(Sym.index()) + 1; + nofTaintedArgs++; ---------------- I believe this branch is uncovered by tests. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:221 + } + return std::string(Out.str()); + }); ---------------- I think since you explicitly specify the return type of the lambda, you could omit the spelling of `std::string` here. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-864 State = addTaint(State, Call.getReturnValue()); + SymbolRef TaintedSym = isTainted(State, Call.getReturnValue()); + if (TaintedSym) { + TaintedSymbols.push_back(TaintedSym); ---------------- We tend to fuse such declarations: I've seen other cases like this elsewhere. Please check. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-878 + // FIXME: The call argument may be a complex expression + // referring to multiple tainted variables. + // Now we generate notes and track back only one of them. + SymbolRef TaintedSym = isTainted(State, *V); ---------------- You could iterate over the symbol dependencies of the SymExpr (of the `*V` SVal). ```lang=c++ SymbolRef PointeeAsSym = V->getAsSymbol(); // eee, can it be null? Sure it can. See isTainted(Region),... for those cases we would need to descend and check their symbol dependencies. for (SymbolRef SubSym : llvm::make_range(PointeeAsSym->symbol_begin(), PointeeAsSym->symbol_end())) { // TODO: check each if it's also tainted, and update the `TaintedSymbols` accordingly, IDK. } ``` Something like this should work for most cases (except when `*V` refers to a tainted region instead of a symbol), I think. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:162 + const CallEvent& Call) { + const LocationContext* LC = Call.getCalleeStackFrame(0); + ---------------- dkrupp wrote: > steakhal wrote: > > If the `Call` parameter is only used for acquiring the `LocationContext`, > > wouldn't it be more descriptive to directly pass the `LocationContext` to > > the function instead? > > I'm also puzzled that we use `getCalleeStackFrame` here. I rarely ever see > > this function, so I'm a bit worried if this pick was intentional. That we > > pass the `0` as the `BlockCount` argument only reinforces this instinct. > The call.getCalleeStackFrame(0) gets the location context of the actual call > that we are analyzing (in the pre or postcall), and that's what we need to > mark interesting. It is intentionally used like this. I changed the parameter > to locationcontext as use suggested. Okay. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:204 // If this is a SymbolDerived with a tainted parent, it's also tainted. - if (isTainted(State, SD->getParentSymbol(), Kind)) - return true; + if (SymbolRef TSR = isTainted(State, SD->getParentSymbol(), Kind)) + return TSR; ---------------- Here we still have the `TSR` token. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp:220-228 + if (Kind == VLA_Tainted) + BT.reset(new BugType(this, + "Dangerous variable-length array (VLA) declaration", + categories::TaintedData)); + else + BT.reset(new BugType(this, + "Dangerous variable-length array (VLA) declaration", ---------------- dkrupp wrote: > steakhal wrote: > > Why don't we use a distinct BugType for this? > You mean a new bug type instances? Would there be an advantage for that? > Seemed to be simpler this way. To distinguish identify the tainted reports > with the bug category. > You mean a new bug type instances? Would there be an advantage for that? > Seemed to be simpler this way. To distinguish identify the tainted reports > with the bug category. I never checked how `BugTypes` constitute to bugreport construction, but my gut instinct suggests that we should have two separate instances like we frequently do for other checkers. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D144269/new/ https://reviews.llvm.org/D144269 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits