steakhal added a comment.

Looks even better. Only minor concerns remained, mostly about style and 
suggestions of llvm utilities.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:129-130
 /// Given a pointer/reference argument, return the value it refers to.
-std::optional<SVal> getPointeeOf(const CheckerContext &C, SVal Arg) {
+std::optional<SVal> getPointeeOf(ASTContext &ASTCtx,
+                                 const ProgramStateRef State, SVal Arg) {
   if (auto LValue = Arg.getAs<Loc>())
----------------
BTW I don't know but `State->getStateManager().getContext()` can give you an 
`ASTContext`. And we tend to not put `const` to variable declarations. See [[ 
https://releases.llvm.org/4.0.0/tools/clang/tools/extra/docs/clang-tidy/checks/readability-avoid-const-params-in-decls.html
 | readability-avoid-const-params-in-decls ]]

In other places we tend to refer to `ASTContext` by the `ACtx` I think.
We also prefer const refs over mutable refs. Is the mutable ref justified for 
this case?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:159
+/// when the return value, or the outgoing parameters are tainted.
+const NoteTag *taintOriginTrackerTag(CheckerContext &C,
+                                     std::vector<SymbolRef> TaintedSymbols,
----------------
My bad. In LLVM style we use `UpperCamelCase` for variable names.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:173-175
+    if (TaintedSymbols.empty()) {
+      return "Taint originated here";
+    }
----------------
Generally, in LLVM style we don't put braces to single block statements unless 
it would hurt readability, which I don't think applies here.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:176-180
+    for (auto Sym : llvm::enumerate(TaintedSymbols)) {
+      LLVM_DEBUG(llvm::dbgs() << "Taint Propagated from argument "
+                              << TaintedArgs.at(Sym.index()) + 1 << "\n");
+      BR.markInteresting(Sym.value());
+    }
----------------
I was also bad with this recommendation.
I think we can now use structured bindings to get the index and value right 
there, like:
`for (auto [Idx, Sym] : llvm::enumerate(TaintedSymbols))`
[[ 
https://github.com/llvm/llvm-project/blob/main/llvm/docs/ProgrammersManual.rst#enumerate
 | See ]]


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:203
+    int nofTaintedArgs = 0;
+    for (auto Sym : llvm::enumerate(TaintedSymbols)) {
+      if (BR.isInteresting(Sym.value())) {
----------------
Same here about structured bindings.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:209-214
+          if (nofTaintedArgs == 0)
+            Out << "Taint propagated to argument "
+                << TaintedArgs.at(Sym.index()) + 1;
+          else
+            Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+          nofTaintedArgs++;
----------------
I'd recommend using `llvm::interleaveComma()` in such cases.
You can probably get rid of `nofTaintedArgs` as well - by using this function.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:211
+            Out << "Taint propagated to argument "
+                << TaintedArgs.at(Sym.index()) + 1;
+          else
----------------
For clang diagnostics we usually use ordinary suffixes like `{st,nd,rd,th}`. It 
would be nice to align with the rest of the clang diagnostics on this.
It would require a bit of work on the wording though, I admit.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:213
+          else
+            Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+          nofTaintedArgs++;
----------------
I believe this branch is uncovered by tests.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:221
+    }
+    return std::string(Out.str());
+  });
----------------
I think since you explicitly specify the return type of the lambda, you could 
omit the spelling of `std::string` here.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-864
       State = addTaint(State, Call.getReturnValue());
+      SymbolRef TaintedSym = isTainted(State, Call.getReturnValue());
+      if (TaintedSym) {
+        TaintedSymbols.push_back(TaintedSym);
----------------
We tend to fuse such declarations:
I've seen other cases like this elsewhere. Please check.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-878
+      // FIXME: The call argument may be a complex expression
+      // referring to multiple tainted variables.
+      // Now we generate notes and track back only one of them.
+      SymbolRef TaintedSym = isTainted(State, *V);
----------------
You could iterate over the symbol dependencies of the SymExpr (of the `*V` 
SVal).

```lang=c++
SymbolRef PointeeAsSym = V->getAsSymbol();
// eee, can it be null? Sure it can. See isTainted(Region),... for those cases 
we would need to descend and check their symbol dependencies.
for (SymbolRef SubSym : llvm::make_range(PointeeAsSym->symbol_begin(), 
PointeeAsSym->symbol_end())) {
  // TODO: check each if it's also tainted, and update the `TaintedSymbols` 
accordingly, IDK.
}
```
Something like this should work for most cases (except when `*V` refers to a 
tainted region instead of a symbol), I think.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:162
+                                  const CallEvent& Call) {
+  const LocationContext* LC = Call.getCalleeStackFrame(0);
+
----------------
dkrupp wrote:
> steakhal wrote:
> > If the `Call` parameter is only used for acquiring the `LocationContext`, 
> > wouldn't it be more descriptive to directly pass the `LocationContext` to 
> > the function instead?
> > I'm also puzzled that we use `getCalleeStackFrame` here. I rarely ever see 
> > this function, so I'm a bit worried if this pick was intentional. That we 
> > pass the `0` as the `BlockCount` argument only reinforces this instinct.
> The call.getCalleeStackFrame(0) gets the location context of the actual call 
> that we are analyzing (in the pre or postcall), and that's what we need to 
> mark interesting. It is intentionally used like this. I changed the parameter 
> to locationcontext as use suggested.
Okay.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:204
       // If this is a SymbolDerived with a tainted parent, it's also tainted.
-      if (isTainted(State, SD->getParentSymbol(), Kind))
-        return true;
+      if (SymbolRef TSR = isTainted(State, SD->getParentSymbol(), Kind))
+        return TSR;
----------------
Here we still have the `TSR` token.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp:220-228
+    if (Kind == VLA_Tainted)
+      BT.reset(new BugType(this,
+                           "Dangerous variable-length array (VLA) declaration",
+                           categories::TaintedData));
+    else
+      BT.reset(new BugType(this,
+                           "Dangerous variable-length array (VLA) declaration",
----------------
dkrupp wrote:
> steakhal wrote:
> > Why don't we use a distinct BugType for this?
> You mean a new bug type instances? Would there be an advantage for that? 
> Seemed to be simpler this way. To distinguish identify the tainted reports 
> with the bug category.
> You mean a new bug type instances? Would there be an advantage for that? 
> Seemed to be simpler this way. To distinguish identify the tainted reports 
> with the bug category.

I never checked how `BugTypes` constitute to bugreport construction, but my gut 
instinct suggests that we should have two separate instances like we frequently 
do for other checkers.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to