[PATCH] D30909: [Analyzer] Finish taint propagation to derived symbols of tainted regions

Vlad Tsyrklevich via Phabricator via cfe-commits Sun, 26 Mar 2017 19:31:35 -0700

vlad.tsyrklevich added inline comments.


================
Comment at: lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:494
+  SymbolManager &SM = C.getSymbolManager();
+  return SM.getDerivedSymbol(Sym, LCV.getRegion());
 }
----------------
NoQ wrote:
> I'd think about this a bit more and come back.
> 
> I need to understand how come that constructing a symbol manually is the 
> right thing to do; that doesn't happen very often, but it seems correct here.
Indeed it is odd. The best justification I could come up with: LCVs are meant 
to be optimizations, their 'purpose' is to expose an SVal that hides SymbolRef 
values so that we can have a split store. We don't have to copy all of a 
compound values SymbolRef mappings because LCVs are kept distinct. Hence to 
set/query/constrain region values you use SVals so that RegionStore can 
differentiate between LCVs and SymbolRef backed SVals for the two different 
stores it contains.

The taint interface however requires you taint a SymbolRef, not an SVal. If we 
wanted, instead of doing this logic here, we could change getPointedToSymbol() 
to return an SVal and update usages of it accordingly since that value is only 
passed on to ProgramState.isTainted()/ProgramState.addTaint() anyway. Then we 
could update addTaint/isTainted to perform this logic, hiding it from the 
checker.

This still requires manually constructing a symbol, now it's just performed in 
the analyzer instead of in a checker. Not sure if that addresses the issue you 
were considering, but the idea that we need to 'undo' the LCV optimization 
hiding the SymbolRef to have a value to taint seems somewhat convincing to me. 
What do you think?


================
Comment at: test/Analysis/taint-generic.c:210
+  read(sock, &tainted.st, sizeof(tainted.st));
+  __builtin_memcpy(buffer, tainted.buf, tainted.st[0].length); // 
expected-warning {{Untrusted data is used to specify the buffer size}}
 }
----------------
NoQ wrote:
> Are we already supporting the case when we're tainting some elements of an 
> array but not all of them, and this works as expected? Could we add such 
> tests (regardless of whether we already handle them)?
It does work in that case. If you taint element X of region Y the current logic 
will be conservative and only mark element X as tainted, not X-i or X+i. This 
is also true for element 0, so if a programmer passes &array[0] but reads 
sizeof(array) bytes it will not correctly mark that. This is also a short 
coming of the invalidation code so I don't think there's much to do until 
there's more general support for dealing with region extents.


https://reviews.llvm.org/D30909



_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D30909: [Analyzer] Finish taint propagation to derived symbols of tainted regions

Reply via email to