DonatNagyE wrote:

> By looking at the disappeared reports you attached, I'm convinced that the 
> `MsgTaintedBufferSize` diagnostics give little to no benefit in general. On 
> the other side, I've seen good hits for OOBV2 in the presence of taint - even 
> if that's rarely the case. On the theory side, I also believe that 
> propagation should happen on `strlen` and similar functions.

You're right that theoretically speaking the `strlen`-like functions propagate 
taint, but their return value is practically always constrained from above (if 
the string has a null terminator, the length is less than its extent in memory) 
and I think that propagating taint without considering these constraint is 
worse than not propagating taint.

By the way, could you show an OOBV2 true positive that involves taint 
propagated by `strlen` call? My impression was that the "index is tainted" 
errors that involve `strlen` are typically false positives where the analyzer 
cannot deduce that `s[strlen(s)]` is valid and e.g. `s[strlen(s)-1]` is also 
valid when the string is nonempty.

> Consequently, I agree with the raised problems, but I disagree with the 
> approach. I would rather remove the `MsgTaintedBufferSize` diagnostic to 
> resolve those FPs. Alternatively, we can also think of creating a heuristic 
> to reduce such FPs. For e.g. check if the most significant bit of the 
> allocation size is proven to be unset (aka. checked some meaningful 
> upperbounds) and suppress reports in that case, but report otherwise. Would 
> it be okay with you to proceed by not removing taint propagation?

I agree that `MsgTaintedBufferSize` reports should be limited to cases where 
the tainted value _can be large_ and I would suggest a threshold that's 
significantly lower than the SIZE_MAX/2 implied by your "most significant bit" 
suggestion (ideally the threshold value would be configurable). However, I 
think this eliminates the false positives related to `strlen` only if we also 
add some logic that can reliably "tag" the return value of `strlen` with 
reasonable upper bounds.

I'd be happy to accept a patch that (1) ensures that `strlen` propagates taint 
(2) reliably puts upper bounds on the result of `strlen` and (3) limits 
`MsgTaintedBufferSize` to reporting cases where the tainted value can be large. 
However, as long as we don't have this refined code, I think the best temporary 
solution is removing taint propagation from `strlen` as the untainted return 
value is a good practical approximation for a tainted value with strong upper 
bounds.

https://github.com/llvm/llvm-project/pull/66086
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to