DonatNagyE wrote: > By looking at the disappeared reports you attached, I'm convinced that the > `MsgTaintedBufferSize` diagnostics give little to no benefit in general. On > the other side, I've seen good hits for OOBV2 in the presence of taint - even > if that's rarely the case. On the theory side, I also believe that > propagation should happen on `strlen` and similar functions.
You're right that theoretically speaking the `strlen`-like functions propagate taint, but their return value is practically always constrained from above (if the string has a null terminator, the length is less than its extent in memory) and I think that propagating taint without considering these constraint is worse than not propagating taint. By the way, could you show an OOBV2 true positive that involves taint propagated by `strlen` call? My impression was that the "index is tainted" errors that involve `strlen` are typically false positives where the analyzer cannot deduce that `s[strlen(s)]` is valid and e.g. `s[strlen(s)-1]` is also valid when the string is nonempty. > Consequently, I agree with the raised problems, but I disagree with the > approach. I would rather remove the `MsgTaintedBufferSize` diagnostic to > resolve those FPs. Alternatively, we can also think of creating a heuristic > to reduce such FPs. For e.g. check if the most significant bit of the > allocation size is proven to be unset (aka. checked some meaningful > upperbounds) and suppress reports in that case, but report otherwise. Would > it be okay with you to proceed by not removing taint propagation? I agree that `MsgTaintedBufferSize` reports should be limited to cases where the tainted value _can be large_ and I would suggest a threshold that's significantly lower than the SIZE_MAX/2 implied by your "most significant bit" suggestion (ideally the threshold value would be configurable). However, I think this eliminates the false positives related to `strlen` only if we also add some logic that can reliably "tag" the return value of `strlen` with reasonable upper bounds. I'd be happy to accept a patch that (1) ensures that `strlen` propagates taint (2) reliably puts upper bounds on the result of `strlen` and (3) limits `MsgTaintedBufferSize` to reporting cases where the tainted value can be large. However, as long as we don't have this refined code, I think the best temporary solution is removing taint propagation from `strlen` as the untainted return value is a good practical approximation for a tainted value with strong upper bounds. https://github.com/llvm/llvm-project/pull/66086 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits