whisperity added a comment.
@aaron.ballman I've gone over LLVM (and a few other projects). Some general
observations:
- Length of `2` **is vile**. I understand that the C++CG rule says even lengths
of 2 should be matched, but that is industrially infeasible unless one
introduces such a rule incrementally to their project. Findings of length 2 are
**, in general,** an order of magnitude more than... basically the rest of the
findings.
- On big projects, even the current "default" of "length `3`" seems to be too
low. In reality, one should consider (this is likely to be out of scope for
Tidy) how often these functions are called, and various other metrics on how
serious an offender is.
- LLVM: **7206** len-2 findings, 767 len-3, 194 len-4, 59 len-5, etc.)
- OpenCV: **4391** len-2, 987 len-3, 507, 115, ...
- I did not manually evaluate OpenCV, and I think OpenCV is a project on
which - due to their seeming design principle of type erasure - this check is
useless.
- (Similar could be said about LLVM, we are cutting a lot of corners on
"niceness" for absolute bleeding-edge performance!)
- Most of the findings come from low-level types, such as `bool` or `int` or
`unsigned` or some project or library-specific (such as a few cases of `SDVal`
in LLVM) types.
- Strings and string-like types (such as `StringRef`, `char *`, etc.) are also
huge offenders.
- Some of these offenders can be side-stepped with semantic typedefs (D66148
<https://reviews.llvm.org/D66148>)
- Further heuristics are definitely needed, but this patch is huge enough as
is, I'd introduce further heuristics in future patches:
- Allow the user to configure the set of ignored types and variable names
from the command-line (I think this already exists as a //FIXME// in the code
as of now)
- Relatedness check for parameter names (more or less like in D20689
<https://reviews.llvm.org/D20689>!) //if// we can fetch parameter names -- I am
thinking about trying to parse the variable names to find if they have a common
prefix and number.
- If the parameters have a //sufficient// (question is, what is sufficient)
common ancestor in the AST (like as @zporky mentioned, assignment, comparison,
etc., but also should consider passing them to the same function -- this would
ignore forwarding and factory methods nicely on the first run --), they should
be silenced.
- Forwarding and trampoline functions I've put into the //False-positive//
category, as they will be ignored once such a heuristic is complete.
- This contributed a huge chunk of the false-positive ratio. Due to the
trampolines and factories, and "forwarding overloads", it's around 40% on
Xerces and LLVM. Without it, around 15%.
- This more or less follows naturally, one offending function is one true
positive, but every trampoline, pimple, overload, etc. on it is basically an
additional false positive.
- Making decisions on code you're not familiar with is hard ;)
This is most likely not possible purely from Tidy as this requires the wrangler
tooling support, but I believe people who wish to use this check should be
encouraged to **only** take the introduced differences
<http://codechecker.readthedocs.io/en/latest/jenkins_gerrit_integration/#execute-shell>
into consideration on their project.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69560/new/
https://reviews.llvm.org/D69560
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits