Hi zaks.anna, jordan_rose, krememek, danielmarjamaki,
Hi,
We would like to ask for your comments and opinion (RFC) a patch
that introduces a unique hash identifier for defects reported by Clang Static
Analyzer. This unique identifier is generated for every report into the PLIST
file output of Clang SA in the form:
“<key>bug_id_1</key><string>MD5_HASH</string>”.
We work on a report database & viewer for Clang SA, which we also presented on
the EuroLLVM 2015 conf in London.
http://llvm.org/devmtg/2015-04/slides/Clang_static_analysis_toolset_final.pdf
Unique hash identifiers are useful to implement the following use-cases:
1) list new defects compared to a baseline. This is called “differential
view”.
2) Being able to permanently suppress false positive reports.
Design goals for the bug identifier:
a) The identifier must be unique. So there must not exist two different
defects identified by the same ID.
b) The identifier must not change if there are changes in the code not
affecting the reported fault. Such changes are: inserting empty lines, or
adding unrelated code before or after the position of the report. Therefore
line number cannot be included in the identifier.
We propose to use a hash function to generate the unique identifier.
Since line number cannot be used we must utilize semantic information as the
source of this hash.
The suggested patch generates an MD5 hash based on the following information:
1. File name
• So a bug in a copy pasted function with the same signature has
a different id.
2. Content of the line where the bug is
• So if anything changes in the close environment of the bug, it
changes the ID. We think that it is likely that the changes in the same line
will semantically affect the bug.
3. Position (column) within the line
• To be able to differentiate between bugs within the same line
reported by the same checker
4. Unique name of the checker
• So that we are able to differentiate between reported faults
for the same position, but generated by different checkers
5. Signature of the enclosing functiondecl, type declaration or namespace
• Due to overloaded functions and copy pasted implementations, it
is likely that the same fault is found in two different overloaded functions.
These reports must have different IDs, thus we take the signature of the
enclosing function into consideration. If the bug position is not within the
scope of a function, we use the fully qualified name of the enclosing scope
(type name or namespace). The global namespace is represented by an empty
string.
6. Optional Extra field
• There may be cases when a checker would like to report multiple
problems for the same position. In this case the checker writer can add a
differentiator field in the checker implementation.
In the current code there exists a similar identifier generator to the one
suggested above. That implementation takes into consideration only
• the name of the enclosing scope
• and the relative line number within the enclosing scope.
This source of information is insufficient as the base of the hash for the
reasons described above.
We included a version of the hash in the name of the key (<key>bug_id_1</key>)
in the PLIST output to identify the hash generator algorithm. This way it will
be possible to introduce a new hash calculation algorithm if needed.
http://reviews.llvm.org/D10305
Files:
include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
include/clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h
lib/StaticAnalyzer/Core/BugReporter.cpp
lib/StaticAnalyzer/Core/PathDiagnostic.cpp
lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
Index: include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
===================================================================
--- include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
+++ include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
@@ -75,6 +75,7 @@
const Decl *DeclWithIssue;
std::string ShortDescription;
std::string Description;
+ std::string HashField;
PathDiagnosticLocation Location;
PathDiagnosticLocation UniqueingLocation;
const Decl *UniqueingDecl;
@@ -143,19 +144,23 @@
void popInterestingSymbolsAndRegions();
public:
- BugReport(BugType& bt, StringRef desc, const ExplodedNode *errornode)
- : BT(bt), DeclWithIssue(nullptr), Description(desc), ErrorNode(errornode),
- ConfigurationChangeToken(0), DoNotPrunePath(false) {}
-
- BugReport(BugType& bt, StringRef shortDesc, StringRef desc,
- const ExplodedNode *errornode)
- : BT(bt), DeclWithIssue(nullptr), ShortDescription(shortDesc),
- Description(desc), ErrorNode(errornode), ConfigurationChangeToken(0),
- DoNotPrunePath(false) {}
-
- BugReport(BugType &bt, StringRef desc, PathDiagnosticLocation l)
- : BT(bt), DeclWithIssue(nullptr), Description(desc), Location(l),
- ErrorNode(nullptr), ConfigurationChangeToken(0), DoNotPrunePath(false) {}
+ BugReport(BugType &bt, StringRef desc, const ExplodedNode *errornode,
+ StringRef hashfield = "")
+ : BT(bt), DeclWithIssue(nullptr), Description(desc), HashField(hashfield),
+ ErrorNode(errornode), ConfigurationChangeToken(0),
+ DoNotPrunePath(false) {}
+
+ BugReport(BugType &bt, StringRef shortDesc, StringRef desc,
+ const ExplodedNode *errornode, StringRef hashfield = "")
+ : BT(bt), DeclWithIssue(nullptr), ShortDescription(shortDesc),
+ Description(desc), HashField(hashfield), ErrorNode(errornode),
+ ConfigurationChangeToken(0), DoNotPrunePath(false) {}
+
+ BugReport(BugType &bt, StringRef desc, PathDiagnosticLocation l,
+ StringRef hashfield = "")
+ : BT(bt), DeclWithIssue(nullptr), Description(desc), HashField(hashfield),
+ Location(l), ErrorNode(nullptr), ConfigurationChangeToken(0),
+ DoNotPrunePath(false) {}
/// \brief Create a BugReport with a custom uniqueing location.
///
@@ -165,8 +170,9 @@
/// for uniquing reports. For example, memory leaks checker, could set this to
/// the allocation site, rather then the location where the bug is reported.
BugReport(BugType& bt, StringRef desc, const ExplodedNode *errornode,
- PathDiagnosticLocation LocationToUnique, const Decl *DeclToUnique)
- : BT(bt), DeclWithIssue(nullptr), Description(desc),
+ PathDiagnosticLocation LocationToUnique, const Decl *DeclToUnique,
+ StringRef hashfield = "")
+ : BT(bt), DeclWithIssue(nullptr), Description(desc), HashField(hashfield),
UniqueingLocation(LocationToUnique),
UniqueingDecl(DeclToUnique),
ErrorNode(errornode), ConfigurationChangeToken(0),
@@ -181,6 +187,8 @@
StringRef getDescription() const { return Description; }
+ StringRef getHashField() const { return HashField; }
+
StringRef getShortDescription(bool UseFallback = true) const {
if (ShortDescription.empty() && UseFallback)
return Description;
Index: include/clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h
===================================================================
--- include/clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h
+++ include/clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h
@@ -720,6 +720,7 @@
std::string BugType;
std::string VerboseDesc;
std::string ShortDesc;
+ std::string HashField;
std::string Category;
std::deque<std::string> OtherDesc;
@@ -739,7 +740,7 @@
PathDiagnostic(StringRef CheckName, const Decl *DeclWithIssue,
StringRef bugtype, StringRef verboseDesc, StringRef shortDesc,
StringRef category, PathDiagnosticLocation LocationToUnique,
- const Decl *DeclToUnique);
+ const Decl *DeclToUnique, StringRef hashfield);
~PathDiagnostic();
@@ -794,6 +795,7 @@
StringRef getShortDescription() const {
return ShortDesc.empty() ? VerboseDesc : ShortDesc;
}
+ StringRef getHashField() const { return HashField; }
StringRef getCheckName() const { return CheckName; }
StringRef getBugType() const { return BugType; }
StringRef getCategory() const { return Category; }
Index: lib/StaticAnalyzer/Core/BugReporter.cpp
===================================================================
--- lib/StaticAnalyzer/Core/BugReporter.cpp
+++ lib/StaticAnalyzer/Core/BugReporter.cpp
@@ -3403,7 +3403,8 @@
exampleReport->getDescription(),
exampleReport->getShortDescription(/*Fallback=*/false), BT.getCategory(),
exampleReport->getUniqueingLocation(),
- exampleReport->getUniqueingDecl()));
+ exampleReport->getUniqueingDecl(),
+ exampleReport->getHashField()));
MaxBugClassSize = std::max(bugReports.size(),
static_cast<size_t>(MaxBugClassSize));
Index: lib/StaticAnalyzer/Core/PathDiagnostic.cpp
===================================================================
--- lib/StaticAnalyzer/Core/PathDiagnostic.cpp
+++ lib/StaticAnalyzer/Core/PathDiagnostic.cpp
@@ -111,12 +111,13 @@
StringRef bugtype, StringRef verboseDesc,
StringRef shortDesc, StringRef category,
PathDiagnosticLocation LocationToUnique,
- const Decl *DeclToUnique)
+ const Decl *DeclToUnique, StringRef hashfield)
: CheckName(CheckName),
DeclWithIssue(declWithIssue),
BugType(StripTrailingDots(bugtype)),
VerboseDesc(StripTrailingDots(verboseDesc)),
ShortDesc(StripTrailingDots(shortDesc)),
+ HashField(hashfield),
Category(StripTrailingDots(category)),
UniqueingLoc(LocationToUnique),
UniqueingDecl(DeclToUnique),
Index: lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
===================================================================
--- lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
+++ lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
@@ -11,17 +11,25 @@
//
//===----------------------------------------------------------------------===//
-#include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/DeclCXX.h"
#include "clang/Basic/FileManager.h"
#include "clang/Basic/PlistSupport.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Basic/Version.h"
+#include "clang/Basic/Specifiers.h"
+#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
+#include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
#include "clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h"
#include "clang/StaticAnalyzer/Core/PathDiagnosticConsumers.h"
+#include "llvm/ADT/Twine.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
-#include "llvm/Support/Casting.h"
+#include "llvm/Support/Casting.h"
+#include "llvm/Support/LineIterator.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/MD5.h"
using namespace clang;
using namespace ento;
using namespace markup;
@@ -285,6 +293,151 @@
}
}
+static std::string GetSignature(const clang::FunctionDecl *Target) {
+ if (!Target)
+ return "";
+
+ std::string Signature;
+ const CXXConstructorDecl *CDecl = dyn_cast<CXXConstructorDecl>(Target);
+ const CXXDestructorDecl *DDecl = dyn_cast<CXXDestructorDecl>(Target);
+ const CXXConversionDecl *ConversionDecl = dyn_cast<CXXConversionDecl>(Target);
+
+ switch (Target->getStorageClass()) {
+ case SC_Extern:
+ Signature.append("extern ");
+ break;
+ case SC_Static:
+ Signature.append("static ");
+ break;
+ case SC_PrivateExtern:
+ Signature.append("__private_extern__ ");
+ break;
+ default:
+ break;
+ }
+
+ if (Target->isInlineSpecified())
+ Signature.append("inline ");
+ if (Target->isVirtualAsWritten())
+ Signature.append("virtual ");
+ if (Target->isModulePrivate())
+ Signature.append("__module_private__ ");
+ if (Target->isConstexpr() && !Target->isExplicitlyDefaulted())
+ Signature.append("constexpr ");
+ if ((CDecl && CDecl->isExplicitSpecified()) ||
+ (ConversionDecl && ConversionDecl->isExplicit()))
+ Signature.append("explicit ");
+
+ if (!CDecl && !ConversionDecl && !DDecl)
+ Signature.append(Target->getReturnType().getAsString()).append(" ");
+ Signature.append(Target->getQualifiedNameAsString()).append("(");
+
+ const FunctionProtoType *TargetPT =
+ llvm::dyn_cast_or_null<FunctionProtoType>(Target->getType().getTypePtr());
+
+ for (int i = 0, paramsCount = Target->getNumParams(); i < paramsCount; ++i) {
+ if (i)
+ Signature.append(", ");
+ Signature.append(Target->getParamDecl(i)->getType().getAsString());
+ }
+
+ if (TargetPT && TargetPT->isVariadic())
+ Signature.append(", ...");
+ Signature.append(")");
+
+ if (TargetPT) {
+ if (TargetPT->isConst())
+ Signature.append(" const");
+ if (TargetPT->isVolatile())
+ Signature.append(" volatile");
+ if (TargetPT->isRestrict())
+ Signature.append(" restrict");
+ }
+
+ switch (TargetPT->getRefQualifier()) {
+ case RQ_LValue:
+ Signature.append(" &");
+ break;
+ case RQ_RValue:
+ Signature.append(" &&");
+ break;
+ default:
+ break;
+ }
+
+ return Signature;
+}
+
+static std::string GetEnclosingDeclContextSignature(const Decl *D) {
+ if (!D)
+ return "";
+
+ if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
+ std::string DeclName;
+
+ switch (ND->getKind()) {
+ case Decl::Namespace:
+ case Decl::Record:
+ case Decl::CXXRecord:
+ case Decl::Enum:
+ DeclName = ND->getQualifiedNameAsString();
+ break;
+ case Decl::CXXConstructor:
+ case Decl::CXXDestructor:
+ case Decl::CXXConversion:
+ case Decl::CXXMethod:
+ case Decl::ObjCMethod:
+ case Decl::Function:
+ DeclName = GetSignature(dyn_cast_or_null<FunctionDecl>(ND));
+ break;
+ default:
+ break;
+ }
+
+ return DeclName;
+ }
+
+ return "";
+}
+
+static std::string GetNthLineOfFile(llvm::MemoryBuffer *Buffer, int Line) {
+ if (!Buffer)
+ return "";
+
+ llvm::line_iterator LI(*Buffer, false);
+ for (; !LI.is_at_eof() && LI.line_number() != Line; ++LI)
+ ;
+
+ return LI->str();
+}
+
+static llvm::SmallString<32> GetHashOfContent(StringRef Content) {
+ llvm::MD5 Hash;
+ llvm::MD5::MD5Result MD5Res;
+ llvm::SmallString<32> Res;
+
+ Hash.update(Content);
+ Hash.final(MD5Res);
+ llvm::MD5::stringifyResult(MD5Res, Res);
+
+ return Res;
+}
+
+static llvm::SmallString<32> GetIssueHash(const SourceManager *SM,
+ FullSourceLoc &L,
+ StringRef CheckerName,
+ StringRef HashField, const Decl *D) {
+ static const std::string Delimeter = "$";
+
+ return GetHashOfContent(
+ (llvm::Twine(llvm::sys::path::filename(SM->getFilename(L))) + Delimeter +
+ CheckerName + Delimeter + GetEnclosingDeclContextSignature(D) +
+ Delimeter + std::to_string(L.getExpansionColumnNumber()) + Delimeter +
+ GetNthLineOfFile(SM->getBuffer(L.getFileID(), L), L.getExpansionLineNumber()) +
+ Delimeter + HashField.str())
+ .str());
+}
+
void PlistDiagnostics::FlushDiagnosticsImpl(
std::vector<const PathDiagnostic *> &Diags,
FilesMade *filesMade) {
@@ -390,6 +543,12 @@
o << " <key>check_name</key>";
EmitString(o, D->getCheckName()) << '\n';
+ o << " <key>bug_id_1</key>";
+ FullSourceLoc L(SM->getExpansionLoc(D->getLocation().asLocation()), *SM);
+ const Decl *DeclWithIssue = D->getDeclWithIssue();
+ EmitString(o, GetIssueHash(SM, L, D->getCheckName(), D->getHashField(),
+ DeclWithIssue).str()) << '\n';
+
// Output information about the semantic context where
// the issue occurred.
if (const Decl *DeclWithIssue = D->getDeclWithIssue()) {
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits