NoQ created this revision. NoQ added reviewers: vsavchenko, xazax.hun, Szelethus, ASDenysPetrov. Herald added subscribers: manas, steakhal, martong, dkrupp, donat.nagy, arphaman, mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware. NoQ requested review of this revision.
Folks, I deleted some code. Some of it was... Perl code. Not sure if anybody noticed but scan-build didn't deduplicate reports correctly. Its algorithm was extremely primitive (basically it hashed each html file and deleted files with duplicate hash). This obviously didn't work with cross-file reports (i.e., bug path starts in the main file but ends in a header) which we intended to deduplicate by the end-of-path location, because the main file name is included in the report and therefore affects md5. D42269 <https://reviews.llvm.org/D42269> doesn't help either. Long story short, it didn't honor the modern advancements in `IssueHash` at all. I could make scan-build read the issue hash but I think my solution is even more elegant so hear me out. I put the issue hash into the report filename instead, replacing the random section. When clang tries to emit a duplicate report, it'll simply fail because the file with that name is already present. Moreover, as per tradition, I reduce the issue hash to the first 6 characters, so bug report filenames look exactly like before (`report-123abc.html`), except now they're automatically stable and deterministic! Such truncation is, of course, entirely cosmetic and absolutely unnecessary but I think it's pretty cool. The flag `-analyzer-config stable-report-filename=true` now becomes redundant because reports are stable anyway (in fact, they weren't actually stable before the patch even with the flag, because they depended on the race between clang invocations to emit the reports; I changed it to include a snippet of the issue hash as well instead of the race-y index). That said, I'm conflicted about removing the flag entirely because it also produces more verbose/readable filenames that people seem to enjoy. I think we should enable it by default instead, as soon as we make sure it doesn't produce extremely long filenames. `scan-build --generate-index-only` no longer does deduplication, as seen on the updated test "`rebuild_index`". If deduplication is desired, it can typically be achieved with a simple find -name '*.html' -exec mv "{}" . \; Repository: rC Clang https://reviews.llvm.org/D105167 Files: clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp clang/test/Analysis/scan-build/Inputs/deduplication/1.c clang/test/Analysis/scan-build/Inputs/deduplication/2.c clang/test/Analysis/scan-build/Inputs/deduplication/header.h clang/test/Analysis/scan-build/deduplication.test clang/test/Analysis/scan-build/rebuild_index/rebuild_index.test clang/test/Analysis/scan-build/rebuild_index/report-3.html clang/test/Analysis/scan-build/rebuild_index/subdirectory/report-3.html clang/test/Analysis/scan-build/rebuild_index/subdirectory/report-4.html clang/tools/scan-build/bin/scan-build
Index: clang/tools/scan-build/bin/scan-build =================================================================== --- clang/tools/scan-build/bin/scan-build +++ clang/tools/scan-build/bin/scan-build @@ -14,7 +14,6 @@ use strict; use warnings; use FindBin qw($RealBin); -use Digest::MD5; use File::Basename; use File::Find; use File::Copy qw(copy); @@ -268,27 +267,6 @@ $ENV{'CCC_ANALYZER_HTML'} = $Dir; } -##----------------------------------------------------------------------------## -# ComputeDigest - Compute a digest of the specified file. -##----------------------------------------------------------------------------## - -sub ComputeDigest { - my $FName = shift; - DieDiag("Cannot read $FName to compute Digest.\n") if (! -r $FName); - - # Use Digest::MD5. We don't have to be cryptographically secure. We're - # just looking for duplicate files that come from a non-malicious source. - # We use Digest::MD5 because it is a standard Perl module that should - # come bundled on most systems. - open(FILE, $FName) or DieDiag("Cannot open $FName when computing Digest.\n"); - binmode FILE; - my $Result = Digest::MD5->new->addfile(*FILE)->hexdigest; - close(FILE); - - # Return the digest. - return $Result; -} - ##----------------------------------------------------------------------------## # UpdatePrefix - Compute the common prefix of files. ##----------------------------------------------------------------------------## @@ -374,8 +352,6 @@ # Sometimes a source file is scanned more than once, and thus produces # multiple error reports. We use a cache to solve this problem. -my %AlreadyScanned; - sub ScanFile { my $Index = shift; @@ -383,19 +359,6 @@ my $FName = shift; my $Stats = shift; - # Compute a digest for the report file. Determine if we have already - # scanned a file that looks just like it. - - my $digest = ComputeDigest("$Dir/$FName"); - - if (defined $AlreadyScanned{$digest}) { - # Redundant file. Remove it. - unlink("$Dir/$FName"); - return; - } - - $AlreadyScanned{$digest} = 1; - # At this point the report file is not world readable. Make it happen. chmod(0644, "$Dir/$FName"); Index: clang/test/Analysis/scan-build/rebuild_index/subdirectory/report-4.html =================================================================== --- clang/test/Analysis/scan-build/rebuild_index/subdirectory/report-4.html +++ /dev/null @@ -1,8 +0,0 @@ -<!-- BUGTYPE bug4 --> -<!-- BUGFILE file4 --> -<!-- BUGPATHLENGTH 4 --> -<!-- BUGLINE 4 --> -<!-- BUGCATEGORY cat4 --> -<!-- BUGDESC desc4 --> -<!-- FUNCTIONNAME func4 --> -<!-- BUGMETAEND --> Index: clang/test/Analysis/scan-build/rebuild_index/subdirectory/report-3.html =================================================================== --- /dev/null +++ clang/test/Analysis/scan-build/rebuild_index/subdirectory/report-3.html @@ -0,0 +1,8 @@ +<!-- BUGTYPE bug3 --> +<!-- BUGFILE file3 --> +<!-- BUGPATHLENGTH 3 --> +<!-- BUGLINE 3 --> +<!-- BUGCATEGORY cat3 --> +<!-- BUGDESC desc3 --> +<!-- FUNCTIONNAME func3 --> +<!-- BUGMETAEND --> Index: clang/test/Analysis/scan-build/rebuild_index/report-3.html =================================================================== --- clang/test/Analysis/scan-build/rebuild_index/report-3.html +++ /dev/null @@ -1,8 +0,0 @@ -<!-- BUGTYPE bug1 --> -<!-- BUGFILE file1 --> -<!-- BUGPATHLENGTH 1 --> -<!-- BUGLINE 1 --> -<!-- BUGCATEGORY cat1 --> -<!-- BUGDESC desc1 --> -<!-- FUNCTIONNAME func1 --> -<!-- BUGMETAEND --> Index: clang/test/Analysis/scan-build/rebuild_index/rebuild_index.test =================================================================== --- clang/test/Analysis/scan-build/rebuild_index/rebuild_index.test +++ clang/test/Analysis/scan-build/rebuild_index/rebuild_index.test @@ -4,9 +4,8 @@ RUN: rm -rf %t.output_dir && mkdir %t.output_dir RUN: cp %S/report-1.html %t.output_dir RUN: cp %S/report-2.html %t.output_dir -RUN: cp %S/report-3.html %t.output_dir RUN: mkdir %t.output_dir/subdirectory -RUN: cp %S/subdirectory/report-4.html %t.output_dir/subdirectory +RUN: cp %S/subdirectory/report-3.html %t.output_dir/subdirectory RUN: %scan-build --generate-index-only %t.output_dir @@ -15,16 +14,13 @@ CHECK-FILES: index.html CHECK-FILES-NEXT: report-1.html CHECK-FILES-NEXT: report-2.html - -// report-3.html is a duplicate of report-1.html so it's not present. -CHECK-FILES-NOT: report-3.html CHECK-FILES-NEXT: scanview.css CHECK-FILES-NEXT: sorttable.js CHECK-FILES-NEXT: subdirectory RUN: ls %t.output_dir/subdirectory | FileCheck -check-prefix CHECK-SUB %s -CHECK-SUB: report-4.html +CHECK-SUB: report-3.html RUN: cat %t.output_dir/index.html | FileCheck -check-prefix CHECK-INDEX %s @@ -32,10 +28,9 @@ CHECK-INDEX-NEXT: bug1 CHECK-INDEX-NEXT: cat2 CHECK-INDEX-NEXT: bug2 -CHECK-INDEX-NEXT: cat4 -CHECK-INDEX-NEXT: bug4 +CHECK-INDEX-NEXT: cat3 +CHECK-INDEX-NEXT: bug3 CHECK-INDEX: report-1.html#EndPath CHECK-INDEX: report-2.html#EndPath -CHECK-INDEX-NOT: report-3.html#EndPath -CHECK-INDEX: subdirectory/report-4.html#EndPath +CHECK-INDEX: subdirectory/report-3.html#EndPath Index: clang/test/Analysis/scan-build/deduplication.test =================================================================== --- /dev/null +++ clang/test/Analysis/scan-build/deduplication.test @@ -0,0 +1,32 @@ +// FIXME: Actually, "perl". +REQUIRES: shell + +RUN: rm -rf %t.output_dir && mkdir %t.output_dir +RUN: %scan-build -o %t.output_dir \ +RUN: -analyzer-config stable-report-filename=false \ +RUN: %clang -S %S/Inputs/deduplication/1.c \ +RUN: %S/Inputs/deduplication/2.c \ +RUN: | FileCheck %s -check-prefix CHECK-STDOUT + +RUN: ls %t.output_dir/*/ | FileCheck %s -check-prefix CHECK-FILENAMES + +RUN: rm -rf %t.output_dir && mkdir %t.output_dir +RUN: %scan-build -o %t.output_dir \ +RUN: -analyzer-config stable-report-filename=true \ +RUN: %clang -S %S/Inputs/deduplication/1.c \ +RUN: %S/Inputs/deduplication/2.c \ +RUN: | FileCheck %s -check-prefix CHECK-STDOUT + +RUN: ls %t.output_dir/*/ | FileCheck %s -check-prefix CHECK-FILENAMES + +// Only one report file should be generated. + +CHECK-STDOUT: scan-build: Using '{{.*}}' for static analysis +CHECK-STDOUT: scan-build: 1 bug found. +CHECK-STDOUT: scan-build: Run 'scan-view {{.*}}' to examine bug reports. + + +CHECK-FILENAMES: index.html +CHECK-FILENAMES-NEXT: report-{{.*}}.html +CHECK-FILENAMES-NEXT: scanview.css +CHECK-FILENAMES-NEXT: sorttable.js Index: clang/test/Analysis/scan-build/Inputs/deduplication/header.h =================================================================== --- /dev/null +++ clang/test/Analysis/scan-build/Inputs/deduplication/header.h @@ -0,0 +1,4 @@ +int foo() { + int x = 0; + return 1 / x; +} Index: clang/test/Analysis/scan-build/Inputs/deduplication/2.c =================================================================== --- /dev/null +++ clang/test/Analysis/scan-build/Inputs/deduplication/2.c @@ -0,0 +1,5 @@ +#include "header.h" + +void bar() { + foo(); +} Index: clang/test/Analysis/scan-build/Inputs/deduplication/1.c =================================================================== --- /dev/null +++ clang/test/Analysis/scan-build/Inputs/deduplication/1.c @@ -0,0 +1,5 @@ +#include "header.h" + +void bar() { + foo(); +} Index: clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp =================================================================== --- clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp +++ clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp @@ -208,6 +208,18 @@ ReportDiag(*Diag, filesMade); } +static llvm::SmallString<32> getIssueHash(const PathDiagnostic &D, + const Preprocessor &PP) { + SourceManager &SMgr = PP.getSourceManager(); + PathDiagnosticLocation UPDLoc = D.getUniqueingLoc(); + FullSourceLoc L(SMgr.getExpansionLoc(UPDLoc.isValid() + ? UPDLoc.asLocation() + : D.getLocation().asLocation()), + SMgr); + return getIssueHash(L, D.getCheckerName(), D.getBugType(), + D.getDeclWithIssue(), PP.getLangOpts()); +} + void HTMLDiagnostics::ReportDiag(const PathDiagnostic& D, FilesMade *filesMade) { // Create the HTML directory if it is missing. @@ -234,11 +246,6 @@ // Create a new rewriter to generate HTML. Rewriter R(const_cast<SourceManager&>(SMgr), PP.getLangOpts()); - // The file for the first path element is considered the main report file, it - // will usually be equivalent to SMgr.getMainFileID(); however, it might be a - // header when -analyzer-opt-analyze-headers is used. - FileID ReportFile = path.front()->getLocation().asLocation().getExpansionLoc().getFileID(); - // Get the function/method name SmallString<128> declName("unknown"); int offsetDecl = 0; @@ -265,46 +272,52 @@ // Create a path for the target HTML file. int FD; - SmallString<128> Model, ResultPath; - - if (!DiagOpts.ShouldWriteStableReportFilename) { - llvm::sys::path::append(Model, Directory, "report-%%%%%%.html"); - if (std::error_code EC = - llvm::sys::fs::make_absolute(Model)) { - llvm::errs() << "warning: could not make '" << Model - << "' absolute: " << EC.message() << '\n'; - return; - } - if (std::error_code EC = llvm::sys::fs::createUniqueFile( - Model, FD, ResultPath, llvm::sys::fs::OF_Text)) { - llvm::errs() << "warning: could not create file in '" << Directory - << "': " << EC.message() << '\n'; - return; - } - } else { - int i = 1; - std::error_code EC; - do { - // Find a filename which is not already used - const FileEntry* Entry = SMgr.getFileEntryForID(ReportFile); - std::stringstream filename; - Model = ""; - filename << "report-" - << llvm::sys::path::filename(Entry->getName()).str() - << "-" << declName.c_str() - << "-" << offsetDecl - << "-" << i << ".html"; - llvm::sys::path::append(Model, Directory, - filename.str()); - EC = llvm::sys::fs::openFileForReadWrite( - Model, FD, llvm::sys::fs::CD_CreateNew, llvm::sys::fs::OF_None); - if (EC && EC != llvm::errc::file_exists) { - llvm::errs() << "warning: could not create file '" << Model - << "': " << EC.message() << '\n'; - return; - } - i++; - } while (EC); + SmallString<128> ResultPath; + + std::stringstream filename; + filename << "report-"; + + // Historically, neither the stable report filename nor the unstable report + // filename were actually stable. That said, the stable report filename + // was more stable because it was mostly composed of information + // about the bug report instead of being completely random. + // Now both stable and unstable report filenames are in fact stable + // but the stable report filename is still more verbose. + // We should rename the option ("verbose" filename?) but it will break + // people's workflows. + if (DiagOpts.ShouldWriteStableReportFilename) { + // FIXME: This code relies on knowing what constitutes the issue hash. + // Otherwise deduplication won't work correctly. + FileID ReportFile = + path.back()->getLocation().asLocation().getExpansionLoc().getFileID(); + + const FileEntry* Entry = SMgr.getFileEntryForID(ReportFile); + + filename << llvm::sys::path::filename(Entry->getName()).str() << "-" + << declName.c_str() << "-" << offsetDecl << "-"; + } + + filename << StringRef(getIssueHash(D, PP)).substr(0, 6).str() << ".html"; + llvm::sys::path::append(ResultPath, Directory, filename.str()); + if (std::error_code EC = llvm::sys::fs::make_absolute(ResultPath)) { + llvm::errs() << "warning: could not make '" << ResultPath + << "' absolute: " << EC.message() << '\n'; + return; + } + + if (std::error_code EC = llvm::sys::fs::openFileForReadWrite( + ResultPath, FD, llvm::sys::fs::CD_CreateNew, + llvm::sys::fs::OF_None)) { + // Existence of the file corresponds to the situation where a different + // Clang instance has emitted a bug report with the same issue hash. + // This is an entirely normal situation that does not deserve a warning, + // as apart from hash collisions this can happen because the reports + // are in fact similar enough to be considered duplicates of each other. + if (EC != llvm::errc::file_exists) { + llvm::errs() << "warning: could not create file in '" << Directory + << "': " << EC.message() << '\n'; + } + return; } llvm::raw_fd_ostream os(FD, true); @@ -591,7 +604,6 @@ ? UPDLoc.asLocation() : D.getLocation().asLocation()), SMgr); - const Decl *DeclWithIssue = D.getDeclWithIssue(); StringRef BugCategory = D.getCategory(); if (!BugCategory.empty()) @@ -603,9 +615,7 @@ os << "\n<!-- FUNCTIONNAME " << declName << " -->\n"; - os << "\n<!-- ISSUEHASHCONTENTOFLINEINCONTEXT " - << getIssueHash(L, D.getCheckerName(), D.getBugType(), DeclWithIssue, - PP.getLangOpts()) + os << "\n<!-- ISSUEHASHCONTENTOFLINEINCONTEXT " << getIssueHash(D, PP) << " -->\n"; os << "\n<!-- BUGLINE "
_______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits