aaron.ballman added a comment.

Thanks, I think this is heading in the right direction!



================
Comment at: clang/include/clang/Basic/Sarif.h:167
+
+  ThreadFlow setRange(const CharSourceRange &ItemRange) {
+    Range = ItemRange;
----------------
Should we assert this source range is not a token range?


================
Comment at: clang/include/clang/Basic/Sarif.h:280
+  SarifResult setLocations(llvm::ArrayRef<CharSourceRange> DiagLocs) {
+    Locations.assign(DiagLocs.begin(), DiagLocs.end());
+    return *this;
----------------
In an asserts build, should we additionally have a loop to assert that each 
location is a char range rather than a token range?


================
Comment at: clang/lib/Basic/Sarif.cpp:161
+    Region["endColumn"] = adjustColumnPos(
+        R.getEnd(), Lexer::MeasureTokenLength(R.getEnd().getLocWithOffset(0),
+                                              R.getEnd().getManager(), LO));
----------------
vaibhav.y wrote:
> aaron.ballman wrote:
> > vaibhav.y wrote:
> > > aaron.ballman wrote:
> > > > vaibhav.y wrote:
> > > > > aaron.ballman wrote:
> > > > > > vaibhav.y wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > I didn't catch this during the review -- but this is a layering 
> > > > > > > > violation that caused link errors on some of the build bots. 
> > > > > > > > Lexer can call into Basic, but Basic cannot call into Lexer. So 
> > > > > > > > we'll need to find a different way to handle this.
> > > > > > > Would moving the code to Support, having it depend on Basic & Lex 
> > > > > > > work?
> > > > > > I don't think so -- Support is supposed to be a pretty low-level 
> > > > > > interface; it currently only relies on LLVM's Support library. I 
> > > > > > think the layering is supposed to be: Support -> Basic -> Lex.
> > > > > > 
> > > > > > As I see it, there are a couple of options as to where this could 
> > > > > > live. It could live in the Frontend library, as that's where all 
> > > > > > the diagnostic consumer code in Clang lives. But that library might 
> > > > > > be a bit "heavy" to pull into other tools (maybe? I don't know). It 
> > > > > > could also live in AST -- that already links in Basic and Lex. But 
> > > > > > that feels like a somewhat random place for this to live as this 
> > > > > > has very little to do with the AST itself.
> > > > > > 
> > > > > > Another approach, which might be better, is to require the user of 
> > > > > > this interface to pass in the token length calculation themselves 
> > > > > > in the places where it's necessary. e.g., `json::Object 
> > > > > > whatever(SourceLocation Start, SourceLocation End, unsigned 
> > > > > > EndLen)` and then you can remove the reliance on the lexer entirely 
> > > > > > while keeping the interface in Basic. I'm not certain how obnoxious 
> > > > > > this suggestion is, but I think it's my preferred approach for the 
> > > > > > moment (but not a strongly held position yet). WDYT of this 
> > > > > > approach?
> > > > > I think the approach to injecting the function is better here. I've 
> > > > > tried to make the smallest change possible by passing in a 
> > > > > function whose interface is almost identical to 
> > > > > `Lexer::MeasureTokenLength`. The intent was to hint at this being the 
> > > > > "canonical metric" for token lengths (with an example in the tests 
> > > > > for the same).
> > > > > 
> > > > > I tried passing start, end locs but couldn't find a strong use case 
> > > > > yet since `end` would likely always be: 
> > > > > `Lexer::getLocForEndOfToken(start, 0)`
> > > > I'm not convinced that the less obtrusive change is a good design in 
> > > > this case. But I also agree that we should not use start/end 
> > > > *locations* either. `SourceLocation` traditionally points to the 
> > > > *start* of a token, so it would be super easy to get the `end` location 
> > > > wrong by forgetting to pass the location for the end of the token.
> > > > 
> > > > My suggestion was to continue to pass the start of the starting token, 
> > > > the start of the ending token, and the length of the ending token. With 
> > > > the callback approach, you have to call through the callback to 
> > > > eventually call `Lexer::MeasureTokenLength()`; with the direct 
> > > > approach, you skip needing to call through a callback (which means at 
> > > > least one less function call on every source location operation).
> > > Ah, I think I misunderstood your initial suggestion: `json::Object 
> > > whatever(SourceLocation Start, SourceLocation End, unsigned EndLen)` 
> > > read like a function call to me, but it seems the suggested change was 
> > > to pass in an object? Apologies, will fix that up.
> > Sorry for the confusion! Just to make sure we're on the same page -- my 
> > suggestion was to change the function interfaces like 
> > `SarifDocumentWriter::createPhysicalLocation()` so that they would take an 
> > additional `unsigned EndLen` parameter.
> > 
> > However, now that I dig around a bit, it seems like `CharSourceRange` is 
> > what you'd want to use there -- then you can assert that what you're given 
> > is a char range and not a token range. So you won't need the `unsigned 
> > EndLen` parameter after all!
> Interesting!
> 
> Asking for my understanding: if a `CharSourceRange` is a valid character 
> range, then the `End` SLoc points to the last character in the range (even 
> if that falls mid-token), unlike a token range, where `End` points to the 
> first character of the last token?
Your understanding is correct.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109701/new/

https://reviews.llvm.org/D109701

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
