[PATCH] D120306: [clangd] IncludeCleaner: Add support for IWYU pragma private

Sam McCall via Phabricator via cfe-commits Tue, 22 Feb 2022 06:16:03 -0800

sammccall added a comment.

In D120306#3337490 <https://reviews.llvm.org/D120306#3337490>, @kbobyrev wrote:

> In D120306#3337374 <https://reviews.llvm.org/D120306#3337374>, @sammccall 
> wrote:
>
>> In D120306#3337288 <https://reviews.llvm.org/D120306#3337288>, @kbobyrev 
>> wrote:
>>
>>> In D120306#3337212 <https://reviews.llvm.org/D120306#3337212>, @sammccall 
>>> wrote:
>>>
>>>> What's the policy this patch intends to implement?
>>>>
>>>> I'm a little concerned by building up maps of filenames with segment 
>>>> limits - it looks like some kind of heuristic/partial matching.
>>>
>>> Basically, handling the mappings that. IWYU pragmas provide.
>>
>> Yes, I'm asking you to describe that handling in a couple of sentences so I 
>> can understand what you intend it to be.
>>
>> e.g. for includecleaner in general, the approach is: walk the AST, map each 
>> node to decls it uses, map decls to locations, map locations to files, then 
>> check which includes point to files that were not identified.
>
> I'm trying to reuse the information about `IWYU pragma private` that is 
> collected in `Canoncalncludes` to find the header responsible for including 
> the private ones. Canonicalncludes stores information about the headers with 
> `IWYU pragma: prviate, include <XYZ>`.

This makes sense, though note that CanonicalIncludes predates IncludeStructure 
and had a pretty narrow purpose, it _might_ make more sense to record/store the 
data elsewhere, we should consider all options.

> When we're figuring out the responsible header in IncludeCleaner, I'm 
> checking if the header has this pragma and then I'm trying to find the 
> `<XYZ>` header that is mentioned in the pragma comment.
> For that I'm using a heuristic that the include stack of the main file should 
> have a header whose suffix is <XYZ>.

So all we *know* is that the header that should be included can be spelled 
`<XYZ>`.
We don't know that it's on the include stack for the header defining the 
symbol, but we also don't know that it's included at all! (This is similar to 
the stdlib case).

It's true that for "unused include" warning we can ignore it if not included.
Still, it seems like a bad sign that we're trying to model headers referenced 
in this was as FileIDs rather than e.g. strings.
(And it will be harder to reuse for "missing include" warnings).

> This because we never really have the full header name in the pragma comment 
> an?d
> it might not be visible from the private header (and hence can not be 
> resolved from there)

Do you have an example? For include insertion purposes, we assume the header 
name can be included verbatim.
That implies it should be resolvable against the include path of any TU that 
sees the private header. (That said, performing IO might have its own problems)

> An alternative would probably be:
>
> - Walk #include directives and try to find the "umbrella headers"
> - For each include directive, query the included file and figure out if it 
> has "IWYU pragma: private, include <XYZ>" through `CanonicalIncludes`
> - If it does, figure out if the supposed umbrella header name has the `<XYZ>` 
> suffix, in that case attribute the private header to this public one
>
> This all would need to be done after `Canonicalncludes` are complete, meaning 
> this would probably belong to `IncludeStructure` instead. Maybe this is more 
> precise, WDYT?

I don't really understand this alternative, because "try to find the umbrella 
headers" seems to be begging the question.
What is an umbrella header apart from one referenced by an IWYU private pragma, 
and how would we identify them?

>> - whether we're comparing filenames among a small (e.g. the include stack - 
>> heuristics OK) or a large set (need to be very precise)
>
> Yes, I'm using just the include stack of the main file and the preamble. When 
> you say "large set", do you mean the whole set of project headers?

Small set = include stack = chain of includes from main file -> private header. 
(But public header may not be on that stack).
Large set = include graph = all preamble headers.

> The problem here is that for the include stack we would miss the right 
> diagnostic if the public header responsible for the private one is not 
> accessible. Maybe we should throw an "unused" warning either way here, I 
> don't know what would be right.
>
>> - whether the partial matching is semantically important or a performance 
>> optimitzation
>
> The matching itself is an optimisation, right: what I'm dealing with is that 
> `gtest.h` header is called 
> `/home/user/llvm-project/llvm/util/googletest/.../gtest/gtest.h` but the 
> private headers will only say `include "gtest/gtest.h"`. What I'm trying to 
> do is find the header whose real path name ends with `gtest/gtest.h`.

There's a few issues here:

- there might be multiple `"gtest/gtest.h"` (`#include_next`) though typically 
only in the stdlib
- ignoring the include path might lead to an incorrect match (`private; include 
"config.h"` should not match `<llvm/config.h>`)
- "real" path name *sounds* nice, but may not be the right one

I think the thing we're really trying to express is: `#include "gtest/gtest.h"` 
is not unused.
We can loosen that to match `#include <gtest/gtest.h>`, `#include 
"path/gtest/gtest.h"` etc if we want, but ideally we'd do that explicitly 
rather than as a side-effect of the implementation.

Again, I think the clearest way to express this intent is simply to bubble the 
string "gtest/gtest.h" all the way up to the "which inclusions are unused" 
check.

>> - whether the concept of "real name" is significant or likely to cause 
>> problems
>
> Hm, I don't know; what kind of problems do you think might appear?
>
>> - whether storing another copy of the names of all the transitive headers is 
>> actually necessary for what you're trying to do
>
> The problem here is that I'll need `FileID` in `headerResponsbile` ut I can't 
> reuse `FileID`s between preamble and main file, right?

When we'd discussed this in the past, we'd talked about doing this in HeaderID 
space instead of FileID space.
headerResponsible() is the right API/point in the code for non-include-guarded 
headers, but may not be the right fit for pragmas, even if it means a bit of 
unfortunate duplication.

(Again, I think strings are probably better than HeaderIDs )

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120306/new/

https://reviews.llvm.org/D120306

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D120306: [clangd] IncludeCleaner: Add support for IWYU pragma private

Reply via email to