dexonsmith accepted this revision. dexonsmith added a comment. This revision is now accepted and ready to land.
In D106064#2887753 <https://reviews.llvm.org/D106064#2887753>, @jansvoboda11 wrote: > With the call to `llvm::sys::path::native` scoped only to `IgnoredFiles`, > would this patch LGTY? Yes, this LGTM once you update that. ================ Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:171-172 bool KeepOriginalSource = IgnoredFiles.count(Filename) || !shouldMinimize(Filename); DependencyScanningFilesystemSharedCache::SharedFileSystemEntry ---------------- jansvoboda11 wrote: > dexonsmith wrote: > > jansvoboda11 wrote: > > > dexonsmith wrote: > > > > Looking at this, makes me wonder if this is just fixing a specific > > > > instance of a more general problem. > > > > > > > > Maybe `IgnoredFiles` should be a set of `FileEntry`s instead of > > > > `StringRef`s... but that'd create a different performance bottleneck > > > > when the set is big, since creating the FileEntrys would be expensive. > > > > We'd want the FileEntry lookup to be globally cached / etc. -- and > > > > FileManager isn't quite safe to use globally. > > > > > > > > Do you think IgnoredFiles as-is will work well enough for where it'll > > > > be used for PCH? Or do we need to catch headers referenced in two > > > > different ways somehow? > > > I think we could use `llvm::sys::fs::UniqueID` instead of the filename to > > > refer to files. Since the VFS layer resolves symlinks when stat-ing a > > > file, that should be a canonical file identifier. I can tackle that in a > > > follow up patch. > > Yup, a unique ID should work for a file identifier. > > > > I'm concerned about the cost of looking up the unique ID — avoiding stat > > traffic was measured to be an important performance benefit in the > > dependency scanner model. > > > > To avoid a perf regression, I think you could use caches like: > > - ids: filename -> unique-id > > - originals: unique-id -> original file content > > - minimized: unique-id -> minimized file content > > > > Where "ids" and "originals" are read/cached in lock-step when accessing a > > filename, additionally computing "minimized" if not in the ignore-list. > > (Adding a file to the ignore-list would put content in "ids" and > > "originals".) > > > > The goal is to amortize the `stat` cost across the lifetime of the service > > while ensuring a consistent view of the file content. > > > > WDYT? > > > > ... regardless I think all of this is out of scope for the current patch, > > which is still useful for unblocking adding tests to the subsequent patches > > in the stack. > Yes, this is the cache structure I had in mind. > > I agree that this should be tackled in a follow-up patch. I'm going to create > a patch with xfailing test case that demonstrates how one file with two > different names (e.g. symlink) can cause issues with the current approach. Might be nice to include that `XFAIL`'ed test in this patch, as well as a FIXME in the code, documenting the general problem. But if you'd rather land that separately/after it's fine with me. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106064/new/ https://reviews.llvm.org/D106064 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits