[PATCH] D106064: [clang][deps] Normalize paths in minimizing file system

Duncan P. N. Exon Smith via Phabricator via cfe-commits Mon, 19 Jul 2021 11:50:12 -0700

dexonsmith accepted this revision.
dexonsmith added a comment.
This revision is now accepted and ready to land.


In D106064#2887753 <https://reviews.llvm.org/D106064#2887753>, @jansvoboda11 
wrote:

> With the call to `llvm::sys::path::native` scoped only to `IgnoredFiles`, 
> would this patch LGTY?

Yes, this LGTM once you update that.



================
Comment at: 
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:171-172
 
   bool KeepOriginalSource = IgnoredFiles.count(Filename) ||
                             !shouldMinimize(Filename);
   DependencyScanningFilesystemSharedCache::SharedFileSystemEntry
----------------
jansvoboda11 wrote:
> dexonsmith wrote:
> > jansvoboda11 wrote:
> > > dexonsmith wrote:
> > > > Looking at this, makes me wonder if this is just fixing a specific 
> > > > instance of a more general problem.
> > > > 
> > > > Maybe `IgnoredFiles` should be a set of `FileEntry`s instead of 
> > > > `StringRef`s... but that'd create a different performance bottleneck 
> > > > when the set is big, since creating the FileEntrys would be expensive. 
> > > > We'd want the FileEntry lookup to be globally cached / etc. -- and 
> > > > FileManager isn't quite safe to use globally.
> > > > 
> > > > Do you think IgnoredFiles as-is will work well enough for where it'll 
> > > > be used for PCH? Or do we need to catch headers referenced in two 
> > > > different ways somehow?
> > > I think we could use `llvm::sys::fs::UniqueID` instead of the filename to 
> > > refer to files. Since the VFS layer resolves symlinks when stat-ing a 
> > > file, that should be a canonical file identifier. I can tackle that in a 
> > > follow up patch.
> > Yup, a unique ID should work for a file identifier.
> > 
> > I'm concerned about the cost of looking up the unique ID — avoiding stat 
> > traffic was measured to be an important performance benefit in the 
> > dependency scanner model.
> > 
> > To avoid a perf regression, I think you could use caches like:
> > - ids: filename -> unique-id
> > - originals: unique-id -> original file content
> > - minimized: unique-id -> minimized file content
> > 
> > Where "ids" and "originals" are read/cached in lock-step when accessing a 
> > filename, additionally computing "minimized" if not in the ignore-list. 
> > (Adding a file to the ignore-list would put content in "ids" and 
> > "originals".)
> > 
> > The goal is to amortize the `stat` cost across the lifetime of the service 
> > while ensuring a consistent view of the file content.
> > 
> > WDYT?
> > 
> > ... regardless I think all of this is out of scope for the current patch, 
> > which is still useful for unblocking adding tests to the subsequent patches 
> > in the stack.
> Yes, this is the cache structure I had in mind.
> 
> I agree that this should be tackled in a follow-up patch. I'm going to create 
> a patch with xfailing test case that demonstrates how one file with two 
> different names (e.g. symlink) can cause issues with the current approach.
Might be nice to include that `XFAIL`'ed test in this patch, as well as a FIXME 
in the code, documenting the general problem. But if you'd rather land that 
separately/after it's fine with me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106064/new/

https://reviews.llvm.org/D106064

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D106064: [clang][deps] Normalize paths in minimizing file system

Reply via email to