jansvoboda11 added a comment. With the call to `llvm::sys::path::native` scoped only to `IgnoredFiles`, would this patch LGTY?
================ Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:161-162 + const StringRef RawFilename) { + llvm::SmallString<256> Filename; + llvm::sys::path::native(RawFilename, Filename); + ---------------- dexonsmith wrote: > jansvoboda11 wrote: > > dexonsmith wrote: > > > I'm a bit nervous about the impact of modifying the input filename on > > > Windows before passing it into other APIs. This could change behaviour of > > > lower layers of the VFS (since they'll see a different filename than when > > > DependencyScanningWOrkerFileSystem is NOT on top of them). > > > > > > Can we restrict this just to what's passed to IgnoredFiles? (Maybe add > > > `shouldIgnore()` API, which returns `false` if the set is empty, and then > > > locally converts to native and checks for membership...) > > > > > > It also seems wasteful to be calling `sys::path::native` and the memcpy > > > all the time, when usually it has no effect. Have you checked whether > > > this affects performance of scanning something big? > > Yeah, I can see that path changing between VFS layers can be problematic. > > I'm pretty sure we can get away with only converting `Filename` to its > > native form when interacting with `IgnoredFiles`. > > > > I haven't checked the performance impact. If it ends up being measurable, I > > could implement something like `sys::path::is_native` and avoid the copy > > most of the time on unix-like OSes. WDYT? > Probably it'll end up not being measurable, but if it is, something like > `is_native` might help... that said, if this will eventually be replaced with > logic relyin on fs::UniqueID it might not be worth optimizing. Agreed. ================ Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:171-172 bool KeepOriginalSource = IgnoredFiles.count(Filename) || !shouldMinimize(Filename); DependencyScanningFilesystemSharedCache::SharedFileSystemEntry ---------------- dexonsmith wrote: > jansvoboda11 wrote: > > dexonsmith wrote: > > > Looking at this, makes me wonder if this is just fixing a specific > > > instance of a more general problem. > > > > > > Maybe `IgnoredFiles` should be a set of `FileEntry`s instead of > > > `StringRef`s... but that'd create a different performance bottleneck when > > > the set is big, since creating the FileEntrys would be expensive. We'd > > > want the FileEntry lookup to be globally cached / etc. -- and FileManager > > > isn't quite safe to use globally. > > > > > > Do you think IgnoredFiles as-is will work well enough for where it'll be > > > used for PCH? Or do we need to catch headers referenced in two different > > > ways somehow? > > I think we could use `llvm::sys::fs::UniqueID` instead of the filename to > > refer to files. Since the VFS layer resolves symlinks when stat-ing a file, > > that should be a canonical file identifier. I can tackle that in a follow > > up patch. > Yup, a unique ID should work for a file identifier. > > I'm concerned about the cost of looking up the unique ID — avoiding stat > traffic was measured to be an important performance benefit in the dependency > scanner model. > > To avoid a perf regression, I think you could use caches like: > - ids: filename -> unique-id > - originals: unique-id -> original file content > - minimized: unique-id -> minimized file content > > Where "ids" and "originals" are read/cached in lock-step when accessing a > filename, additionally computing "minimized" if not in the ignore-list. > (Adding a file to the ignore-list would put content in "ids" and "originals".) > > The goal is to amortize the `stat` cost across the lifetime of the service > while ensuring a consistent view of the file content. > > WDYT? > > ... regardless I think all of this is out of scope for the current patch, > which is still useful for unblocking adding tests to the subsequent patches > in the stack. Yes, this is the cache structure I had in mind. I agree that this should be tackled in a follow-up patch. I'm going to create a patch with xfailing test case that demonstrates how one file with two different names (e.g. symlink) can cause issues with the current approach. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106064/new/ https://reviews.llvm.org/D106064 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits