sammccall added a comment.

TL;DR: Cool! I think we should complicate the signature of `hasFile()` a bit to 
avoid calling SwapIndex::snapshot() for each file. There's also some quadratic 
behavior due to calling index ops from MergedIndex when the children are 
themselves MergedIndexes, but N is small and I think we can solve it later.

---

I like the simplicity/elegance of this solution, and the fact that it avoids 
the scaling pitfall of trying to return a list of all the indexed files.

I am concerned about performance, though. At a high level, we try to keep index 
operations fairly coarse/batched and only do a handful of them per request. This 
was originally to allow these operations to be RPCs, but of course the 
assumption has been baked in in other ways too. Introducing a fine-grained 
operation like this violates some of those assumptions.

Concretely, SwapIndex (and ProjectAwareIndex) take a lock to obtain the index 
snapshot to use, and so here we do that for each call to `hasFile()`. Most 
indexes are wrapped in SwapIndex for either lazy-loading or dynamic updating.

And we hammer it reasonably hard here: MergedIndex::refs() will call it in a 
loop ~100 times. Because there are multiple MergedIndexes in the index stack, 
the calls multiply: the first index never gets hasFile(), the second gets it 
100 times, the third 200, the fourth 300. (We currently have at least 4 indexes 
in the stack.)
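A toy model of that arithmetic (the 4-index stack and ~100 refs per query are the assumptions from above, not measured numbers):

```cpp
#include <cassert>

// Hypothetical model of the duplication described above: with K indexes
// chained through nested merges and ~R refs per query, the i-th index
// (0-based, freshest first) fields roughly i * R hasFile() calls, so the
// total lock count grows quadratically in the number of stacked indexes.
int hasFileLocks(int NumIndexes, int RefsPerQuery) {
  int Total = 0;
  for (int I = 0; I < NumIndexes; ++I)
    Total += I * RefsPerQuery; // 0 + 100 + 200 + 300 for the 4-index stack
  return Total;
}
```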

Taking 600 locks to call refs() seems... less than ideal. For now they'll 
probably be uncontended as we only call refs() for interactive actions 
(find-references and rename), but fundamentally this applies to all requests, 
not just refs, so I'm not sure this will be true in the future (a fast 
pseudoparser index will want to consult the previous index to resolve symbols).

This all seems related to the idea that `hasFile()` isn't implemented for 
RemoteIndex - it's a pragmatic shortcut, but in principle it'd be nice to be 
able to implement it, and per-file calls would hit the minimum per-RPC overhead.

---

So, can we do anything about all this?

Locking once for each ref: fundamentally we want to lock once to get the 
snapshot, then query the snapshot for each file. That suggests changing 
hasFile() to return an oracle that can be cheaply queried, e.g. a signature 
like `unique_function<bool(StringRef)> SymbolIndex::indexedFiles()`.
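A sketch of what that might look like. Names and types here are placeholders (std::function standing in for llvm::unique_function, std::string for StringRef, and MemIndex is a toy index invented for the example); the point is that SwapIndex pays for its lock once per query rather than once per ref:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <set>
#include <string>

// Stand-in for llvm::unique_function<bool(llvm::StringRef)>.
using FileOracle = std::function<bool(const std::string &)>;

class SymbolIndex {
public:
  virtual ~SymbolIndex() = default;
  // Returns a predicate bound to a single snapshot; callers pay for any
  // locking once, then query it per file for free.
  virtual FileOracle indexedFiles() const = 0;
};

// A toy in-memory index (placeholder for clangd's real indexes).
class MemIndex : public SymbolIndex {
public:
  explicit MemIndex(std::set<std::string> Files) : Files(std::move(Files)) {}
  FileOracle indexedFiles() const override {
    std::set<std::string> Copy = Files; // good enough for a sketch
    return [Copy](const std::string &Path) { return Copy.count(Path) > 0; };
  }

private:
  std::set<std::string> Files;
};

class SwapIndex : public SymbolIndex {
public:
  explicit SwapIndex(std::shared_ptr<SymbolIndex> Initial)
      : Current(std::move(Initial)) {}
  FileOracle indexedFiles() const override {
    std::shared_ptr<SymbolIndex> Snap;
    {
      std::lock_guard<std::mutex> Lock(Mu); // one lock per query, not per ref
      Snap = Current;
    }
    FileOracle Inner = Snap->indexedFiles();
    // Capture Snap so the snapshot stays alive while the oracle is in use.
    return [Snap, Inner](const std::string &File) {
      (void)Snap;
      return Inner(File);
    };
  }

private:
  mutable std::mutex Mu;
  std::shared_ptr<SymbolIndex> Current;
};
```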

The quadratic nature of calling any index ops in MergedIndex: I guess probably 
not without flattening our binary-tree/linked-list of MergedIndex into an N-way 
merge. Back in the day, we basically had one static index which was huge, and 
one dynamic index which was tiny, and the no-copy aspect of MergedIndex was 
appealing (and it was easy to implement!). But we're probably not coming out 
ahead anymore vs simply committing to copy everything once. The N-way merge 
would have other advantages too: we could query large/slow indexes in parallel 
to improve latency.

Implementing a pseudo-batch form of hasFile() as an RPC for remote-index: we'd 
need to give up on getting answers for files one-by-one. That would be OK for 
an N-way merge (since we'd fetch all results before merging, we could build a 
list of files up front). But servers typically index *everything* under a 
certain directory, so fetching the list of indexed directories and evaluating 
per-file queries client-side is a cheap and reasonable alternative (and more 
accurate than `return false`!).
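For example, if the server shipped its indexed directory prefixes once, the client could answer the per-file question locally with zero per-file RPCs (the function name and the '/'-terminated-prefix convention are made up for this sketch):

```cpp
#include <cassert>
#include <string>
#include <vector>

// The remote index sends its indexed directory prefixes once (assume each
// ends with '/'); the client then evaluates per-file queries locally.
bool coveredByRemoteIndex(const std::string &File,
                          const std::vector<std::string> &IndexedDirs) {
  for (const std::string &Dir : IndexedDirs)
    if (File.size() > Dir.size() && File.compare(0, Dir.size(), Dir) == 0)
      return true;
  return false;
}
```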


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93393/new/

https://reviews.llvm.org/D93393

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
