sammccall added a comment.
Herald added a project: All.

Time to wake up an old review!

Going to put high-level thoughts on 
https://github.com/clangd/clangd/discussions/1206, but since one of them is 
memory usage I wanted to have a look at the concrete data structure here.

TL;DR: I think we can make this cheap enough that we don't care about the extra 
memory. The simple lookup optimization below might be enough (probably not). 
Dropping non-callee refs from the data structure is likely enough; the deep 
lookup optimization is likely enough; doing both probably **reduces** memory 
usage vs today!

---

Lookup optimization (simple):

`DenseMap<SymbolID, vector<RevRef>>` is something like `56*containers + 
16*refs` bytes:

- a `pair<SymbolID, vector<RevRef>>`, which is 4 pointers per symbol, plus extra 
buckets, so >7 pointers if I'm reading DenseMap right
- a `RevRef` per entry which is two pointers
- maybe some allocator overhead for all those little vectors? (IDK how 
allocators work in practice)

Dropping the hashtable and just storing a flat `vector<RevRef>` sorted by 
container eliminates the per-symbol costs, leaving `16*refs` bytes. If there 
were only 3-4 refs per function this would save 50%, but there are probably 
more. The tradeoff is a binary search (with double indirection for the compare) 
instead of a hash lookup, but that still seems fast enough for what we want it 
for.

The elephant in the room is that most of these refs we're storing are not 
callees. If we were to drop those from the lookup structure as discussed inline 
above, then the ratio of refs:symbols goes down and this optimization becomes 
relatively more effective.

---

Lookup optimization (deeper): h/t @kadircet

I think it's easiest to explain as a pair of modifications to the current refs 
data structures (as checked-in, not the ones in this patch).

**First**: imagine we change RefSlab to produce:

- a big flat `vector<Ref>` grouped by target.
- a lookup structure consisting of a sorted `vector<pair</*Target*/SymbolID, 
/*Offset*/int>>`

This is smaller than our current representation: the lookup structure is `20 * 
targets` bytes instead of the current `~40 * targets` bytes, and making the ref 
storage contiguous is ~free.

Clearly this supports lookup of refs for a target by binary searching over the 
lookup structure by target.

But observe that also given a `Ref*` we can find its target: binary search over 
the lookup structure searching for the right **offset**, rather than target.

**Second**: have the index compute a permutation of refs grouped by Container. 
Naively this is a `vector<Ref*>`, but since we made the array flat it can be a 
`vector<int>`.

Now we can look up the refs for a given container by binary searching within 
this permutation, as the container is stored in the Refs themselves.

For each result we have to do a second binary search in the lookup structure to 
get the target symbol, but we expect the #symbols per container to be small so 
this nesting isn't a great concern.

So the net cost vs HEAD is `4 * refs - 20 * targets` bytes, which is less than 
a quarter of the version above. This takes us to ~2% increase which we can eat 
even if we store this info for all references.

The costs here are expensive reverse lookups (I don't think we care), slightly 
more expensive forward lookups (also OK I think), code complexity (*mostly* 
localized). Obviously this is all somewhat invasive...

(We discussed an even further squeeze: we could use the same trick to avoid our 
current 8 bytes inside the ref for the container. I think this leads to too 
many levels of nested binary searches)



================
Comment at: clang-tools-extra/clangd/XRefs.cpp:1843
+    auto Kind = Callee.SymInfo.Kind;
+    if (Kind != SK::Function && Kind != SK::InstanceMethod &&
+        Kind != SK::ClassMethod && Kind != SK::StaticMethod &&
----------------
qchateau wrote:
> nridge wrote:
> > If memory usage of `Dex::RevRefs` becomes a concern, we could consider only 
> > storing the references that would pass this filter in the first place. That 
> > would trade off time spent building the reverse lookup (we'd have to do 
> > symbol lookups or keep around extra state to track the symbol kinds) for 
> > space savings.
> I've tried this patch with the whole llvm project indexed. If I remember 
> correctly it's something like 100MB, around 5-10% of clangd's memory usage. 
> As such it's not very high but it's definitely not negligible, 
> especially for a feature that's not used very often.
> That would trade off time spent building the reverse lookup (we'd have to do 
> symbol lookups or keep around extra state to track the symbol kinds) for 
> space savings.

RefKind is a bitfield, so I think we could just tack on an extra `Call` bit at 
indexing time. Technically it's extra state, but it's cheap to compute and free 
to plumb around.

Then the tradeoff becomes: the index needs to know which refkinds to include in 
its reverse-lookup structure. In practical terms we could hard-code this, it's 
an ugly layering violation though. The prettiest version I can think of is to 
put a constant on `RefersToRequest` for the supported refkinds.

(A ridiculous answer that almost works is to build it dynamically on first use, 
based on the queried refkinds: this means only people who use outgoing call 
hierarchy would pay the memory cost! Unfortunately I think this gets too 
complicated when we consider threadsafety)


================
Comment at: clang-tools-extra/clangd/index/Index.h:133
+  virtual bool
+  refersTo(const RefsRequest &Req,
+           llvm::function_ref<void(const RefersToResult &)> Callback) const = 0;
----------------
qchateau wrote:
> nridge wrote:
> > Perhaps we should have a distinct request structure, for future 
> > extensibility?
> Sure, that makes sense
+1, also for clarity (we're changing the meaning of `IDs` too).

I'd also consider naming this request something more explicitly paired with 
`refs`, like `containedRefs`, `reverseRefs`, `outgoingRefs`. They're not great 
names in isolation, but I think the idea that this is a second query on the 
same data is worth emphasizing.


================
Comment at: clang-tools-extra/clangd/index/MemIndex.cpp:97
+      Req.Limit.getValueOr(std::numeric_limits<uint32_t>::max());
+  for (const auto &Pair : Refs) {
+    for (const auto &R : Pair.second) {
----------------
qchateau wrote:
> nridge wrote:
> > Looping over all refs seems expensive even for `MemIndex`. Perhaps we 
> > should have a reverse lookup table similar to `RevRefs` here as well?
> That's pretty fast even on my laptop IIRC, and only used for outgoing calls 
> atm. 
MemIndex basically doesn't scale: it implements fuzzyFind by iterating over all 
symbols. So this seems unlikely ever to matter in practice.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93829/new/

https://reviews.llvm.org/D93829
