[PATCH] D148088: [RFC][clangd] Move preamble index task to a seperate task

Kadir Cetinkaya via Phabricator via cfe-commits Wed, 03 May 2023 00:42:23 -0700

kadircet added a comment.

Thanks for the patch, this is a topic we've discussed a couple times in the 
past as well as it has far reaching implications. I believe we're actually in a 
much better situation nowadays.


To summarise some concerns (thanks to @sammccall for useful discussion);

Features
--------

There are certain features that make use of index, ~all of them assume a 
"stale" index already, so this shouldn't effect them as severely.  
The most impacted feature will be code completion, as we only deserialize decls 
needed to parse the main file (until code completion point) from the preamble 
and rely on the index for the rest. So we'll have partial results until 
indexing finishes.  
I believe this is still better than what we've today. As in the presence of a 
global index, we'll still have some results but they'll be stale. Today we're 
not showing any results (well, to be more precise we're showing the identifier 
based ones) until preamble indexing finishes.
Another big concern that doesn't hold today is preamble-changed callbacks. In 
theory clangd notifies clients that a new preamble is available, in case they 
want "fresh" information. Today it's only used for providing semantic 
highlights, which doesn't use index. But in the future, if we let clients fetch 
other information based on this notification, we should make sure freshness of 
preamble index is not required.
There's also going to be some impact on documentations for code completion 
items/hover, but i don't think that's unacceptable either (we can already have 
that as we make use of stale preambles after the first build).

RAM/CPU usage due to extra concurrency
--------------------------------------

This means we'll be running more tasks in parallel, so clangd's peak memory 
usage will increase. Without this patch we've 1 live (not serialized) AST & 1 
serialized AST in memory at peak as all the other tasks would be blocked until 
live AST is destroyed. After this patch we'll also have some tasks that're 
working on the serialized AST. But this isn't something new either, after first 
preamble build we're already hitting this peak usage when using a stale 
preamble and building a new one.
As for extra concurrency, again this is pretty similar to what we do after the 
first preamble build, we're only doing more work if the preamble gets 
invalidated multiple times while indexing is running. Without this patch we 
would only build the "last" version of the preamble, but after this patch we 
can trigger some extra intermediate builds. Since preamble updates are rare, I 
don't think this one is a big concern either.

Safety concerns
---------------

Taking a closer look at the code, we're already triggering indexing callback 
after preamble serialization is complete. Hence it seems like there are no 
other users of the live AST and other structs needed for indexing. These 
structs are also part of compiler instance as ref-counted objects, hence 
keeping them alive in the indexing task (with a similar approach to Sam's 
patch) should hopefully be safe, as the code that resets/destroys the compiler 
instance isn't putting those into an invalid state (AFAICT).
Only remaining concern is whether clangd itself has any objects that it passes 
into preamble builds with sole-ownership and invalidates/destroys them after 
the build finishes. I don't think that's the case at hindsight either.
Hopefully most of these concerns can also be verified in practice by running 
using an ASAN build of clangd for a while.

Implementation concerns
-----------------------

As pointed out, we should store ref-counted objects properly and pass them to 
the preamble callbacks, so that they can be kept-alive by the indexing task. 
This implies some changes to the interface of IndexingCallbacks.
As for AsyncTaskRunner to use, since this indexing task only depends on the 
file-index, which is owned by ClangdServer, I don't think there's any need to 
introduce a new taskrunner into TUScheduler and block its destruction. We can 
just re-use the existing TaskRunner inside parsingcallbacks, in which we run 
stdlib indexing tasks.

Rollout strategy
----------------

As mentioned above I think landing the patch after some local testing with an 
ASAN, behind a ClangdServer option is fine. Afterwards we can test out the 
behaviour by gradually turning on this option in our production and once it 
looks safe we can turn it on by default (and just get rid of the option).
Does that strategy works for you folks too?

---

So all in all, I think the direction makes sense, we're trading off some 
"freshness" in certain places, but this tradeoff only effects the behaviour 
until first preamble and in return we get some considerable savings (for a 
large codebase preamble indexing takes ~10 secs @95th%, which is ~13% of 
preamble build latency).
LMK if you've any further concerns/questions and thanks for doing this!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148088/new/

https://reviews.llvm.org/D148088

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D148088: [RFC][clangd] Move preamble index task to a seperate task

Reply via email to