[PATCH] D109632: [clang] de-duplicate methods from AST files

Duncan P. N. Exon Smith via Phabricator via cfe-commits Mon, 25 Oct 2021 11:14:04 -0700

dexonsmith added a comment.

In D109632#3081580 <https://reviews.llvm.org/D109632#3081580>, @vsapsai wrote:


> We are targeting the use case where in impl.m we have
>
>   @import ImmediateDep1;
>   @import ImmediateDep2;
>   ...
>   @import ImmediateDepN;
>
> and each of ImmediateDep has `@import SharedDep;`.
>
> For simplicity we'll consider that we are working with a single selector as 
> each of them is processed separately and independently. When clang encounters 
> a selector, it wants to collect methods corresponding to this selector from 
> all modules to a GlobalMethodPool. As in GlobalMethodPool  we store methods 
> in a list, adding a new method requires traversing this list. It is possible 
> to change the list to something else but this change isn't about that, in 
> `O(L*X)` complexity we are reducing `X` (which can be ~100000) and not `L` 
> (which is closer to ~1000).
>
> In the baseline METHOD_POOL for each "ImmediateDep" contains methods from 
> "SharedDep". When we add methods to the GlobalMethodPool, we try to add 
> methods from all "ImmediateDep". As the result we iterate through 
> GlobalMethodPool method list multiple time for each method from "SharedDep" 
> as they are available in each "ImmediateDep".
>
> Richard's idea is to put a DenseSet of encountered methods in front of 
> GlobalMethodPool method list. This way duplicates from "SharedDep" can be 
> detected and discarded quickly so we traverse a list only for each unique 
> method and not for duplicates.
>
> My idea is not to store any duplicates in METHOD_POOL. This is achieved by 
> each module storing only their own methods and not storing any [transitively] 
> imported methods. In turn, populating GlobalMethodPool requires traversing 
> the full dependency graph once and loading methods from METHOD_POOL in 
> corresponding modules compared to traversing only immediate dependencies as 
> in the baseline.

Thanks for the explanation; this was helpful.

What's the relative storage cost between the two proposals? I assume small or 
it would have been mentioned.

But another benefit of not double-storing transitively imported methods is that 
it makes the PCMs more independent, tacking slightly closer to 
"ImmediateDep1.pcm" being reproducible even when "SharedDep.pcm" adds a method 
to the global pool. This is a nice property if it's not too expensive. Looking 
at the numbers above, it doesn't look expensive; the relative performance for 
@rmaz's motivating use case seems pretty small.

@rmaz, will your goals be achieved by taking @vsapsai's approach? If so, I'm 
leaning slightly that way.

> I believe MultiOnDiskHashTable isn't really applicable to this performance 
> issue. Even if we use it to store methods, it doesn't affect the amount of 
> methods we process. And right now processing all the duplicates is the 
> expensive part.

Thanks for explaining.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109632/new/

https://reviews.llvm.org/D109632

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D109632: [clang] de-duplicate methods from AST files

Reply via email to