dexonsmith added a comment. In D109632#3081580 <https://reviews.llvm.org/D109632#3081580>, @vsapsai wrote:
> We are targeting the use case where in impl.m we have > > @import ImmediateDep1; > @import ImmediateDep2; > ... > @import ImmediateDepN; > > and each of ImmediateDep has `@import SharedDep;`. > > For simplicity we'll consider that we are working with a single selector as > each of them is processed separately and independently. When clang encounters > a selector, it wants to collect methods corresponding to this selector from > all modules to a GlobalMethodPool. As in GlobalMethodPool we store methods > in a list, adding a new method requires traversing this list. It is possible > to change the list to something else but this change isn't about that, in > `O(L*X)` complexity we are reducing `X` (which can be ~100000) and not `L` > (which is closer to ~1000). > > In the baseline METHOD_POOL for each "ImmediateDep" contains methods from > "SharedDep". When we add methods to the GlobalMethodPool, we try to add > methods from all "ImmediateDep". As the result we iterate through > GlobalMethodPool method list multiple time for each method from "SharedDep" > as they are available in each "ImmediateDep". > > Richard's idea is to put a DenseSet of encountered methods in front of > GlobalMethodPool method list. This way duplicates from "SharedDep" can be > detected and discarded quickly so we traverse a list only for each unique > method and not for duplicates. > > My idea is not to store any duplicates in METHOD_POOL. This is achieved by > each module storing only their own methods and not storing any [transitively] > imported methods. In turn, populating GlobalMethodPool requires traversing > the full dependency graph once and loading methods from METHOD_POOL in > corresponding modules compared to traversing only immediate dependencies as > in the baseline. Thanks for the explanation; this was helpful. What's the relative storage cost between the two proposals? I assume small or it would have been mentioned. But another benefit of not double-storing transitively imported methods is that it makes the PCMs more independent, tacking slightly closer to "ImmediateDep1.pcm" being reproducible even when "SharedDep.pcm" adds a method to the global pool. This is a nice property if it's not too expensive. Looking at the numbers above, it doesn't look expensive; the relative performance for @rmaz's motivating use case seems pretty small. @rmaz, will your goals be achieved by taking @vsapsai's approach? If so, I'm leaning slightly that way. > I believe MultiOnDiskHashTable isn't really applicable to this performance > issue. Even if we use it to store methods, it doesn't affect the amount of > methods we process. And right now processing all the duplicates is the > expensive part. Thanks for explaining. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D109632/new/ https://reviews.llvm.org/D109632 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits