On 6/27/20 8:12 PM, Michał Górny wrote:
> On June 28, 2020 3:00:00 AM UTC, Zac Medico <zmed...@gentoo.org> wrote:
>> On 6/26/20 11:34 PM, Chun-Yu Shei wrote:
>>> Hi,
>>>
>>> I was recently interested in whether portage could be sped up, since
>>> dependency resolution can sometimes take a while on slower machines.
>>> After generating some flame graphs with cProfile and vmprof, I found 3
>>> functions which seem to be called extremely frequently with the same
>>> arguments: catpkgsplit, use_reduce, and match_from_list. In the first
>>> two cases, it was simple to cache the results in dicts, while
>>> match_from_list was a bit trickier, since it seems to be a requirement
>>> that it return actual entries from the input "candidate_list". I also
>>> ran into some test failures if I did the caching after the
>>> mydep.unevaluated_atom.use and mydep.repo checks towards the end of the
>>> function, so the caching is only done up to just before that point.
>>>
>>> The catpkgsplit change seems to definitely be safe, and I'm pretty sure
>>> the use_reduce one is too, since anything that could possibly change the
>>> result is hashed. I'm a bit less certain about the match_from_list one,
>>> although all tests are passing.
>>>
>>> With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world"
>>> speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup. "emerge
>>> -ep @world" is just a tiny bit faster, going from 18.69 to 18.22 sec
>>> (2.5% improvement). Since the upgrade case is far more common, this
>>> would really help in daily use, and it shaves about 30 seconds off
>>> the time you have to wait to get to the [Yes/No] prompt (from ~90s to
>>> 60s) on my old Sandy Bridge laptop when performing normal upgrades.
>>>
>>> Hopefully, at least some of these patches can be incorporated, and please
>>> let me know if any changes are necessary.
>>>
>>> Thanks,
>>> Chun-Yu
>>
>> Using global variables for caches like these causes a form of memory
>> leak for use cases involving long-running processes that need to work
>> with many different repositories (and perhaps multiple versions of those
>> repositories).
>>
>> There are at least a couple of different strategies that we can use to
>> avoid this form of memory leak:
>>
>> 1) Limit the scope of the caches so that they have some sort of garbage
>> collection life cycle. For example, it would be natural for the depgraph
>> class to have a local cache of use_reduce results, so that the cache can
>> be garbage collected along with the depgraph.
>>
>> 2) Eliminate redundant calls. For example, redundant calls to catpkgsplit
>> can be avoided by constructing more _pkg_str instances, since
>> catpkgsplit is able to return early when its argument happens to be a
>> _pkg_str instance.
>
> I think the weak stuff from the standard library might also be helpful.
>
> --
> Best regards,
> Michał Górny
>

Hmm, maybe weak global caches are an option?

--
Thanks,
Zac