Dnia June 28, 2020 3:00:00 AM UTC, Zac Medico <zmed...@gentoo.org> napisał(a): >On 6/26/20 11:34 PM, Chun-Yu Shei wrote: >> Hi, >> >> I was recently interested in whether portage could be speed up, since >> dependency resolution can sometimes take a while on slower machines. >> After generating some flame graphs with cProfile and vmprof, I found >3 >> functions which seem to be called extremely frequently with the same >> arguments: catpkgsplit, use_reduce, and match_from_list. In the >first >> two cases, it was simple to cache the results in dicts, while >> match_from_list was a bit trickier, since it seems to be a >requirement >> that it return actual entries from the input "candidate_list". I >also >> ran into some test failures if I did the caching after the >> mydep.unevaluated_atom.use and mydep.repo checks towards the end of >the >> function, so the caching is only done up to just before that point. >> >> The catpkgsplit change seems to definitely be safe, and I'm pretty >sure >> the use_reduce one is too, since anything that could possibly change >the >> result is hashed. I'm a bit less certain about the match_from_list >one, >> although all tests are passing. >> >> With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world" >> speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup. >"emerge >> -ep @world" is just a tiny bit faster, going from 18.69 to 18.22 sec >> (2.5% improvement). Since the upgrade case is far more common, this >> would really help in daily use, and it shaves about 30 seconds off >> the time you have to wait to get to the [Yes/No] prompt (from ~90s to >> 60s) on my old Sandy Bridge laptop when performing normal upgrades. >> >> Hopefully, at least some of these patches can be incorporated, and >please >> let me know if any changes are necessary. >> >> Thanks, >> Chun-Yu > >Using global variables for caches like these causes a form of memory >leak for use cases involving long-running processes that need to work >with many different repositories (and perhaps multiple versions of >those >repositories). > >There are at least a couple of different strategies that we can use to >avoid this form of memory leak: > >1) Limit the scope of the caches so that they have some sort of garbage >collection life cycle. For example, it would be natural for the >depgraph >class to have a local cache of use_reduce results, so that the cache >can >be garbage collected along with the depgraph. > >2) Eliminate redundant calls. For example, redundant calls to >catpkgslit >can be avoided by constructing more _pkg_str instances, since >catpkgsplit is able to return early when its argument happens to be a >_pkg_str instance.
I think the weak stuff from the standard library might also be helpful. -- Best regards, Michał Górny