On June 28, 2020 3:00:00 AM UTC, Zac Medico <zmed...@gentoo.org> wrote:
>On 6/26/20 11:34 PM, Chun-Yu Shei wrote:
>> Hi,
>> 
>> I was recently interested in whether portage could be sped up, since
>> dependency resolution can sometimes take a while on slower machines.
>> After generating some flame graphs with cProfile and vmprof, I found
>> 3 functions which seem to be called extremely frequently with the same
>> arguments: catpkgsplit, use_reduce, and match_from_list.  In the first
>> two cases, it was simple to cache the results in dicts, while
>> match_from_list was a bit trickier, since it seems to be a requirement
>> that it return actual entries from the input "candidate_list".  I also
>> ran into some test failures if I did the caching after the
>> mydep.unevaluated_atom.use and mydep.repo checks towards the end of the
>> function, so the caching is only done up to just before that point.
>> 
>> The catpkgsplit change seems to definitely be safe, and I'm pretty sure
>> the use_reduce one is too, since anything that could possibly change the
>> result is hashed.  I'm a bit less certain about the match_from_list one,
>> although all tests are passing.
>> 
>> With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world"
>> speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup.  "emerge
>> -ep @world" is just a tiny bit faster, going from 18.69 to 18.22 sec
>> (2.5% improvement).  Since the upgrade case is far more common, this
>> would really help in daily use, and it shaves about 30 seconds off
>> the time you have to wait to get to the [Yes/No] prompt (from ~90s to
>> 60s) on my old Sandy Bridge laptop when performing normal upgrades.
>> 
>> Hopefully, at least some of these patches can be incorporated, and please
>> let me know if any changes are necessary.
>> 
>> Thanks,
>> Chun-Yu
>
>Using global variables for caches like these causes a form of memory
>leak for use cases involving long-running processes that need to work
>with many different repositories (and perhaps multiple versions of those
>repositories).
>
>There are at least a couple of different strategies that we can use to
>avoid this form of memory leak:
>
>1) Limit the scope of the caches so that they have some sort of garbage
>collection life cycle. For example, it would be natural for the depgraph
>class to have a local cache of use_reduce results, so that the cache can
>be garbage collected along with the depgraph.
>
>2) Eliminate redundant calls. For example, redundant calls to catpkgsplit
>can be avoided by constructing more _pkg_str instances, since
>catpkgsplit is able to return early when its argument happens to be a
>_pkg_str instance.
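
For (1), something along these lines is what I would picture -- only a
sketch, with a made-up helper class and only a subset of use_reduce's real
keyword arguments, not actual portage API:

from portage.dep import use_reduce

class _DepgraphScopedCache:
    """Illustrative only: a use_reduce cache owned by an object with a
    bounded life cycle (e.g. a depgraph), so the cached results are
    garbage-collected together with their owner instead of accumulating
    in a module-level dict for the lifetime of the process."""

    def __init__(self):
        self._use_reduce_cache = {}

    def cached_use_reduce(self, depstring, uselist=(), masklist=()):
        # The key has to cover every argument that can change the result;
        # the real use_reduce() takes more parameters than are shown here.
        key = (depstring, frozenset(uselist), frozenset(masklist))
        try:
            return self._use_reduce_cache[key]
        except KeyError:
            result = self._use_reduce_cache[key] = use_reduce(
                depstring, uselist=uselist, masklist=masklist)
            return result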
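
For (2), the early return in catpkgsplit is roughly this shape (paraphrased
from memory, not the exact portage source), which is why constructing
_pkg_str instances up front makes the repeated calls cheap:

def catpkgsplit(mydata, silent=1, eapi=None):
    # _pkg_str instances already carry the split result, so the string
    # parsing below can be skipped entirely for them.
    try:
        return mydata.cpv_split
    except AttributeError:
        pass
    # ... fall through to the normal string-splitting path ...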

I think the weak reference support in the standard library (weakref) might also be helpful.
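
For instance, keying per-consumer caches through weakref would let the
cached results disappear together with their owner without any explicit
clean-up. A rough sketch -- the function and variable names are made up,
and the owner object has to be weak-referenceable:

import weakref

from portage.dep import use_reduce

# One private cache dict per owner (e.g. per depgraph or per repo config).
# The WeakKeyDictionary drops an owner's entry as soon as the owner itself
# is garbage-collected, so a long-running process does not keep results
# around for repositories it no longer references.
_caches = weakref.WeakKeyDictionary()

def cached_use_reduce(owner, depstring, uselist=()):
    cache = _caches.setdefault(owner, {})
    key = (depstring, frozenset(uselist))
    try:
        return cache[key]
    except KeyError:
        result = cache[key] = use_reduce(depstring, uselist=uselist)
        return result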

--
Best regards, 
Michał Górny
