On Thu, Jul 9, 2020 at 2:06 PM Chun-Yu Shei <cs...@google.com> wrote:
> Hmm, that's strange... it seems to have made it to the list archives:
> https://archives.gentoo.org/gentoo-portage-dev/message/a4db905a64e3c1f6d88c4876e8291a65
>
> (but it is entirely possible that I used "git send-email" incorrectly)

Ahhh, it's visible there; I'll blame gMail ;)

-A

> On Thu, Jul 9, 2020 at 2:04 PM Alec Warner <anta...@gentoo.org> wrote:
>>
>> On Thu, Jul 9, 2020 at 12:03 AM Chun-Yu Shei <cs...@google.com> wrote:
>>
>>> Awesome! Here's a patch that adds @lru_cache to use_reduce, vercmp, and
>>> catpkgsplit. use_reduce was split into two functions, with the outer one
>>> converting lists/sets to tuples so they can be hashed, and making a copy
>>> of the returned list (since the caller sometimes modifies it). I tried
>>> to select cache sizes that minimized the increase in memory use while
>>> still providing about the same speedup as a cache of unbounded size.
>>> "emerge -uDvpU --with-bdeps=y @world" runtime decreases from 44.32s to
>>> 29.94s -- a 48% speedup -- while the maximum value of the RES column in
>>> htop increases from 280 MB to 290 MB.
>>>
>>> "emerge -ep @world" time decreases slightly from 18.77s to 17.93s, while
>>> the max observed RES value actually decreases from 228 MB to 214 MB
>>> (similar values were observed across a few before/after runs).
>>>
>>> Here are the cache hit stats, max observed RES memory, and runtime in
>>> seconds for various cache sizes in the update case. Caching for each
>>> function was tested independently (only one function had caching enabled
>>> at a time):
>>>
>>> catpkgsplit:
>>> CacheInfo(hits=1222233, misses=21419, maxsize=None, currsize=21419)
>>> 270 MB
>>> 39.217
>>>
>>> CacheInfo(hits=1218900, misses=24905, maxsize=10000, currsize=10000)
>>> 271 MB
>>> 39.112
>>>
>>> CacheInfo(hits=1212675, misses=31022, maxsize=5000, currsize=5000)
>>> 271 MB
>>> 39.217
>>>
>>> CacheInfo(hits=1207879, misses=35878, maxsize=2500, currsize=2500)
>>> 269 MB
>>> 39.438
>>>
>>> CacheInfo(hits=1199402, misses=44250, maxsize=1000, currsize=1000)
>>> 271 MB
>>> 39.348
>>>
>>> CacheInfo(hits=1149150, misses=94610, maxsize=100, currsize=100)
>>> 271 MB
>>> 39.487
>>>
>>> use_reduce:
>>> CacheInfo(hits=45326, misses=18660, maxsize=None, currsize=18561)
>>> 407 MB
>>> 35.77
>>>
>>> CacheInfo(hits=45186, misses=18800, maxsize=10000, currsize=10000)
>>> 353 MB
>>> 35.52
>>>
>>> CacheInfo(hits=44977, misses=19009, maxsize=5000, currsize=5000)
>>> 335 MB
>>> 35.31
>>>
>>> CacheInfo(hits=44691, misses=19295, maxsize=2500, currsize=2500)
>>> 318 MB
>>> 35.85
>>>
>>> CacheInfo(hits=44178, misses=19808, maxsize=1000, currsize=1000)
>>> 301 MB
>>> 36.39
>>>
>>> CacheInfo(hits=41211, misses=22775, maxsize=100, currsize=100)
>>> 299 MB
>>> 37.175
>>>
>>> I didn't bother collecting detailed stats for vercmp, since its
>>> inputs/outputs are quite small and don't cause much memory increase.
>>> Please let me know if there are any other suggestions/improvements (and
>>> thanks, Sid, for the lru_cache suggestion!).
>>
>> I don't see a patch attached; can you link to it?
>>
>> -A
>>
>>> Thanks,
>>> Chun-Yu