Hi David,
On Sat, Jan 08, 2022 at 07:30:24PM +0000, David CARLIER wrote:
> Hi
>
> Here a proposal to monitor memory pressure level on macOs to then
> trigger pools trimming when it is critical.
>
> Cheers.
>
> thanks.
>
>
> From 6b93fc00168a4e6ff80609ceb64582fea8d96ca0 Mon Sep 17 00:00:00 2001
> From: David CARLIER <[email protected]>
> Date: Sat, 8 Jan 2022 19:25:18 +0000
> Subject: [PATCH] MEDIUM: pool catching memory pressure level from the system
> on macOs.
>
> proposal to provide an additional trigger to relief the pressure on the
> pools on macOs, if HAProxy is under critical memory pressure via the
> dispatch API.
For this one I have a few concerns:
- on the build side, it uses some LLVM/Clang extensions (the "blocks" API),
  and the test only checks defined(__BLOCKS__). That is possibly OK on
  modern systems, but I'd rather limit this test to LLVM only, so as to be
  certain it's not inherited by accident from something totally different;
- on the build side again, my readings about the blocks API (which I
  had never heard of before) indicate that one has to pass -fblocks to
  benefit from it, otherwise it's not used. Hence my understanding is
  that this block remains excluded from the build;
- on the maintenance side, I feel a bit concerned by the introduction
  of exotic language extensions. Having to go through such an unusual
  syntax just to call a function, instead of passing a function pointer,
  looks inefficient at best (as it emits a dummy function that calls
  the first one) and less maintainable, so as much as possible I'd
  rather avoid this and just use a standard callback;
- on the efficiency side, I'm a bit uneasy. What do we define as
  "critical" here? How will users adjust the thresholds (if any)?
  How do we know that the preset thresholds will not cause extra
  latencies, for example by releasing memory too often? Prior to 2.4,
  depending on the build models, we used to call pool_gc() when facing
  an allocation error, before trying again. Nowadays we do something
  smarter: we monitor the average usage and spikes in each pool and
  automatically release the cold areas, meaning that overall we use
  less memory on varying loads, and there is even less extra memory
  left to salvage if/when we reach a level considered critical;
- last, we've faced deadlocks in the past between some pool_alloc()
  calls and other code paths due to them being performed under thread
  isolation, and here it makes me think that we could reintroduce
  random indirect calls to thread_isolate() while in the process of
  allocating areas, and thus reintroduce the risk of potential
  deadlocks (i.e. another thread waits on thread_isolate() and cannot
  progress because it waits on a lock we still hold).
So I fear that there is more trouble to expect mid-term than benefit
to gain. I don't know if you have metrics showing significant benefits
of this approach that outweigh the issues above (the last one especially
is really not trivial to overcome), but for now I'd rather not include
such a change.
Thanks,
Willy