On 24/04/2024 at 21:10, Melanie Plageman wrote:
On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
<frederic.yh...@dalibo.com> wrote:

Hello,

I would like to suggest a new parameter, autovacuum_max_threshold, which
would set an upper limit on the number of tuples to delete/update/insert
prior to vacuum/analyze.

Hi Frédéric, thanks for the proposal! You are tackling a very tough
problem. I would also find it useful to know more about what led you
to suggest this particular solution. I am very interested in user
stories around difficulties with what tables are autovacuumed and
when.


Hi Melanie! I can certainly start compiling user stories about that.

Recently, one of my colleagues wrote an email to our DBA team saying something along these lines:

« Hey, here are our suggested settings for per-table autovacuum configuration:

| *autovacuum*           | L < 1 million | L >= 1 million | L >= 5 million | L >= 10 million |
|:-----------------------|--------------:|---------------:|---------------:|----------------:|
| `vacuum_scale_factor`  | 0.2 (default) | 0.1            | 0.05           | 0.0             |
| `vacuum_threshold`     | 50 (default)  | 50 (default)   | 50 (default)   | 500,000         |
| `analyze_scale_factor` | 0.1 (default) | 0.1 (default)  | 0.05           | 0.0             |
| `analyze_threshold`    | 50 (default)  | 50 (default)   | 50 (default)   | 500,000         |

Let's update this table with values for the vacuum_insert_* parameters. »

I wasn't aware that we had this table, and although the settings made sense to me, they struck me as rather ugly and cumbersome for the user, so I started thinking about how Postgres could make their life easier.

Am I correct in thinking that one of the major goals here is for a
very large table to be more likely to be vacuumed?


Absolutely.

The idea would be to replace the following calculation:

vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;

with this one:

vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples /
    (1 + vac_scale_factor * reltuples / autovacuum_max_threshold);

(and the same for the others, vacinsthresh and anlthresh).
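To make the asymptotic behaviour more concrete, here is a small standalone sketch (not taken from the attached patch; the settings of 50 / 0.2 / 500000 below are arbitrary values I picked purely for illustration) that prints the current and proposed thresholds for a few table sizes:

/*
 * Standalone illustration only -- the parameter values are assumptions,
 * not defaults proposed by the patch.
 */
#include <stdio.h>

int
main(void)
{
	const double vac_base_thresh = 50;				/* autovacuum_vacuum_threshold */
	const double vac_scale_factor = 0.2;			/* autovacuum_vacuum_scale_factor */
	const double autovacuum_max_threshold = 500000; /* proposed GUC, assumed value */
	const double reltuples_samples[] = {1e4, 1e5, 1e6, 1e7, 1e8, 1e9};

	for (int i = 0; i < 6; i++)
	{
		double		reltuples = reltuples_samples[i];

		/* current formula: grows without bound with table size */
		double		vacthresh_old = vac_base_thresh +
			vac_scale_factor * reltuples;

		/* proposed formula: tends towards autovacuum_max_threshold */
		double		vacthresh_new = vac_base_thresh +
			vac_scale_factor * reltuples /
			(1 + vac_scale_factor * reltuples / autovacuum_max_threshold);

		printf("reltuples = %12.0f   old = %12.0f   new = %12.0f\n",
			   reltuples, vacthresh_old, vacthresh_new);
	}

	return 0;
}

With these assumed settings, the new threshold stays very close to the current one for small tables and levels off just below 500000 for very large ones, which is exactly the behaviour I'm after.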

My first thought when reviewing the GUC and how it is used is
wondering if its description is a bit misleading.

autovacuum_vacuum_threshold is the "minimum number of updated or
deleted tuples needed to trigger a vacuum". That is, if this many
tuples are modified, it *may* trigger a vacuum, but we also may skip
vacuuming the table for other reasons or due to other factors.
autovacuum_max_threshold's proposed definition is the upper
limit/maximum number of tuples to insert/update/delete prior to
vacuum/analyze. This implies that if that many tuples have been
modified or inserted, the table will definitely be vacuumed -- which
isn't true. Maybe that is okay, but I thought I would bring it up.


I'm not too sure I understand. What are the reasons it might be skipped? I can think of a concurrent index creation on the same table, or anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the sort of thing you are talking about?

Perhaps a better name for the GUC would be autovacuum_asymptotic_limit... or something like that?

The attached (draft) patch further illustrates the idea.

Thanks for including a patch!

My guess is that a similar proposal has already been submitted... and
rejected 🙂 If so, I'm very sorry for the useless noise.

I rooted around in the hackers archive and couldn't find any threads
on this specific proposal. I copied some other hackers I knew of who
have worked on this problem and thought about it in the past, in case
they know of some existing threads or prior work on this specific
topic.


Thanks!

