On Fri, 1 Mar 2024, 04:55 Corey Huinker, <corey.huin...@gmail.com> wrote:
>> Also per our prior discussion- this makes sense to include in post-data 
>> section, imv, and also because then we have the indexes we may wish to load 
>> stats for, but further that also means it’ll be in the parallelizable part 
>> of the process, making me a bit less concerned overall about the individual 
>> timing.
>
>
> The ability to parallelize is pretty persuasive. But is that per-statement 
> parallelization or do we get transaction blocks? i.e. if we ended up 
> importing stats like this:
>
> BEGIN;
> LOCK TABLE schema.relation IN SHARE UPDATE EXCLUSIVE MODE;
> LOCK TABLE pg_catalog.pg_statistic IN ROW EXCLUSIVE MODE;
> SELECT pg_import_rel_stats('schema.relation', ntuples, npages);
> SELECT pg_import_pg_statistic('schema.relation', 'id', ...);
> SELECT pg_import_pg_statistic('schema.relation', 'name', ...);

How well would this simplify to the following:

SELECT pg_import_statistic('schema.relation', attname, ...)
FROM (VALUES ('id', ...), ...) AS relation_stats (attname, ...);

Or even just one VALUES for the whole statistics loading?

I suspect the main issue with combining this into one statement
(transaction) is that failure to load one column's statistics implies
you'll have to redo all the other statistics (or fail to load the
statistics at all), which may be problematic at the scale of thousands
of relations with tens of columns each.
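
To make the trade-off above concrete, here is a rough, self-contained
sketch. It is not the proposed API: it uses Python's sqlite3 module as a
stand-in for the server, a made-up col_stats table in place of
pg_statistic, and a CHECK constraint in place of whatever validation a
real import function would do. All names in it are hypothetical. It
compares one multi-row statement against one savepoint per column.

```python
import sqlite3

# Hypothetical sample data: the 'name' row is deliberately invalid.
ROWS = [("id", 0.0), ("name", 1.5), ("email", 0.2)]


def make_db():
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are only opened where the code says BEGIN.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE col_stats ("
        " attname TEXT PRIMARY KEY,"
        " null_frac REAL CHECK (null_frac BETWEEN 0 AND 1))"
    )
    return conn


def loaded_columns(conn):
    cur = conn.execute("SELECT attname FROM col_stats ORDER BY attname")
    return [row[0] for row in cur]


def load_single_statement(conn, rows):
    """One multi-row statement: a single bad row discards every row."""
    placeholders = ", ".join("(?, ?)" for _ in rows)
    params = [value for row in rows for value in row]
    try:
        conn.execute(f"INSERT INTO col_stats VALUES {placeholders}", params)
    except sqlite3.IntegrityError:
        pass  # the whole statement was undone
    return loaded_columns(conn)


def load_with_savepoints(conn, rows):
    """One transaction, one savepoint per column: bad rows are skipped."""
    failed = []
    conn.execute("BEGIN")
    for attname, null_frac in rows:
        conn.execute("SAVEPOINT one_column")
        try:
            conn.execute(
                "INSERT INTO col_stats VALUES (?, ?)", (attname, null_frac)
            )
        except sqlite3.IntegrityError:
            # Undo only this column's work; earlier columns are kept.
            conn.execute("ROLLBACK TO one_column")
            failed.append(attname)
        conn.execute("RELEASE one_column")
    conn.execute("COMMIT")
    return loaded_columns(conn), failed


if __name__ == "__main__":
    print(load_single_statement(make_db(), ROWS))
    print(load_with_savepoints(make_db(), ROWS))
```

The single-statement variant loads nothing once any column is rejected,
while the savepoint variant keeps the valid columns and reports the
rejected one. The cost is extra round trips, plus subtransaction
overhead if the same pattern were used on a real server.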

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

