>
>
>
> +\gexec
>
> Why do we need to construct the command and execute it? Can we instead
> execute the function directly? That would also avoid the ECHO magic.
>

We don't strictly need it, but I've found the set-difference operation to
be incredibly useful in diagnosing problems. Additionally, the values are
subject to change as the test data changes, and there is no guarantee that
the output of ANALYZE is deterministic. But most of all, it's because the
test cares about the values being copied correctly, not about the values
themselves.
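
For reference, a minimal sketch of the pattern in question (table names
hypothetical, and assuming the four-argument pg_set_relation_stats()
signature as I read it in the current patch):

    -- Build one call per table from live catalog values, then execute
    -- the generated statements with \gexec:
    SELECT format(
        'SELECT pg_set_relation_stats(%L::regclass, %s, %s, %s)',
        c.oid::regclass, c.relpages, c.reltuples, c.relallvisible)
    FROM pg_class AS c
    WHERE c.relname = 'stats_import_test'
    \gexec

    -- The set-difference check: any row returned here is a value that
    -- failed to copy.
    SELECT relpages, reltuples, relallvisible
      FROM pg_class WHERE relname = 'stats_source'
    EXCEPT
    SELECT relpages, reltuples, relallvisible
      FROM pg_class WHERE relname = 'stats_import_test';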


>
> +   <table id="functions-admin-statsimport">
> +    <title>Database Object Statistics Import Functions</title>
> +    <tgroup cols="1">
> +     <thead>
> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        Function
> +       </para>
> +       <para>
> +        Description
> +       </para></entry>
> +      </row>
> +     </thead>
>
> COMMENT: The functions throw many validation errors. Do we want to list
> the acceptable/unacceptable input values in the documentation
> corresponding to those? I don't expect one line per validated argument.
> Something like "these, these and these arguments cannot be NULL" or
> "both arguments in each of the pairs x and y, a and b, and c and d
> should be non-NULL or NULL together".
>

Yes. It should.
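
The pairwise rules mostly mirror how pg_statistic stores its slots, so
the doc wording could lean on that. A quick way to see the invariant in
an existing catalog (superuser required to read pg_statistic; no patch
functions involved):

    -- For an MCV entry the values array and the frequencies array are
    -- set together or not at all; the input validation enforces the
    -- same pairing. Checking slot 1 only, for brevity:
    SELECT staattnum,
           stavalues1 IS NULL AS vals_null,
           stanumbers1 IS NULL AS freqs_null
    FROM pg_statistic
    WHERE starelid = 'pg_class'::regclass
      AND stakind1 = 1;   -- STATISTIC_KIND_MCV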


>  Statistics are about data. Whenever pg_dump dumps some filtered data,
> the statistics collected for the whole table are useless. We should
> avoid dumping statistics in such a case. E.g. when only the schema is
> dumped, what good are statistics? Similarly the statistics on a
> partitioned table may not be useful if some of its partitions are not
> dumped. That said, dumping statistics on a foreign table makes sense:
> it contains no local data, but its statistics are still meaningful.
>

Good points, but I'm not immediately sure how to enforce those rules.
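
To make the filtered-data case concrete (database and table names are
hypothetical):

    $ pg_dump -d postgres --exclude-table-data=orders > dump.sql

No rows for "orders" end up in the dump, yet its relation- and
attribute-level statistics would still describe the full table, which is
exactly the mismatch described above.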


>
>
>>
>> Key areas where I'm seeking feedback:
>>
>> - What level of errors in a restore will a user tolerate, and what should
>> be done to the error messages to indicate that the data itself is fine, but
>> a manual operation to update stats on that particular table is now
>> warranted?
>> - To what degree could pg_restore/pg_upgrade take that recovery action
>> automatically?
>> - Should the individual attribute/class set function calls be grouped by
>> relation, so that they all succeed/fail together, or should they be called
>> separately, each able to succeed or fail on their own?
>> - Any other concerns about how to best use these new functions.
>>
>>
>>
> Whether or not I pass --no-statistics, there is no difference in the dump
> output. Am I missing something?
> $ pg_dump -d postgres > /tmp/dump_no_arguments.out
> $ pg_dump -d postgres --no-statistics > /tmp/dump_no_statistics.out
> $ diff /tmp/dump_no_arguments.out /tmp/dump_no_statistics.out
> $
>
> IIUC, pg_dump includes statistics by default. That means all our
> pg_dump-related tests will have statistics output by default. That's
> good, since the functionality will always be tested. 1. We need
> additional tests to ensure that the statistics are installed after
> restore. 2. Some of those tests compare dumps before and after restore.
> If the statistics change because auto-analyze runs post-restore, these
> tests will fail.
>

+1
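
On point 2, one possible mitigation (my assumption, not something the
patch does today; table name hypothetical) is to pin the tables under
test so auto-analyze can't perturb them between the two dumps:

    -- Disabling autovacuum on a table also disables auto-analyze for
    -- it, so its stored statistics can't drift between the two dumps:
    ALTER TABLE stats_import_test SET (autovacuum_enabled = false);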


> I believe, in order to import statistics through IMPORT FOREIGN SCHEMA,
> postgresImportForeignSchema() will need to add SELECT commands invoking
> pg_set_relation_stats() on each imported table and pg_set_attribute_stats()
> on each of its attributes. Am I right? Do we want to make that happen in the
> first cut of the feature? How do you expect these functions to be used to
> update statistics of foreign tables?
>

I don't think there's time to get it into this release. I think we'd want
to extend this functionality to both IMPORT FOREIGN SCHEMA and ANALYZE for
foreign tables, in both cases with a server/table option to do regular
remote sampling. In both cases, they'd do a remote query very similar to
what pg_dump does (hence putting it in fe_utils), with some filters on
which columns/tables it believes it can trust. The remote table might
itself be a view (in which case the query would turn up nothing) or column
data types may change across the wire, and in those cases we'd have to fall
back to sampling.
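
To sketch the kind of remote query I mean (column list abbreviated,
table name hypothetical; the real query would look much like pg_dump's,
plus the trust filters mentioned above):

    -- Fetch the remote side's per-column statistics for one table:
    SELECT a.attname, s.stanullfrac, s.stawidth, s.stadistinct
    FROM pg_catalog.pg_statistic s
    JOIN pg_catalog.pg_attribute a
      ON a.attrelid = s.starelid AND a.attnum = s.staattnum
    WHERE s.starelid = 'public.remote_tab'::pg_catalog.regclass
      AND NOT a.attisdropped;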
