JingGe opened a new pull request, #20422:
URL: https://github.com/apache/flink/pull/20422
## What is the purpose of the change
Currently, table statistics can only be fetched from the catalog one
partition at a time. Because fetching statistics from a catalog is an
expensive operation, this change adds a way to fetch the statistics of a
table for all given partitions in a single call, which improves query
performance.
In a manual performance test with 2000 partitions, the cost dropped from
10s to 2s, roughly a 5x speedup (an 80% reduction).
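The difference between the two access patterns can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: `FakeCatalog`, `getRowCount`, and `bulkGetRowCounts` are hypothetical names, and the "round trip" counter only simulates the cost of a catalog call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BulkStatsSketch {

    /** Hypothetical in-memory catalog that counts simulated round trips. */
    static class FakeCatalog {
        private final Map<String, Long> rowCounts = new LinkedHashMap<>();
        int roundTrips = 0;

        void addPartition(String spec, long rowCount) {
            rowCounts.put(spec, rowCount);
        }

        /** One simulated round trip per partition. */
        long getRowCount(String spec) {
            roundTrips++;
            // Unknown partitions contribute 0 rows in this sketch.
            return rowCounts.getOrDefault(spec, 0L);
        }

        /** One simulated round trip for the whole list of partitions. */
        List<Long> bulkGetRowCounts(List<String> specs) {
            roundTrips++;
            List<Long> result = new ArrayList<>();
            for (String spec : specs) {
                result.add(rowCounts.getOrDefault(spec, 0L));
            }
            return result;
        }
    }

    /** Per-partition pattern: the number of calls grows with the partition count. */
    static long sumIteratively(FakeCatalog catalog, List<String> specs) {
        long total = 0;
        for (String spec : specs) {
            total += catalog.getRowCount(spec);
        }
        return total;
    }

    /** Bulk pattern: a single call returns the statistics for all partitions. */
    static long sumInBulk(FakeCatalog catalog, List<String> specs) {
        long total = 0;
        for (long count : catalog.bulkGetRowCounts(specs)) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        FakeCatalog catalog = new FakeCatalog();
        catalog.addPartition("dt=2022-07-01", 100L);
        catalog.addPartition("dt=2022-07-02", 250L);
        List<String> specs = Arrays.asList("dt=2022-07-01", "dt=2022-07-02");

        long iterative = sumIteratively(catalog, specs);
        int iterativeTrips = catalog.roundTrips;

        catalog.roundTrips = 0;
        long bulk = sumInBulk(catalog, specs);

        System.out.println(iterative + " rows, round trips: " + iterativeTrips);
        System.out.println(bulk + " rows, round trips: " + catalog.roundTrips);
    }
}
```

Both methods return the same total; only the number of simulated catalog calls differs, which is where the savings for a large number of partitions would come from.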
## Brief change log
- The newly added methods are implemented in all classes that implement the
Catalog interface.
- For the concrete HiveCatalog, `__HIVE_DEFAULT_PARTITION__` is handled when
fetching the column statistics.
- All currently supported partition column types are taken into
consideration.
- FlinkRecomputeStatisticsProgram now fetches TableStats in bulk instead of
calling the catalog once per partition.
- The TableStats conversion logic is consolidated into the
CatalogTableStatisticsConverter class to make the domain design a little
cleaner.
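One way to read the `__HIVE_DEFAULT_PARTITION__` item above: Hive uses that name by default for a partition whose column value is null or empty, so rows in it can be counted as nulls of the partition column. The sketch below illustrates that idea only. It is an assumption about the intent, not the logic in HiveCatalog, and `PartitionInfo`, `nullCount`, and `distinctNonNullValues` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DefaultPartitionSketch {

    // Default name Hive gives to a partition with a null or empty value.
    static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    /** Hypothetical pair of a partition column value and its row count. */
    static class PartitionInfo {
        final String value;
        final long rowCount;

        PartitionInfo(String value, long rowCount) {
            this.value = value;
            this.rowCount = rowCount;
        }
    }

    /** Rows in the default partition are treated as nulls of the partition column. */
    static long nullCount(List<PartitionInfo> partitions) {
        long nulls = 0;
        for (PartitionInfo p : partitions) {
            if (HIVE_DEFAULT_PARTITION.equals(p.value)) {
                nulls += p.rowCount;
            }
        }
        return nulls;
    }

    /** The default partition is excluded when counting distinct values. */
    static long distinctNonNullValues(List<PartitionInfo> partitions) {
        Set<String> values = new HashSet<>();
        for (PartitionInfo p : partitions) {
            if (!HIVE_DEFAULT_PARTITION.equals(p.value)) {
                values.add(p.value);
            }
        }
        return values.size();
    }

    public static void main(String[] args) {
        List<PartitionInfo> partitions = Arrays.asList(
                new PartitionInfo("2022-07-01", 100L),
                new PartitionInfo("2022-07-02", 250L),
                new PartitionInfo(HIVE_DEFAULT_PARTITION, 7L));

        System.out.println("null count: " + nullCount(partitions));
        System.out.println("distinct values: " + distinctNonNullValues(partitions));
    }
}
```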
## Verifying this change
This change is already covered by existing tests, such as
HiveCatalogHiveMetadataTest.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (**yes** / no)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no**
/ don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't
know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (**yes** / no)
- If yes, how is the feature documented? (not applicable / **docs** /
JavaDocs / not documented)