JingGe opened a new pull request, #20422:
URL: https://github.com/apache/flink/pull/20422

   
   ## What is the purpose of the change
   
   Currently, table statistics can only be fetched from the catalog one 
partition at a time. Because fetching statistics from a catalog is an 
expensive operation, this change adds the ability to fetch the statistics 
of a table for all given partitions in a single call, which improves query 
performance.
   
   In a manual performance test with 2000 partitions, the cost dropped from 
10s to 2s, roughly a 5x speedup (an 80% reduction in time).
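
To illustrate the difference between per-partition and bulk fetching, here is a minimal, self-contained sketch. It does not use Flink's `Catalog` interface: `StatsCatalog`, `CountingCatalog`, `getRowCount`, and `bulkGetRowCounts` are hypothetical names invented for this example, and the "round trip" counter only stands in for the expensive catalog call described above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a catalog; NOT Flink's Catalog interface. */
interface StatsCatalog {
    /** Fetches the row count of one partition: one round trip per call. */
    long getRowCount(String partition);

    /** Fetches the row counts of all given partitions in one round trip. */
    Map<String, Long> bulkGetRowCounts(List<String> partitions);
}

/** In-memory catalog that counts how many round trips were made. */
class CountingCatalog implements StatsCatalog {
    private final Map<String, Long> rowCounts;
    int roundTrips = 0;

    CountingCatalog(Map<String, Long> rowCounts) {
        this.rowCounts = rowCounts;
    }

    @Override
    public long getRowCount(String partition) {
        roundTrips++;
        return rowCounts.getOrDefault(partition, 0L);
    }

    @Override
    public Map<String, Long> bulkGetRowCounts(List<String> partitions) {
        roundTrips++; // a single round trip, however many partitions are requested
        Map<String, Long> result = new LinkedHashMap<>();
        for (String p : partitions) {
            result.put(p, rowCounts.getOrDefault(p, 0L));
        }
        return result;
    }
}

class BulkStatsSketch {
    /** Iterative approach: one catalog call per partition. */
    static long totalIteratively(StatsCatalog catalog, List<String> partitions) {
        long total = 0;
        for (String p : partitions) {
            total += catalog.getRowCount(p);
        }
        return total;
    }

    /** Bulk approach: one catalog call for all (distinct) partitions. */
    static long totalInBulk(StatsCatalog catalog, List<String> partitions) {
        long total = 0;
        for (long count : catalog.bulkGetRowCounts(partitions).values()) {
            total += count;
        }
        return total;
    }
}
```

Both methods return the same total for a list of distinct partitions; the only difference in this sketch is the number of catalog round trips, which grows with the partition count in the iterative version and stays at one in the bulk version.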
   
   
   ## Brief change log
   
   - The newly added methods are implemented in all classes that implement 
the Catalog interface.
   - For the concrete HiveCatalog, `__HIVE_DEFAULT_PARTITION__` is handled 
when fetching the column statistics.
     - All currently supported types for partition columns are taken into 
consideration.
   - The logic for getting TableStats in FlinkRecomputeStatisticsProgram is 
optimized from calling the catalog iteratively to a bulk fetch.
   - The TableStats conversion logic is consolidated into the 
CatalogTableStatisticsConverter class to make the domain design a little 
cleaner.
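
As a rough illustration of the `__HIVE_DEFAULT_PARTITION__` point above: Hive stores a null or empty partition value under that name, so statistics for a partition column need to treat it as a null rather than parse it as a value. The sketch below assumes an integer-typed partition column and counts one null per default partition; `PartitionColumnSummary` and `DefaultPartitionSketch` are hypothetical names, and this is not the code from the pull request.

```java
import java.util.List;

/** Hypothetical summary of an integer partition column; NOT a Flink class. */
class PartitionColumnSummary {
    final Long min;       // null when no partition carries a real value
    final Long max;       // null when no partition carries a real value
    final long nullCount; // number of partitions using the default partition name

    PartitionColumnSummary(Long min, Long max, long nullCount) {
        this.min = min;
        this.max = max;
        this.nullCount = nullCount;
    }
}

class DefaultPartitionSketch {
    /** Name Hive uses for partitions whose partition value is null or empty. */
    static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    /** Summarizes raw partition values, treating the default partition as null. */
    static PartitionColumnSummary summarize(List<String> partitionValues) {
        Long min = null;
        Long max = null;
        long nulls = 0;
        for (String raw : partitionValues) {
            if (raw == null || HIVE_DEFAULT_PARTITION.equals(raw)) {
                nulls++; // do not try to parse the placeholder as a number
                continue;
            }
            long value = Long.parseLong(raw);
            min = (min == null) ? value : Math.min(min, value);
            max = (max == null) ? value : Math.max(max, value);
        }
        return new PartitionColumnSummary(min, max, nulls);
    }
}
```

Other partition column types (strings, dates, and so on) would need their own parsing and comparison, which this sketch leaves out.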
   
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as 
HiveCatalogHiveMetadataTest.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (**yes** / no)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (**yes** / no)
     - If yes, how is the feature documented? (not applicable / **docs** / 
JavaDocs / not documented)
   

