Hi devs,

After several discussions with Jark and Goldfrey, I'd like to start a discussion on the mailing list about FLIP-247 [1], which aims to significantly improve performance by adding a bulk fetch capability for table and column statistics.

Currently, statistics for a table can only be fetched from the catalog one partition at a time. Since fetching statistics from a catalog is a very heavy operation, we should provide a way to fetch the statistics of a table for all given partitions in one shot, in order to improve query performance. In a manual performance test with 2000 partitions, the cost dropped from 10s to 2s, which is roughly a 5x speedup (an 80% reduction).

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions

Best regards,
Jing