Hi Jing,

Thanks for starting this discussion. The bulk fetch is a great improvement for the optimizer. The FLIP looks good to me.

Best,
Jark

On Fri, 8 Jul 2022 at 17:36, Jing Ge <j...@ververica.com> wrote:

> Hi devs,
>
> After having multiple discussions with Jark and Goldfrey, I'd like to start
> a discussion on the mailing list w.r.t. FLIP-247[1], which will
> significantly improve performance by providing a bulk fetch
> capability for table and column statistics.
>
> Currently, statistics information about tables can only be fetched from
> the catalog one partition at a time. Since getting statistics
> information from catalogs is a very heavy operation, in order to improve
> query performance, we’d better provide functionality to fetch the
> statistics information of a table for all given partitions in one shot.
>
> Based on a manual performance test with 2000 partitions, the cost
> drops from 10s to 2s, i.e. roughly a 5x speedup.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions
>
> Best regards,
> Jing
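
To make the iterative-versus-bulk difference concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration: `ToyCatalog`, its two methods, and the partition names are hypothetical and are not Flink's Catalog API or the interface proposed in the FLIP. It only counts simulated round trips to a backing store, on the assumption that each call carries a fixed overhead.

```python
from dataclasses import dataclass


@dataclass
class ToyCatalog:
    """Hypothetical in-memory stand-in for a catalog (not Flink's Catalog API)."""

    stats: dict           # partition name -> row count (made-up numbers)
    round_trips: int = 0  # number of simulated calls to the backing store

    def get_partition_statistics(self, partition):
        # Iterative style: one round trip per partition.
        self.round_trips += 1
        return self.stats[partition]

    def bulk_get_partition_statistics(self, partitions):
        # Bulk style: one round trip for the whole list of partitions.
        self.round_trips += 1
        return [self.stats[p] for p in partitions]


# Eight hypothetical daily partitions with made-up row counts.
partitions = [f"dt=2022-07-{d:02d}" for d in range(1, 9)]
catalog = ToyCatalog({p: 100 * i for i, p in enumerate(partitions)})

# Fetch statistics one partition at a time.
iterative = [catalog.get_partition_statistics(p) for p in partitions]
iterative_trips = catalog.round_trips

# Fetch statistics for all partitions in one call.
catalog.round_trips = 0
bulk = catalog.bulk_get_partition_statistics(partitions)
bulk_trips = catalog.round_trips

print(f"iterative round trips: {iterative_trips}, bulk round trips: {bulk_trips}")
```

Both paths return the same statistics; only the number of calls differs, which is where the saving would come from if the per-call overhead dominates.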