Thanks for starting this discussion. Have we considered introducing a listPartitionWithStats() in Catalog?
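
To make the question concrete, here is a rough sketch of the two call patterns. It is not the Flink Catalog interface: ToyCatalog, PartitionStats, and the round-trip counter are hypothetical stand-ins, and listPartitionWithStats() is only the name floated above. It assumes each catalog call costs one round trip to the metastore.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BulkStatsSketch {

    /** Hypothetical per-partition statistics; only a row count for brevity. */
    static final class PartitionStats {
        final long rowCount;

        PartitionStats(long rowCount) {
            this.rowCount = rowCount;
        }
    }

    /** Toy stand-in for a catalog backed by a remote metastore (an assumption). */
    static final class ToyCatalog {
        private final Map<String, PartitionStats> store = new LinkedHashMap<>();
        int roundTrips = 0; // counts simulated remote calls

        ToyCatalog(int partitions) {
            for (int i = 0; i < partitions; i++) {
                store.put("dt=" + i, new PartitionStats(100L * i));
            }
        }

        /** One round trip: returns partition names only. */
        List<String> listPartitions() {
            roundTrips++;
            return new ArrayList<>(store.keySet());
        }

        /** One round trip per partition: the iterative pattern. */
        PartitionStats getPartitionStatistics(String partition) {
            roundTrips++;
            return store.get(partition);
        }

        /** One round trip in total: partitions together with their statistics. */
        Map<String, PartitionStats> listPartitionWithStats() {
            roundTrips++;
            return new LinkedHashMap<>(store);
        }
    }

    /** Round trips needed when statistics are fetched partition by partition. */
    static int iterativeRoundTrips(int partitions) {
        ToyCatalog catalog = new ToyCatalog(partitions);
        Map<String, PartitionStats> stats = new LinkedHashMap<>();
        for (String partition : catalog.listPartitions()) {
            stats.put(partition, catalog.getPartitionStatistics(partition));
        }
        return catalog.roundTrips;
    }

    /** Round trips needed when partitions and statistics arrive in one call. */
    static int bulkRoundTrips(int partitions) {
        ToyCatalog catalog = new ToyCatalog(partitions);
        Map<String, PartitionStats> stats = catalog.listPartitionWithStats();
        return catalog.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("iterative round trips: " + iterativeRoundTrips(2000));
        System.out.println("bulk round trips: " + bulkRoundTrips(2000));
    }
}
```

In this toy model the iterative path makes one call to list the partitions plus one call per partition, while the combined call makes a single round trip. Whether a real catalog could serve such a call that cheaply depends on the backing metastore.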

Best,
Jingsong

On Fri, Jul 15, 2022 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
>
> Hi Jing,
>
> Thanks for starting this discussion. The bulk fetch is a great improvement
> for the optimizer.
> The FLIP looks good to me.
>
> Best,
> Jark
>
> On Fri, 8 Jul 2022 at 17:36, Jing Ge <j...@ververica.com> wrote:
>
> > Hi devs,
> >
> > After having multiple discussions with Jark and Goldfrey, I'd like to start
> > a discussion on the mailing list w.r.t. FLIP-247[1], which will
> > significantly improve the performance by providing the bulk fetch
> > capability for table and column statistics.
> >
> > Currently the statistics information about tables can only be fetched from
> > the catalog by each given partition iteratively. Since getting statistics
> > information from catalogs is a very heavy operation, in order to improve
> > the query performance, we’d better provide functionality to fetch the
> > statistics information of a table for all given partitions in one shot.
> >
> > Based on the manual performance test, for 2000 partitions, the cost will be
> > improved from 10s to 2s. The improvement result is 500%.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions
> >
> > Best regards,
> > Jing