Thanks for starting this discussion.

Have we considered introducing a listPartitionWithStats() method in Catalog?
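To make the comparison concrete, here is a minimal Java sketch of the three access patterns in this thread. They are today's per-partition fetch, the bulk fetch described in FLIP-247, and the suggested listPartitionWithStats(). StatsCatalog, PartitionStats, InMemoryStatsCatalog, listPartitions and bulkGetPartitionStatistics are hypothetical, simplified names for illustration only. They are not Flink's actual Catalog API. The in-memory catalog simply counts "round trips" to show where the cost goes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Demo entry point (listed first so `java BulkStatsSketch.java` can run it).
class BulkStatsSketch {
    public static void main(String[] args) {
        InMemoryStatsCatalog catalog = new InMemoryStatsCatalog();
        for (int i = 0; i < 2000; i++) {
            catalog.put("p=" + i, i);
        }
        List<String> partitions = catalog.listPartitions();

        // Iterative: one catalog round trip per partition.
        catalog.remoteCalls = 0;
        for (String p : partitions) {
            catalog.getPartitionStatistics(p);
        }
        System.out.println("iterative round trips: " + catalog.remoteCalls); // 2000

        // Bulk: statistics for all given partitions in one shot.
        catalog.remoteCalls = 0;
        catalog.bulkGetPartitionStatistics(partitions);
        System.out.println("bulk round trips: " + catalog.remoteCalls); // 1

        // Hypothetical listPartitionWithStats(): partitions plus their stats.
        catalog.remoteCalls = 0;
        List<PartitionStats> all = catalog.listPartitionWithStats();
        System.out.println(all.size() + " partitions, round trips: " + catalog.remoteCalls); // 2000, 2
    }
}

// Hypothetical, simplified per-partition statistics.
final class PartitionStats {
    final String partition;
    final long rowCount;

    PartitionStats(String partition, long rowCount) {
        this.partition = partition;
        this.rowCount = rowCount;
    }
}

// Hypothetical catalog interface (NOT Flink's real Catalog API).
interface StatsCatalog {
    List<String> listPartitions();

    PartitionStats getPartitionStatistics(String partition);

    // Default falls back to the per-partition loop, so existing catalogs keep working.
    default List<PartitionStats> bulkGetPartitionStatistics(List<String> partitions) {
        List<PartitionStats> result = new ArrayList<>();
        for (String p : partitions) {
            result.add(getPartitionStatistics(p));
        }
        return result;
    }

    // Default: list the partitions, then bulk-fetch their statistics.
    default List<PartitionStats> listPartitionWithStats() {
        return bulkGetPartitionStatistics(listPartitions());
    }
}

// In-memory catalog that counts simulated remote round trips.
class InMemoryStatsCatalog implements StatsCatalog {
    final Map<String, Long> rowCounts = new LinkedHashMap<>();
    int remoteCalls = 0;

    void put(String partition, long rows) {
        rowCounts.put(partition, rows);
    }

    @Override
    public List<String> listPartitions() {
        remoteCalls++;
        return new ArrayList<>(rowCounts.keySet());
    }

    @Override
    public PartitionStats getPartitionStatistics(String partition) {
        remoteCalls++; // one round trip per partition
        return new PartitionStats(partition, rowCounts.getOrDefault(partition, 0L));
    }

    @Override
    public List<PartitionStats> bulkGetPartitionStatistics(List<String> partitions) {
        remoteCalls++; // a single round trip for all requested partitions
        List<PartitionStats> result = new ArrayList<>();
        for (String p : partitions) {
            result.add(new PartitionStats(p, rowCounts.getOrDefault(p, 0L)));
        }
        return result;
    }
}
```

With 2000 partitions this reports 2000 round trips for the iterative loop versus 1 for the bulk call. The default listPartitionWithStats() costs 2 (list, then bulk fetch). A catalog that can return partitions and statistics from a single query could override it to get that down to 1.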

Best,
Jingsong

On Fri, Jul 15, 2022 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:
>
> Hi Jing,
>
> Thanks for starting this discussion. The bulk fetch is a great improvement
> for the optimizer.
> The FLIP looks good to me.
>
> Best,
> Jark
>
> On Fri, 8 Jul 2022 at 17:36, Jing Ge <j...@ververica.com> wrote:
>
> > Hi devs,
> >
> > After multiple discussions with Jark and Godfrey, I'd like to start
> > a discussion on the mailing list about FLIP-247 [1], which will
> > significantly improve performance by providing a bulk fetch
> > capability for table and column statistics.
> >
> > Currently, table statistics can only be fetched from the catalog
> > iteratively, one partition at a time. Since fetching statistics from a
> > catalog is a heavy operation, we should provide a way to fetch the
> > statistics of a table for all given partitions in one shot in order to
> > improve query performance.
> >
> > Based on a manual performance test with 2000 partitions, the cost drops
> > from 10s to 2s, i.e. a 5x speedup (80% less time).
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions
> >
> > Best regards,
> > Jing
> >