Re: [DISCUSS] FLIP-247 Bulk fetch of table and column statistics for given partitions

Jark Wu Thu, 14 Jul 2022 19:09:34 -0700

Hi Jing,

Thanks for starting this discussion. The bulk fetch is a great improvement
for the optimizer.
The FLIP looks good to me.


Best,
Jark

On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote:

> Hi devs,
>
> After several discussions with Jark and Godfrey, I'd like to start a
> discussion on the mailing list about FLIP-247[1], which will
> significantly improve performance by adding the capability to bulk fetch
> table and column statistics.
>
> Currently, table statistics can only be fetched from the catalog one
> partition at a time. Since fetching statistics from a catalog is a very
> heavy operation, we should provide a way to fetch the statistics of a
> table for all given partitions in one shot, in order to improve query
> performance.
>
> Based on a manual performance test with 2000 partitions, the cost was
> reduced from 10s to 2s, i.e. a 5x speedup (80% less time).
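The 10s to 2s figure can be stated either as a speedup (old time divided by new time) or as the fraction of time saved. A quick Java check of the arithmetic, using only the numbers reported in the quoted test:

```java
public class SpeedupCheck {

    // Speedup: how many times faster the new run is than the old one.
    static double speedup(double oldSeconds, double newSeconds) {
        return oldSeconds / newSeconds;
    }

    // Fraction of the original time that is no longer spent.
    static double timeSavedFraction(double oldSeconds, double newSeconds) {
        return (oldSeconds - newSeconds) / oldSeconds;
    }

    public static void main(String[] args) {
        // Figures from the manual test: 2000 partitions, 10s before, 2s after.
        System.out.printf("speedup: %.1fx, time saved: %.0f%%%n",
                speedup(10, 2), timeSavedFraction(10, 2) * 100);
    }
}
```

This prints "speedup: 5.0x, time saved: 80%", so the gain is best described as 5x rather than "500%".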
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions
>
> Best regards,
> Jing
>