Hi devs,

After several discussions with Jark and Goldfrey, I'd like to start
a discussion on the mailing list about FLIP-247[1], which aims to
significantly improve query performance by providing a bulk fetch
capability for table and column statistics.

Currently, the statistics of a partitioned table can only be fetched from
the catalog one partition at a time. Since fetching statistics from a
catalog is a very heavy operation, we should provide a way to fetch the
statistics of a table for all given partitions in one shot, which in turn
improves query performance.
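
To make the difference concrete, here is a minimal Java sketch of the two
access patterns. It is a toy model only: ToyCatalog, getRowCount and
bulkGetRowCounts are hypothetical names made up for this mail, not Flink's
Catalog API and not the signatures proposed in the FLIP, and the row counts
are dummy values.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these names are illustrative and are NOT Flink's Catalog API.
public class BulkStatsSketch {

    /** Hypothetical catalog that counts how many calls (round trips) it serves. */
    public static class ToyCatalog {
        public int roundTrips = 0;

        // Per-partition style: every partition costs one round trip.
        public long getRowCount(String partition) {
            roundTrips++;
            return 100L; // made-up row count, the same for every partition
        }

        // Bulk style: one round trip returns the row counts of all given partitions.
        public List<Long> bulkGetRowCounts(List<String> partitions) {
            roundTrips++;
            List<Long> counts = new ArrayList<>();
            for (int i = 0; i < partitions.size(); i++) {
                counts.add(100L);
            }
            return counts;
        }
    }

    // Sums row counts by asking the catalog once per partition.
    public static long totalRowsIterative(ToyCatalog catalog, List<String> partitions) {
        long total = 0;
        for (String p : partitions) {
            total += catalog.getRowCount(p);
        }
        return total;
    }

    // Sums row counts using a single bulk request.
    public static long totalRowsBulk(ToyCatalog catalog, List<String> partitions) {
        long total = 0;
        for (long c : catalog.bulkGetRowCounts(partitions)) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            partitions.add("p=" + i);
        }
        ToyCatalog iterative = new ToyCatalog();
        ToyCatalog bulk = new ToyCatalog();
        totalRowsIterative(iterative, partitions);
        totalRowsBulk(bulk, partitions);
        System.out.println("iterative round trips: " + iterative.roundTrips);
        System.out.println("bulk round trips: " + bulk.roundTrips);
    }
}
```

In this toy the iterative path makes one catalog call per partition, while
the bulk path makes a single call for the same result. How much that saves
in practice depends on the catalog implementation behind it.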

Based on a manual performance test with 2000 partitions, the cost drops
from 10s to 2s, i.e. roughly a 5x speedup (an 80% reduction in time).

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions

Best regards,
Jing
