jialiangCHOU opened a new issue, #9959: URL: https://github.com/apache/incubator-doris/issues/9959
## Background:

Apache Doris currently stores only the number of distinct values (NDV) as column statistics. When predicate selectivity is estimated from NDV alone, the error is large, especially when the data distribution is skewed. Column equi-height histograms should therefore be introduced as statistics for more accurate cost estimation, so that the optimizer can find the plan that is actually best.

## Requirements:

1. Use the existing statistics collection framework to collect column equi-height histogram data (only manually triggered collection needs to be supported). The collection process should not affect normal task execution on the cluster.
2. Extend the existing statistics storage framework to support storing column equi-height histogram data.
3. Support querying column equi-height histogram data.
4. Implement the logic for deriving predicate selectivity from column equi-height histograms, and ensure that the statistics derivation framework uses it correctly.

## Project output requirements:

With the help of column equi-height histograms, derive more accurate predicate selectivity so that the optimizer can produce better query plans, and ultimately improve Apache Doris query performance on the TPC-H benchmark.
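To make requirement 4 concrete, here is a minimal Python sketch of how selectivity for a `col <= x` predicate might be derived from a histogram. It assumes the histogram is equi-height (each bucket holds roughly the same number of rows) and that values are spread linearly inside a bucket. The names `Bucket`, `build_equi_height_histogram`, and `selectivity_less_equal` are hypothetical and are not part of the Doris code base.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bucket:
    lower: float  # smallest value stored in the bucket
    upper: float  # largest value stored in the bucket
    count: int    # number of rows that fell into the bucket


def build_equi_height_histogram(values: Sequence[float], num_buckets: int) -> List[Bucket]:
    """Sort the values and cut them into buckets with (almost) equal row counts."""
    data = sorted(values)
    n = len(data)
    buckets: List[Bucket] = []
    for i in range(num_buckets):
        start = i * n // num_buckets
        end = (i + 1) * n // num_buckets
        if start < end:  # skip empty buckets when n < num_buckets
            buckets.append(Bucket(data[start], data[end - 1], end - start))
    return buckets


def selectivity_less_equal(buckets: List[Bucket], x: float) -> float:
    """Estimate the fraction of rows satisfying `col <= x`."""
    total = sum(b.count for b in buckets)
    if total == 0:
        return 0.0
    matched = 0.0
    for b in buckets:
        if x >= b.upper:
            # The whole bucket satisfies the predicate.
            matched += b.count
        elif x >= b.lower:
            # Partial overlap: interpolate linearly inside the bucket
            # (assumption: values are uniform within one bucket).
            matched += b.count * (x - b.lower) / (b.upper - b.lower)
    return matched / total


# Skewed column: 90 rows hold the value 1, the remaining 10 rows hold 2..11.
column = [1] * 90 + list(range(2, 12))
histogram = build_equi_height_histogram(column, num_buckets=10)
estimate = selectivity_less_equal(histogram, 1)
```

On this skewed column the NDV is 11, so an estimate that treats all distinct values as equally frequent would assign the value 1 a share of about 1/11 of the rows, while the histogram captures that it accounts for 90% of them. A real implementation would also need to handle non-numeric types, NULLs, and heavy values that span several buckets, which this sketch leaves out.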
