godfreyhe commented on a change in pull request #9362: [FLINK-13354] [docs] Add documentation for how to use blink planner URL: https://github.com/apache/flink/pull/9362#discussion_r314586868
########## File path: docs/dev/table/common.md ########## @@ -1304,20 +1414,50 @@ val table: Table = tableEnv.fromDataStream(stream, 'name as 'myName) Query Optimization ------------------ -Apache Flink leverages Apache Calcite to optimize and translate queries. The optimization currently performed include projection and filter push-down, subquery decorrelation, and other kinds of query rewriting. Flink does not yet optimize the order of joins, but executes them in the same order as defined in the query (order of Tables in the `FROM` clause and/or order of join predicates in the `WHERE` clause). +<div class="codetabs" markdown="1"> +<div data-lang="Flink planner" markdown="1"> + +Apache Flink leverages Apache Calcite to optimize and translate queries. The optimization currently performed include projection and filter push-down, subquery decorrelation, and other kinds of query rewriting. Flink planner does not yet optimize the order of joins, but executes them in the same order as defined in the query (order of Tables in the `FROM` clause and/or order of join predicates in the `WHERE` clause). + +It is possible to tweak the set of optimization rules which are applied in different phases by providing a `CalciteConfig` object. This can be created via a builder by calling `CalciteConfig.createBuilder())` and is provided to the TableEnvironment by calling `tableEnv.getConfig.setPlannerConfig(calciteConfig)`. + +</div> + +<div data-lang="Blink planner" markdown="1"> + +The foundation of Apache Flink query optimization is Apache Calcite. In addition to apply Calcite in optimization, Blink planner also does a lot to enhance it. + +Fist of all, Blink planner does a series of rule-based optimization and cost-based optimization including: +* special subquery rewriting, including two part: 1. converts IN and EXISTS into left semi-join 2.converts NOT IN and NOT EXISTS into left anti-join. Note: only IN/EXISTS/NOT IN/NOT EXISTS in conjunctive condition is supported now. +* normal subquery decorrelation based on Apache Calcite +* projection pruning +* filter push down +* partition pruning +* join reorder if it is enabled (`table.optimizer.join-reorder-enabled` is true) +* other kinds of query rewriting + +Secondly, Blink planner introduces rich statistics of data source and propagate those statistics up to the whole plan based on all kinds of extended `MetadataHandler`s. Optimizer could choose better plan based on those metadata. + +Thirdly, Blink planner provides fine-grain cost of each operator, which takes io, cpu, network and memory into account. Cost-based optimization could choose better plan based on fine-grain cost definition. + +Finally, Blink planner will try to find out duplicated sub-plans and reuse them to reduce duplicated computation. + +It is possible to customize optimization programs referencing to `FlinkBatchProgram`(default optimization programs for batch) or `FlinkStreamProgram`(default optimization programs for stream), and replace the default optimization programs by providing a `CalciteConfig` object. This can be created via a builder by calling `CalciteConfig.createBuilder())` and is provided to the TableEnvironment by calling `tableEnv.getConfig.setPlannerConfig(calciteConfig)`. Review comment: i think this part can be simplified as: `It is possible to customize optimization programs through CalciteConfig. Please see the javadocs of CalciteConfig for details` ps, use can not set custom rules through `CalciteConfig` directly, so i do not want to mention it here. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
