zabetak opened a new pull request, #5249: URL: https://github.com/apache/hive/pull/5249
### What changes were proposed in this pull request? 1. Add `applyCteRewriting` phase in `CalcitePlanner` for detecting and using CTEs; ensure rewrite logic is consistent with existing `hive.optimize.cte.materialize.*` properties. 2. Model CTEs as materialized views (MVs) and add utility method in `HiveMaterializedViewUtils` for mapping a CTE to a `RelOptMaterialization`. 3. Refactor core MV rewrite logic in `CalcitePlanner` to use during CTE rewrite and exploit CTEs in a cost-based manner. 4. Add `HiveTableSpool` operator to represent CTEs and handle them in the plan using new rules: `TableScanToSpoolRule` and `RemoveUnusedCteRule`. 5. Add `TableScanToSpoolRule`, and `RemoveUnusedCteRule` to add/remove spools from the plan. 6. Enhance/Enrich metadata handlers for handling the Spool operator. 7. Add `AggregatedColumns` metadata (and respective handler and metadata query), for controlling if a CTE is a "full aggregate" at the CBO (RelNode) level to ensure consistent behavior with `hive.optimize.cte.materialize.full.aggregate.only` property. 8. Add `HiveSqlTypeUtil.containsSqlType` for detecting and skipping the creation of CTEs with untyped nulls since they are not supported (HIVE-11217). 9. Add `hive.optimize.cte.suggester.class` and CommonTableExpressionSuggester interface to provide pluggable CTE detection logic. Given that CTE detection logic can range from basic tree traversal algorithms to complex workload analysis frameworks this part needs to be configurable since there is no one-size-fits-all implementation. The configuration property also allows proprietary algorithms to be integrated in HiveServer2 by implementing the necessary APIs and adding the jars in the classpath. 10. Add prototype implementation for CTE detection logic in CommonTableExpressionIdentitySuggester using CommonRelSubExprRegisterRule and CommonTableExpressionRegistry. Although the implementation is rather simple it can discover various interesting CTEs as demonstrated by the tests and can be indeed useful in a prod environment. 11. Map spool(s) to `WITH` clauses during the RelNode to AST conversion (ASTConverter, ASTBuilder, PlanModifierForASTConv) to exploit existing CTE materialization feature (HIVE-11752). 12. Modify (slighly) `SemanticAnalyzer`/`CalcitePlanner` to enable AST-based CTE materialization (getMetadata) post CBO run. ### Why are the changes needed? * Open the road for cost-based CTE optimizations in Hive * Pluggable/Extensible CTE detection logic ### Does this PR introduce _any_ user-facing change? By default no. After tuning the new/existing cte properties query plans may change affecting performance. ### Is the change a dependency upgrade? No ### How was this patch tested? Tests for: * CTE detection logic using the `CommonTableExpressionPrintSuggester` (`TestTezTPCDS30TBPerfCliDriver`) * demonstrating (end-to-end) the CTE feature (cte_cbo_rewrite_0.q) * verify coherence of CTE rewrite with `hive.optimize.cte.materialize.full.aggregate.only` (cte_mat_12.q) * spool JSON serialization (cte_cbo_plan_json.q) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org